duncanlock.net

Using AsciiDoc & Asciidoctor for blogging

Asciidoc
Figure 1. Asciidoc.

I’ve been using reStructuredText for writing on this blog, because it has lots of built-in features that markdown doesn’t.

However, reStructuredText’s actual syntax is a bit…​ fiddly - particularly its non-atx headings, too many things relying on lining up white space, etc…​ If I don’t use it for a bit, I have to look up or copy & paste all the advanced syntax.

I’d prefer to use AsciiDoc, as it has all the extra features, and if you use Asciidoctor, all the simple stuff is the same as markdown - which isn’t (currently) standard AsciiDoc, but is a nice simplification.

The subset of features from reStructuredText (or Asciidoc) that Markdown doesn’t have – and that I’m actually using on this blog, are:

  • Figure/Images with captions
  • Admonitions
  • Front-matter/Metadata
  • Footnotes

Worse is better?

Markdown has lots of problems. Most of these problems stem from two things: Until CommonMark it was lazily, pointlessly, stubbornly non-standard - and it has very few structural or semantic features. Because of people wanting to add some of these features - and because there was no standard and therefore no way to extend one, Markdown has exploded into many fragmented semi-compatible dialects.

However, worse is better. Markdown’s simplicity makes it pretty simple to implement, to the extent that there are native implementations of Markdown in probably every language: AWK, Bash, C …​ to Zig. There are hundreds of more-or-less complete markdown implementations to choose from, to fit any project; here are fifty high quality maintained ones, that support CommonMark, sorted by language, for example.

AsciiDoc is much better than Markdown, but these extra features for the writer come with extra complexity for the person implementing the tools. Probably as a consequence, the AsciiDoc ecosystem and tooling is very anemic and leaves a lot to be desired.

The only complete and well maintained AsciiDoc processor is Asciidoctor - which is written in Ruby. That’s it - there are no other options as of early 2021[1]. I don’t have anything against Ruby, particularly, but I don’t know it or use it for anything, so it’s not familiar. It also has all the same well know problems as Python with packaging/running projects & managing dependencies. Instead of pip/setuptools/virtualenv, etc…​ its gem/rbvenv/rake/bundler/rvm, etc…​ hundreds of global gem files, the whole giant mess. I get the impression that the Ruby version of this mess is…​ less of a problem somehow than the python one, but it’s still a mess.

So, the AsciiDoc story is obviously much more limited than the markdown story. If you want native markdown support, you got it, no matter what you’re doing. If you want native AsciiDoc support, you can only have it if your project is in Ruby (or Java & JavaScript with some caveats [1]). Other than that, you have to shell out and run a Ruby process - and have all the required Ruby dependencies installed.

This turns your document processing into a slower-than-it-could-be, annoying to setup and maintain, external black box. Don’t take this the wrong way - the contents of that black box are fantastic and good people have worked hard on them, but there are only a few of those people and pitting them against the markdown community - which is much, much larger - isn’t really fair.

What are the consequences of this for blogging?

Editing experience isn’t as good

Most modern editors support AsciiDoc syntax highlighting and sometimes preview, via plugins. But, as expected, these aren’t as well-developed, numerous or as fully featured as the Markdown equivalents. If you’re using the Asciidoctor markdown-a-like syntax for the simple stuff, then you can just tell the editor it’s Markdown, and use the more fully featured markdown support.

Not much native support

If you’re using a Ruby blog engine, it might have native AsciiDoc support, or your blog engine might have a plugin that supports it. Don’t expect every blog engine to support it, though. Some blog engines have an escape hatch that let you use Pandoc or some other custom command to process your content - and while Pandoc does have AsciiDoc support, it’s neither complete nor flawless.

Does native support matter?

Does the fact that the document processing is shelled out to a black box actually affect using AsciiDoc for blogging? Well, it makes publishing a bit slower to a lot slower, depending on the size of your site. But in most blog/publishing systems you have a template that defines the structure of the page with some kind of placeholder that says {{content-goes-here}}. It doesn’t really matter how the content is generated, as long as it is - and as long as the HTML that’s produced lines up with your CSS & is reasonable. Speaking of which…​

Asciidoctor’s built-in HTML output could be better

When asciidoctor processes your .adoc file you can tell it what converter (or “back-end”) you want to use - i.e. what kind of output you want. These are the built-in options. This is what they say about the default HTML converter:

The HTML 5 converter (html or html5) generates HTML 5 styled with CSS3. This is the converter Asciidoctor uses by default.

This is technically true, however, while the HTML it produces is technically HTML5, it’s also <div> soup. For example, this document processor for processing documents doesn’t output paragraph tags (<p>…​</p>) - it outputs this instead: <div class="paragraph"><p>…​</p></div> - and it does something like that for basically everything. To be fair, they are aware of this & working on it.

So, I’m not going to use that, I’ll just use some community written back-end/converter that’s better…​ oh…​ yeah. Well, luckily, in this case there actually is one: html5s - which does a much better job.

AsciiDoc Rough Edges

I’ve written a few articles from scratch and converted the existing 80 reStructuredText articles to AsciiDoc, and it’s been fairly painless, but I have come across a few rough edges and problems with AsciiDoc.

Footnotes

AsciiDoc has built-in support for footnotes, but there are some rough edges:

The combination of these issues means that if you want externalized footnotes that work like the rest of your content, you have to give the footnote an ID and wrap the footnote definition in an inline pass-through. Because these become document attributes, you have to define them before you use them, so you should probably put these at the top.

This is more complex & convoluted than it needs to be - footnotes should just work. Anyway, it looks like this:

:fn-disclaimer: pass:q[footnote:disclaimer[Opinions are *my own*.]]

A bold statement!{fn-disclaimer}

Another bold statement!{fn-disclaimer}

They have a proposal for an improved footnote syntax - although it doesn’t talk about text formatting inside the footnote.

Blockquotes

I took me ages poking around on GitHub before I found out how to set the link text in the citation for a quoted block. This is the basic syntax:

[quote, attribution, citation title and information]
Quote or excerpt text

You can put a URL in there, and it works, but giving the URL a title doesn’t seem to work. So this works:

[quote, https://en.wikipedia.org/wiki/Main_Page]
Quote or excerpt text

but this doesn’t:

[quote, https://en.wikipedia.org/wiki/Main_Page[Wikipedia]]
Quote or excerpt text

However, using the Quoted paragraph syntax works:

"Quote or excerpt text"
-- https://en.wikipedia.org/wiki/Main_Page[Wikipedia]

Apparently, the correct way to do this with quoted blocks, is to “use single quotes around the attribute value, that gives Asciidoctor the hint to apply normal substitutions (just like paragraph text)”[2]. Not sure what that means at this point, but the docs on subtitutions are here. This is what it looks like in this case:

[quote, 'https://en.wikipedia.org/wiki/Main_Page[Wikipedia]']
Quote or excerpt text

Using AsciiDoc with Pelican

I’m currently using Pelican for this blog and writing this post in AsciiDoc. This is what you need to do to get that working.

First install the Ruby dependencies & Asciidoctor itself. Unlike me, you should listen to them and use RVM for this. Once you have that installed, you need to install html5s and its dependencies. Next, you need to add the asciidoc_reader Pelican Plugin and add it to your pelicanconf.py

PLUGINS = [
    'asciidoc_reader',
]

You should then set the Asciidoctor command line options. These will configure it to use the html5s backend and rouge for source code syntax highlighting:

ASCIIDOC_OPTIONS = [
    '-a source-highlighter=rouge',
    '-a rouge-style=monokai',
    '-r asciidoctor-html5s',
    '-b html5s'
]

Rouge is compatible with pygments - which I was using previously and my theme is set up to expect, so this was a drop-in replacement - which is very convenient.

Adding & removing plugins

The AsciiDoctor + htmls output has better figure output than reStructuredText + my Better Figures & Images Plugin, so I don’t need that anymore - provided that I convert all articles using figures to AsciiDoc. On the other hand, the extract_toc plugin doesn’t work for AsciiDoc + htmls output, so I copied it to a local plugin and modified it to work:

# -*- coding: utf-8 -*-
"""
Extract Table of Contents from AsciiDoc output from the htmls backend
========================

A Pelican plugin to extract table of contents (ToC) from `article.content` and
place it in its own `article.toc` variable for use in templates.
"""

from os import path
from bs4 import BeautifulSoup
from pelican import signals, readers, contents
import logging

logger = logging.getLogger(__name__)


def extract_asciidoc_toc(content):
    if isinstance(content, contents.Static):
        return

    soup = BeautifulSoup(content._content, "html.parser")
    toc = None

    toc = soup.find("nav", id="toc")

    if toc:
        toc.extract()
        content._content = soup.decode()

        # Remove: <h2 id="toc-title">Table of Contents</h2>
        toc.h2.decompose()

        # Change all ordered lists to unordered
        for l in toc("ol"):
            l.name = "ul"

        content.toc = toc.decode()

        logger.debug(f"ExtractAsciidocToc: content.toc: {content.toc}")


def register():
    signals.content_object_init.connect(extract_asciidoc_toc)

Converting your existing content to AsciiDoc

This depends on the format of your existing content:

Converting reStructuredText to AsciiDoc

I tried lots of different way of converting reStructuredText to AsciiDoc - and none of them are perfect.

Pandoc does a reasonable job, unless you use figures, which get pretty mangled. They are aware/working on this. If you want to use pandoc, This is the basic command:

$ pandoc --wrap=preserve -f rst -t asciidoctor "source.rst" > "dest.adoc"

As well as figures, this also messes up metadata and pelican {static} links, so you probably want some pre- & post-processing to fix that. I wrote a little shell script to fix everything except the figures:

#
# Pre-process
#
cat "$src_path" | \
# Remove :alt: tags from figures & images, otherwise they get lost
sed -r 's/:alt: /\n/g' | \
# Tabs to spaces
sed -r 's/\t/  /g' | \
#
# Convert rst to asciidoc using pandoc
#
pandoc --wrap=preserve --from rst --to asciidoctor | \
#
# Post-process
#
# Fix metadata syntax, from date:: to :date:
sed -r 'N; s/^(.*)::\n /:\1:/g; P; D' | \
# Remove extra breaks created from figure caption conversion
sed -r 'N; s/____\n//g; P; D' | \
# Fix Pelican {static} links
sed -e 's/%7B/{/g' -e 's/%7D/}/g' \
> "$src_folder/$src_name".adoc

Fine, I’ll write my own converter…​

I use figures quite a bit, so this wasn’t a very satisfactory solution. Thinking about it, reStructuredText is basically the Python documentation format, so I looked for docutils based tools to convert reStructuredText to other things. I eventually found sphinx_asciidoc, which sort of worked - and was a fairly straightforward python script that I could improve. I forked it here and fixed all the issues I found - fixing metadata, figures, tables, linked images and various other things.

I developed & tested this by converting all 80 odd rst articles on this site to AsciiDoc and fixing all the issues that I fond in the converter.

Until pandoc fixes their figures, as far as I know, this is probably the best, highest fidelity way to convert reStructuredText to AsciiDoc. If you want to use this, do something like this to get it setup:

$ git clone https://github.com/dflock/sphinx_asciidoc.git
$ cd sphinx_asciidoc
$ python3 -m venv ~/venv/sphinx_asciidoc
$ source ~/venv/sphinx_asciidoc/bin/activate
$ python3 -m pip install -r requirements.txt

then this to run it:

$ python3 ./sphinx_asciidoc/writer.py source.rst

This will create a source.rst.adoc file in the same folder. I tried to keep this as general purpose as possible, but there are probably some things in here which are specific to my documents. There is a section at the top of writer.py with some knobs to twiddle:

#
# Things that should be options, but aren't
#
# Output the rendered TOC from docutils, or just `:toc:`
self.outputTOC = False
# Table column alignment, if not specified. Can be <>^ or
# '' for unspecified.
self.defaultTableColAlign = ""
# Specify percentages for columns widths, or leave browser to auto-layout?
self.defaultTableColWidths = True
# Do you want to output the [1] ref's after the {footnote}, or let asciidoctor do it?
self.outputFootnoteRef = False

Converting Markdown to AsciiDoc

If your content is in Markdown, you need Kramdown. Kramdown is a very good markdown to AsciiDoc converter, that works great and produces flawless AsciiDoc - unsurprising, given that it’s written by Dan Allen, the same guy who largely runs the Asciidoctor project. Once you have Kramdown installed, you can just do: $ kramdoc source.md and it’ll create a source.adoc file in the same folder.

Future of AsciiDoc & Asciidoctor

There are a few promising projects that will help improve the AsciiDoc ecosystem.

The AsciiDoc Specification

The first and biggest one is that AsciiDoc is finally getting a proper spec, under the umbrella of the Eclipse Foundation. This is something that Markdown never had until CommonMark - and that AsciiDoc has lacked up to now. What this means is:

The specification for the AsciiDoc language will include an open source specification document, which defines required and optional API definitions, semantic behaviours, data formats, and protocols, as well as an open source Technology Compatibility Kit (TCK) that developers can use to develop and test compatible implementations. …​ A compatible implementation, as defined by the EFSP, must fully implement all non-optional elements of a specification version, must fulfill all the requirements of the corresponding TCK, and must not alter the specified API.

For users and developers alike, the AsciiDoc specification will mean a clear, working definition of what AsciiDoc is and how it should be interpreted. Developers will be able to build implementations, tools, and services around AsciiDoc without risk of diluting its meaning or splintering it. In turn, users will have more options, greater document portability, and the assurance that compatible implementations and tools will handle their AsciiDoc documents according to a versioned specification.

Here is the AsciiDoc Language project proposal and the approved scope of the project.

So, this should help prevent the fragmentation that plagues the Markdown ecosystem, as well a making it easier for people to develop AsciiDoc parsers & tools. Still nowhere near as easy as implementing a Markdown one, though - AsciiDoc is just more complex.

Having said that, this is a big project and most of the activity is taking place on mailing lists - there also now a website for the Working Group which currently includes meeting minutes etc…​ There is now an AsciiDoc language repo for discussing the spec work, but it’s still early days: https://gitlab.eclipse.org/eclipse/asciidoc/asciidoc-lang/

libasciidoc

Libsciidoc is a Golang library for processing AsciiDoc files. This uses a PEG parser with a formal grammar for AsciiDoc. It already supports a useful subset of AsciiDoc and is being slowly worked on by a few people, I think with the intention to use it with Hugo, which will make a nice combination, when it’s done.

Like most software written in Go, it’s statically linked, which means no dependencies at all - you just need to put the libacsiidoc binary somewhere and run it. This is really nice compared to setting up and maintaining the Ruby dependencies required for Asciidoctor, or the JS & Java ones for Asciidoctor-J/Java, for example.


Footnotes & References:


  1. Asciidoctor can also be run on the JVM - Asciidoctor-j (Java) or in a Browser/Nodejs - Asciidoctor-js (JavaScript). These are both just the Ruby version running in different places - either using JRuby to run on the JVM, or using the Opal Ruby to JavaScript source-to-source compiler to run the Ruby code on a JavaScript VM. The Opal runtime + the AsciiDoc source weighs in at about 1.2Mb of JS. These are both a bit fat and slow and don’t really solve any of the AsciiDoc ecosystem’s problems.
  2. Using single quotes doesn’t fix the formatting on footnotes, so I guess “normal substitutions” are different somehow?

Comments