When I started this blog in 2024, generating the HTML for this site took between 3 and 5 seconds. This was good enough at the time.
Time passed, and it’s now a year later. I’m archiving one of my old blogs on this site when I notice that HTML generation takes over 20 seconds.
This happened because my old blog consists of 20-some pages and (small) posts, which more or less doubled the volume of my site.
You’d expect a simple homebrew static site generator (SSG) to be quick, but clearly mine was not. After some measuring, it turned out pandoc was the bottleneck. Long story short: I am now caching all calls to pandoc. Excluding cache warmup, HTML generation now takes between 1 and 1.8 seconds, depending on which laptop I’m on and whether it’s charging. And achieving that performance required only very small changes!
Some of the pages of my site require multiple calls to pandoc. The core assumption of my SSG is that calling pandoc is basically free, so I use it pretty much everywhere. For example, a single blog post on this site requires several separate pandoc invocations.
From my experiments, the runtime of pandoc is usually between 50ms and 300ms. As a general purpose swiss-army knife for Markdown documents, this is fine. Especially if you can get pandoc to batch-convert all your files. For my use case, which is invoking pandoc separately for each piece of Markdown-related work I have as if it’s a Python built-in, it’s not ideal.
From the moment I switched to pandoc, the goal was to reduce the code size of my SSG at the cost of run-time performance. That trade-off still holds up; I just hadn’t foreseen that the run-time overhead would be this large.
The longer story of resolving this bottleneck is that I spent the Christmas break working on my own build system, taskgraph. The idea was to formulate dependencies and outputs precisely, so the build script could use graph analysis to figure out how to rebuild only the parts of the site that needed updating.
The general architecture was simple. Each task would define a list of input files and a list of output files. Combining all tasks, this implicitly forms a graph, where tasks and files are nodes, and dependencies and outputs are directed edges. taskgraph would then compute the partial order of the tasks and check each output for changed dependencies. Whenever it detected an outdated output, the corresponding task was executed to regenerate it. A mapping from path to hash of inputs was kept between executions.
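taskgraph’s own code isn’t shown here, but the partial-order computation it describes can be sketched with Python’s standard-library graphlib. The file names below are invented for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical build: each output file maps to the input files it
# depends on, which implicitly defines the task graph.
tasks = {
    "post.ast": ["post.md"],                     # pandoc parse step
    "post.html": ["post.ast", "template.mako"],  # template render step
    "index.html": ["post.html"],                 # index page step
}

# static_order() yields every file so that inputs always come before
# the outputs that depend on them -- the partial order of the build.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

Walking the files in this order, a build system can compare each output against its inputs and re-run only the tasks whose outputs are stale.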
I used content-based change detection, basically combining and checking file hashes to see if anything needs updating. Computing hashes is not free, but it was generally fast enough for my use case. It also has the benefit that merely deleting a dependency causes a rebuild. This can be a problem in build systems, e.g. GNU Make, where adding or removing a dependency does not always trigger a rebuild.
It was a fun side-project. I got a basic prototype going that has all the essential functionality of a build system and generates roughly 20% of my blog. It also ran pretty much instantly when I would change only one file. It seemed the primary goal of this SSG rewrite was within reach!
Alas, I did not fully move my SSG to taskgraph. There were three problematic downsides.
Problem one: I still needed to port over the remaining 80% of my SSG. While certainly possible, it felt like unnecessary work, especially in the presence of problem two: fine-grained specification of inputs and outputs is annoying and verbose. At least, it is the way I designed taskgraph. Here’s the class for the task that imports Markdown into the Python data structure:
```python
@dataclass
class MdToPandoc:
    paths: MdPaths

    def inputs(self):
        return [self.paths.file_path]

    def outputs(self):
        return [self.paths.ast_path]

    def run(self, ctx):
        doc = pandoc.read(file=str(self.paths.file_path))
        write_pickle(self.paths.ast_path, doc)
        return [self.paths.ast_path]
```

That’s 14 lines, assuming Black formatting. And all it essentially accomplishes is that the doc = ... line is cached. While my SSG does not contain that many moving parts, I was expecting the code size of my SSG to grow by at least an order of magnitude. Sure, you can come up with a shorter inline syntax to define tasks like the one above. And maybe you can make pickling/unpickling of files happen implicitly in taskgraph somehow. But the prospect of having to spell out all required files one by one, and having to pay an order of magnitude more code to maintain, annoyed me.
And that’s only the start of the problem. My SSG heavily relies on arbitrary Python execution in Mako templates. While possible, it’s annoying to fit this model into the mold of taskgraph tasks. I like being able to extend the blog by putting more logic in the templates, keeping the SSG base script small. Adding friction there would be a high price to pay. In addition, the Mako templates are definitely not the bottleneck in the SSG performance. They would therefore gain very little by being cached, so paying the cost of porting them to taskgraph made no sense.
The third problem is that I had basically implemented a generic task library that already exists: pydoit. While I haven’t looked at it in depth, it seems similar to taskgraph, but better. This left me with two choices: use my own large and clunky thing, or introduce another dependency to my SSG.
I took a step back, and realized that I actually only need to speed up the 10 lines of code in my SSG that look like this:
```python
doc = pandoc.read(file=str(self.paths.file_path))
```

This actually wasn’t difficult. This small class is now doing the heavy lifting of all my pandoc-related needs, and caching the results:
```python
class PandocStore:
    def __init__(self):
        self.read_cache = {}
        self.write_cache = {}

    def read(self, doc, options=[]):
        args = (pickle.dumps(doc), tuple(options))
        if args not in self.read_cache:
            result = pandoc.read(doc, options=options)
            self.read_cache[args] = result
        return self.read_cache[args]

    def write(self, doc, options=[]):
        args = (pickle.dumps(doc), tuple(options))
        if args not in self.write_cache:
            result = pandoc.write(doc, options=options)
            self.write_cache[args] = result
        return self.write_cache[args]
```

It’s basically wrapping the pandoc Python package in a simple caching layer.
The nice part is that this approach works properly even when the build script changes. This wasn’t the case with the taskgraph approach: changing the build script required manually cleaning the cache. This sounds easy to detect automatically, but it actually is not. What if a system upgrade silently upgrades one of the libraries Python implicitly uses? Now the cache only needs to be deleted manually if pandoc’s behaviour changes, which is pretty rare.
There are a few small downsides. They are acceptable, and I expect they will remain so in the foreseeable future.1
There’s lots of pickling/unpickling going on. This is required because the Python data structure for a Markdown document is mutable, so I can’t rely on hashing it. Luckily, Python pickling is fast, so it’s not a bottleneck.
Similarly, to hash the list of options, I shallowly freeze it by turning it into a tuple. This works because, for my SSG, the options are only ever strings. If more complicated arguments are ever used, I’ll need to pickle those, too.
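The trick of keying a dict on pickled bytes can be shown in isolation; the nested dict below just stands in for a pandoc document and is not real pandoc output:

```python
import pickle

doc = {"meta": {}, "blocks": ["Hello"]}  # stands in for a mutable pandoc AST
options = ["--wrap=none"]                # example option, a list of strings

# A dict can't key another dict (it's unhashable), but its pickled
# bytes and a frozen tuple of options can.
key = (pickle.dumps(doc), tuple(options))
cache = {key: "<p>Hello</p>"}
print(cache[(pickle.dumps(doc), tuple(options))])  # → <p>Hello</p>
```

Note that this relies on repeated pickles producing identical bytes, which holds here because the key is rebuilt from the very same object each time.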
The new structure required all my calls to pandoc to be path-independent. Basically, any call such as pandoc.read(file=path) had to be changed to pandoc.read(path.read_text()). This was already mostly the case, so refactoring the few remaining calls was easy.
Finally, I also need to empty the cache manually every once in a while. As is, it will keep growing indefinitely. I could adapt the system to remove unused cache entries each run. However, the cache is only 17MB, so it’s not worth the effort yet.
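Removing unused cache entries each run could be a simple mark-and-sweep over the cache dict. A hypothetical sketch, with names that don’t exist in the actual SSG:

```python
class PrunedCache:
    """Dict-backed cache that can evict entries unused in the current run."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self.touched = set()  # keys hit since the last sweep

    def get_or_compute(self, key, compute):
        if key not in self.entries:
            self.entries[key] = compute()
        self.touched.add(key)  # mark: this entry is still needed
        return self.entries[key]

    def sweep(self):
        # Keep only the entries this run actually used, then reset the marks.
        self.entries = {k: v for k, v in self.entries.items() if k in self.touched}
        self.touched = set()
```

Calling sweep() at the end of each build would bound the cache to whatever the current site actually needs.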
Generating my site is now pretty snappy. Maybe it’s only a matter of time until I add some extension of my site that makes generation slow again. Possibly, if that happens a few times, I’ll have to reconsider the build system approach. For now, I’m liking this surgical change because it’s so small. I hope, and expect, that I can apply this approach to future bottlenecks, too.
Generated with BYOB.
License: CC-BY-SA.
This page is designed to last.
This site is part of the UT webring.