Making a simple yet fast static site builder
- Use unix tools
- Keep the script simple (240 LoC) and maintainable
- Keep it small
As a system developer, it is always a good thing to see people making the effort to keep their tools, scripts and binaries low on dependencies, disk space and memory. Folks these days tend to forget that you don't need 2GB of RAM to display a picture of a cat (looking at you, Slack!).
To be honest, I had never been interested in static site builders before Roman's post. I did see the utility in them, but I never needed one myself (though this blog is the 'living' proof that this is no longer true). So I've never used Jekyll, Hugo, or any other common site builder.
I was going to clone and use ssg as my blog builder without even thinking about it, but then I read this:
"100 pps. On modern computers ssg generates a hundred pages per second. Half of a time for markdown rendering and another half for wrapping articles into the template. I heard good static site generators work, twice as fast, at 200 pps, so there's lots of performance that can be gained. ;)"
I was astonished by this performance. A hundred pages per second is very slow in modern computing. Yet ssg is written in shell, so there's a lot of dup2(2) involved, and I can assume it was written for simplicity, not for speed. Still, Roman tells us that a 'good' static site generator works at around 200pps, which is still very slow. As a challenge and for my personal curiosity, I've tried to develop a simple static site generator in C, to see what kind of performance could be achieved.
shayla is the result of this work. It's a small C binary (5k LoC) that reads, parses and generates HTML from markdown. The resulting binary is about 83KB stripped, and comes with no dependency (besides the obvious). shayla has been developed and tested under GNU/Linux, but should work with equal performance under OSX or FreeBSD. I will talk about how to use it at the end of this post, after all the parts about performance.
The test platform will be my Thinkpad T480s, with a shiny new Arch Linux (4.17.4) and a 500GB Samsung NVMe SSD. The write speed of my disk is about 600MB/s, and the read speed is about 1.5GB/s:
    $> dd if=/dev/zero of=output bs=1024 count=1000k
    1024000+0 records in
    1024000+0 records out
    1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.77229 s, 592 MB/s
    $> dd if=output of=/dev/null bs=1024 count=1000k
    1024000+0 records in
    1024000+0 records out
    1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.71487 s, 1.5 GB/s
The actual test will be to build a lot of pages into a website. There are 305MB of pages (around 25k files), and each page is 'complex' markdown in order to exercise the parser thoroughly. Each page contains one of the following headers:
    ---
    title: article_X
    summary: This is article X
    route: article-X
    ---

    ---
    title: "Article"
    date: 2018-08-23T20:20:56+02:00
    ---

    ---
    title: "Article number 0"
    layout: post
    ---
And the actual content is this markdown
All the times have been measured with
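For a sense of scale, the arithmetic behind the quoted throughput figures can be sketched as follows. This is only a back-of-the-envelope calculation, assuming exactly 25,000 pages and treating 305MB as 305 x 1024 KB; the real corpus is only described as "around 25k files", and these are not measured timings of any tool.

```shell
#!/bin/sh
# Figures taken from the text: about 25k pages, about 305 MB in total.
# Both values are rounded assumptions, not measurements.
pages=25000
total_kb=$((305 * 1024))

# Wall time implied by the two throughput figures quoted from ssg's author.
secs_at_100=$((pages / 100))
secs_at_200=$((pages / 200))

# Average page size in KB (integer division, so the result is truncated).
avg_kb=$((total_kb / pages))

echo "at 100 pps: ${secs_at_100} s"
echo "at 200 pps: ${secs_at_200} s"
echo "average page size: ${avg_kb} KB"
```

In other words, a generator running at 100 pps would need a bit over four minutes for a corpus of this size, which is why the page count was chosen to be this large.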
Note on the benchmark
The purpose of this benchmark is not to prove that X is better than Y. Jekyll and Hugo have far more features than shayla, and I think they are very good for building and testing a static website. For example, the server feature is pretty useful for development, and the templating system of those tools is far superior to shayla's. The purpose of this benchmark is to test the actual speed of those tools at transforming markdown into HTML, applying a template and generating an index, which is precisely all I need.
TL;DR: I do not claim that these benchmarks are representative of the software tested. It is simply a pps benchmark, so take the results with a grain of salt.
Let's begin with the obvious: shayla is fast. It is 663 times faster than Jekyll and 13 times faster than Hugo. But again, shayla is simple, and surely too simple for projects other than blogs. While running those tests, I had the pleasure of trying Hugo for the first time, and I must say I'm impressed with the software. The code looks clean, the template system is very simple, and the performance is really nice. For a more complete static site project, I'll use Hugo without hesitation.
But for a simple blog, I'll stick with Shayla. There is no real difference between ssg and Shayla, besides what you want to learn from using them. ssg is a simple shell script, easy to hack for your needs, and a good way to learn a thing or two about your system. If that does not matter to you, Shayla is the right tool for you.
    Usage: shayla -[vhtsldrfuit]
    Generate an HTML static site for markdown sources.
    If used with no options, shayla will look for directory in the current path.

    Options:
        -v, --version        Print software version
        -h, --help           Show this message
        -t, --title=TITLE    Title to be used in the final site
        -s, --src=DIR        Markdown sources directory
        -c, --style=DIR      Style sources directory
        -l, --layouts=DIR    Layouts directory
        -d, --dest=DIR       Destination directory
        -r, --root=ROOT      Root URL of the website
        -f, --favicon=FILE   Favicon to use
        -u, --url=URL        Url of the website
        -i, --img=DIR        Images directory
        -t, --threads=NUM    Number of threads to launch
        --debug              Print more information
Here's the 'required' tree for shayla:

    ├── img
    ├── layouts
    │   ├── footer.html
    │   ├── header.html
    │   └── intro.html
    ├── markdown
    └── styles
'required' is quoted because the directories can have any name and can live anywhere on your filesystem. This is just the default setup.
- img is the directory where you store all your images. You can reference them by using ![My super image](img/my_super_image.png) in your pages.
- layouts is the directory where you store the template files of your site. header.html and footer.html are pretty much self-explanatory, and intro.html is what is displayed on the index.html page, just above the articles.
- markdown is where you store your pages, in markdown.
- styles is where you store your style sources.
There is no shayla init. I think you can manage creating 4 directories by yourself.
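Creating the default layout by hand could look like the sketch below. It assumes the default directory names from the tree above; `my-blog` is a hypothetical project directory, and the three layout files are created empty as placeholders for your own HTML.

```shell
#!/bin/sh
set -eu

# "my-blog" is a hypothetical project directory; use any name you like.
site="my-blog"

# The four default directories shown in the tree above.
mkdir -p "$site/img" "$site/layouts" "$site/markdown" "$site/styles"

# Empty placeholder templates. touch does not overwrite existing files,
# so running this twice is harmless.
for name in header footer intro; do
    touch "$site/layouts/$name.html"
done

# List what was created.
find "$site" | sort
```

Since the directories can live anywhere, the same commands work with different names as long as the matching options are passed to shayla.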
A little header is required at the beginning of every post:
    ---
    title: My first Article
    summary: This is my first article
    ---
These two fields are required for every post. Here's a complete list of all the options:
- route: Future route name of the article. It will be displayed as route.html in the final site. It is up to you to handle duplicates.
- summary: A quick one-line summary of the article. It is used for link titles and RSS generation.
- title: Title of the article.
- date: Date of the article in YYYY-MM-DD format. If this option is missing, shayla will use the last modified timestamp from the filesystem in order to establish a chronology.
- list: Boolean option, whether or not the article should be listed on the index.html. Default is
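Putting the options together, a post file could be created as in the sketch below. The file name, route and date are made up for illustration, and the list option is left out on purpose because its accepted values are not spelled out here.

```shell
#!/bin/sh
set -eu

# Default markdown directory; the file name below is hypothetical.
mkdir -p markdown

# Header first, then the body of the post in markdown.
cat > markdown/my-first-article.md <<'EOF'
---
title: My first Article
summary: This is my first article
route: my-first-article
date: 2018-08-23
---

# Hello

This is the body of the post, written in markdown.
EOF

# Print the six header lines that were just written.
sed -n '1,6p' markdown/my-first-article.md
```

With this header, the article would end up as my-first-article.html, and the explicit date means the chronology does not depend on the file's modification time.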
Building the website
    $> shayla --title "My site title" \
              --dest /var/www/htdocs/blog \
              --favicon ~/Pictures/blog_favicon.ico \
              --url "https://blog.ne02ptzero.me"
You can find the sources and build instructions of