Making a simple yet fast static site builder
A week ago, I came across a very interesting article by Roman Zolotarev about a little script called ssg.
I really liked the way Roman approached the problem:
- Use unix tools
- Keep the script simple (240 LoC) and maintainable
- Keep it small
As a system developer, I always find it refreshing to see people making the effort to keep their tools, scripts and binaries low on dependencies, disk space and memory. Folks these days tend to forget that you don't need 2GB of RAM to display a picture of a cat (looking at you, Slack!).
To be honest, I had never been interested in static site builders before Roman's post. I did see their usefulness, but I never needed one myself (though this blog is the 'living' proof that this is no longer true). So I have never used Jekyll, Hugo, or any other common site builder.
I was going to clone ssg and use it as my blog builder without even thinking about it, but then I read this about ssg's performance:
100 pps. On modern computers ssg generates a hundred pages per second. Half of a time for markdown rendering and another half for wrapping articles into the template. I heard good static site generators work, twice as fast, at 200 pps, so there's lots of performance that can be gained. ;)
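Converted into time per page, these rates show how small the budget is. At 100 pps, and with the time split evenly as the quote describes, each page takes:

$$
t_{\text{page}} = \frac{1}{100\ \text{pages/s}} = 10\ \text{ms}, \qquad t_{\text{render}} \approx t_{\text{template}} \approx 5\ \text{ms}
$$

The 200 pps quoted for good generators halves that budget to $1/200\ \text{s} = 5\ \text{ms}$ per page.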
I was astonished by this figure. A hundred pages per second is very slow by modern computing standards. Then again, ssg is written in shell, so a lot of fork(2) and dup2(2) calls are involved, and I assume it was written for simplicity rather than speed. Still, Roman tells us that a 'good' static site generator runs at around 200 pps, which is still very slow. As a challenge, and out of personal curiosity, I tried to write a simple static site generator in C to see what kind of performance was achievable.
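The following sketch makes that cost concrete. It is a generic C version of what a single external command in a shell script boils down to. It is not ssg's code, and `run_filter` is a hypothetical name. A shell-based generator pays this fork/dup2/execv/waitpid round trip for every command it runs on every page. A single C binary can do all the work inside one process.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one external program the way a shell pipeline stage does:
 * fork a child, dup2() the input and output descriptors onto its
 * stdin/stdout, then execv() the program and wait for it.
 * Returns the child's exit status, or -1 on error. */
int run_filter(const char *path, char *const argv[], int in_fd, int out_fd)
{
    pid_t pid = fork();                 /* a whole new process per call */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the page file becomes stdin, the output file stdout. */
        if (dup2(in_fd, STDIN_FILENO) < 0 || dup2(out_fd, STDOUT_FILENO) < 0)
            _exit(127);
        execv(path, argv);
        _exit(127);                     /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_filter("/bin/cat", args, fileno(page), fileno(out))` copies a page through `cat`. Even this trivial step costs a process creation and an exec.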
shayla
shayla is the result of this work. It's a small C program (5k LoC) that reads and parses markdown and generates HTML from it. The resulting binary is about 83KB stripped, and comes with no dependencies (besides the obvious libc). It has been developed and tested under GNU/Linux, but should work with equal performance under OSX or FreeBSD. I will talk about how to use shayla at the end of this post, after the performance sections.
Test process
The test platform is my ThinkPad T480s, running a shiny new Arch Linux (kernel 4.17.4) with a 500GB Samsung NVMe SSD. The write speed of the disk is about 600MB/s, and the read speed is about 1.5GB/s:
$> dd if=/dev/zero of=output bs=1024 count=1000k
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.77229 s, 592 MB/s
$> dd if=output of=/dev/null bs=1024 count=1000k
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.71487 s, 1.5 GB/s
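dd reports decimal units, so both figures follow directly from the byte count and the elapsed times:

$$
\frac{1\,048\,576\,000\ \text{B}}{1.77229\ \text{s}} \approx 5.92 \times 10^{8}\ \text{B/s} \approx 592\ \text{MB/s},
\qquad
\frac{1\,048\,576\,000\ \text{B}}{0.71487\ \text{s}} \approx 1.47 \times 10^{9}\ \text{B/s} \approx 1.5\ \text{GB/s}
$$

The read test reads back a file that had just been written, so part of that figure may come from the page cache rather than the SSD.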
The actual test is to build a large number of pages into a website. There are 305MB of pages (around 25k files), and each page contains 'complex' markdown in order to exercise the whole parser. Each page contains the following header:
For Shayla:
---
title: article_X
summary: This is article X
route: article-X
---
For Hugo:
---
title: "Article"
date: 2018-08-23T20:20:56+02:00
---
For Jekyll:
---
title: "Article number 0"
layout: post
---
And the actual content is this markdown file.
All the times have been measured with /usr/bin/time.
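/usr/bin/time measures the whole process, start-up included. The sketch below measures pps for the build loop alone, from inside the program. `build_fn`, `count_page` and `measure_pps` are hypothetical names, and `count_page` only stands in for rendering and templating a page.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical per-page build callback: returns 0 on success. */
typedef int (*build_fn)(size_t page_index, void *ctx);

/* Stand-in for real work (markdown rendering + templating):
 * it only counts how many pages it was asked to build. */
int count_page(size_t page_index, void *ctx)
{
    (void)page_index;
    (*(size_t *)ctx)++;
    return 0;
}

/* Build `pages` pages and return the observed pages per second.
 * CLOCK_MONOTONIC is used so wall-clock adjustments cannot skew the
 * measurement. Returns -1.0 if any page fails to build. */
double measure_pps(size_t pages, build_fn build, void *ctx)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < pages; i++)
        if (build(i, ctx) != 0)
            return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double)(end.tv_sec - start.tv_sec)
                   + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (elapsed <= 0.0)
        elapsed = 1e-9; /* guard against a zero-length measurement */
    return (double)pages / elapsed;
}
```

Dividing the number of pages by the elapsed time gives the same pps metric used throughout this post. It just excludes process start-up and exit.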
Note on the benchmark
The purpose of this benchmark is not to prove that X is better than Y. Jekyll and Hugo have far more features than ssg or shayla, and I think they are very good for building and testing a static website. For example, the live server feature is pretty useful during development, and the templating systems of those tools are far superior to shayla's. The purpose of this benchmark is to measure how fast those tools transform markdown into HTML, apply a template and generate an index, which is precisely all I need for this blog.
TL;DR: I do not claim that these benchmarks are representative of the software tested. It is simply a pps benchmark, so take the results with a grain of salt.
Benchmark
Let's begin with the obvious: shayla is fast. On this benchmark, it is 663 times faster than Jekyll and 13 times faster than Hugo. But again, shayla is simple, and surely too simple for projects other than blogs. While running these tests, I had the pleasure of trying Hugo for the first time, and I must say I'm impressed with it. The code looks clean, the template system is very simple, and the performance is really nice. For a more complete static site project, I'd go with Hugo without hesitation.
But for a simple blog, I'll stick with ssg or Shayla. For that use case there is no real difference between the two, besides what you want to learn from using them. ssg is a simple shell script that is easy to hack for your needs, and you'll learn a thing or two about your system along the way. If that does not appeal to you, maybe Shayla is the right tool for you.
Using Shayla
Usage: shayla -[vhtsldrfuit]
Generate an HTML static site for markdown sources.
If used with no options, shayla will look for directory in the current path.
Options:
-v, --version Print software version
-h, --help Show this message
-t, --title=TITLE Title to be used in the final site
-s, --src=DIR Markdown sources directory
-c, --style=DIR Style sources directory
-l, --layouts=DIR Layouts directory
-d, --dest=DIR Destination directory
-r, --root=ROOT Root URL of the website
-f, --favicon=FILE Favicon to use
-u, --url=URL Url of the website
-i, --img=DIR Images directory
-t, --threads=NUM Number of threads to launch
--debug Print more information
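Options like these are commonly parsed with getopt_long(3). Here is a minimal sketch covering a handful of them. `struct config` and `parse_args` are hypothetical, not shayla's code. The help text lists `-t` for both `--title` and `--threads`, so this sketch gives `--threads` a long form only.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical configuration covering a few of the options listed above. */
struct config {
    const char *title;
    const char *src;
    const char *dest;
    const char *url;
    int threads;
    int debug;
};

/* Values above 255 cannot collide with short option characters. */
enum { OPT_THREADS = 256, OPT_DEBUG };

/* Parse argv into cfg. Returns 0 on success, -1 on an unknown option
 * or a thread count below 1. */
int parse_args(int argc, char **argv, struct config *cfg)
{
    static const struct option long_opts[] = {
        {"title",   required_argument, NULL, 't'},
        {"src",     required_argument, NULL, 's'},
        {"dest",    required_argument, NULL, 'd'},
        {"url",     required_argument, NULL, 'u'},
        {"threads", required_argument, NULL, OPT_THREADS}, /* long form only */
        {"debug",   no_argument,       NULL, OPT_DEBUG},
        {NULL, 0, NULL, 0}
    };
    int c;

    while ((c = getopt_long(argc, argv, "t:s:d:u:", long_opts, NULL)) != -1) {
        switch (c) {
        case 't': cfg->title = optarg; break;
        case 's': cfg->src = optarg; break;
        case 'd': cfg->dest = optarg; break;
        case 'u': cfg->url = optarg; break;
        case OPT_THREADS:
            cfg->threads = atoi(optarg);
            if (cfg->threads < 1)
                return -1;
            break;
        case OPT_DEBUG: cfg->debug = 1; break;
        default: return -1; /* getopt_long has already printed an error */
        }
    }
    return 0;
}
```

getopt_long accepts both `--title "My site title"` and `--title="My site title"`, so either spelling works with this table.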
Tree
Here's the 'required' tree for Shayla:
├── img
├── layouts
│ ├── footer.html
│ ├── header.html
│ └── intro.html
├── markdown
└── styles
'required' is in quotes because the directories can have any name and can live anywhere on your filesystem. This is just the default setup.
img is the directory where you store all your images. You can reference them with ![My super image](img/my_super_image.png) in your pages.

layouts is the directory where you store the template files of your site. header.html and footer.html are pretty much self-explanatory, and intro.html is what is displayed on the index.html page, just above the articles.

markdown is where you store your pages, written in markdown.

styles is where you store your .css files.
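The layouts can be combined with a rendered article by simple concatenation. The minimal sketch below assumes the layouts are plain HTML fragments copied verbatim around the article body. This post does not describe how shayla actually handles templates, so `read_file` and `write_page` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole layout file (e.g. layouts/header.html) into a NUL-terminated
 * heap buffer. Returns NULL on error; the caller frees the result. */
char *read_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    char *buf = malloc((size_t)size + 1);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    if (buf)
        buf[size] = '\0';
    fclose(f);
    return buf;
}

/* Wrap an already-rendered article between the header and footer layouts.
 * For index.html, body_html would be intro.html followed by the article list.
 * Returns 0 on success, -1 on a write error. */
int write_page(FILE *out, const char *header, const char *body_html,
               const char *footer)
{
    if (fputs(header, out) == EOF || fputs(body_html, out) == EOF ||
        fputs(footer, out) == EOF)
        return -1;
    return 0;
}
```

Reading header.html and footer.html once and reusing the buffers for every page avoids reopening the layout files 25k times.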
There is no shayla init. I think you can manage creating four directories by yourself.
Post
A little header is required at the beginning of every post:
---
title: My first Article
summary: This is my first article
---
These two fields are required for every post. Here's the complete list of options:
- route: Future route name of the article. The article will be written as route.html in the final site. It is up to you to handle duplicates.
- summary: A one-line summary of the article. It is used for the link title and RSS generation.
- title: Title of the article.
- date: Date of the article, in YYYY-MM-DD format. If this option is absent, shayla uses the file's last-modified timestamp from the filesystem to establish a chronology.
- list: Boolean option, whether or not the article should be listed on index.html. Default is true.
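Reading such a header needs nothing beyond libc string functions. The minimal sketch below assumes one `key: value` pair per line and `\n` line endings. `struct post_meta`, `parse_header` and `FIELD_MAX` are hypothetical names, and shayla's own parser may behave differently; for example, this sketch silently ignores unknown keys.

```c
#include <string.h>

#define FIELD_MAX 256

/* Hypothetical in-memory form of a post header. */
struct post_meta {
    char title[FIELD_MAX];
    char summary[FIELD_MAX];
    char route[FIELD_MAX];
    char date[FIELD_MAX];  /* empty when absent: fall back to the mtime */
    int list;              /* 1 unless the header says "list: false" */
};

/* Copy a value into dst, skipping leading blanks and truncating if needed. */
static void copy_value(char *dst, const char *src, size_t len)
{
    while (len > 0 && (*src == ' ' || *src == '\t')) {
        src++;
        len--;
    }
    if (len >= FIELD_MAX)
        len = FIELD_MAX - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* True when the klen bytes at key spell exactly `name`. */
static int key_is(const char *key, size_t klen, const char *name)
{
    return strlen(name) == klen && strncmp(key, name, klen) == 0;
}

/* Parse the "---" delimited header at the start of text. Returns a pointer
 * to the markdown body, or NULL if the header is unterminated or lacks the
 * required title and summary. */
const char *parse_header(const char *text, struct post_meta *meta)
{
    memset(meta, 0, sizeof *meta);
    meta->list = 1;
    if (strncmp(text, "---\n", 4) != 0)
        return NULL;

    const char *line = text + 4;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *next = eol ? eol + 1 : line + len;

        if (len == 3 && strncmp(line, "---", 3) == 0)
            return (meta->title[0] && meta->summary[0]) ? next : NULL;

        const char *colon = memchr(line, ':', len);
        if (colon) {
            size_t klen = (size_t)(colon - line);
            const char *val = colon + 1;
            size_t vlen = len - klen - 1;
            if (key_is(line, klen, "title"))
                copy_value(meta->title, val, vlen);
            else if (key_is(line, klen, "summary"))
                copy_value(meta->summary, val, vlen);
            else if (key_is(line, klen, "route"))
                copy_value(meta->route, val, vlen);
            else if (key_is(line, klen, "date"))
                copy_value(meta->date, val, vlen);
            else if (key_is(line, klen, "list")) {
                char flag[FIELD_MAX];
                copy_value(flag, val, vlen);
                meta->list = strcmp(flag, "false") != 0;
            }
        }
        line = next;
    }
    return NULL; /* no closing "---" line */
}
```

Only the first colon on a line splits key from value, so a title such as `title: C: a love story` keeps its second colon.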
Building the website
$> shayla --title "My site title" \
--dest /var/www/htdocs/blog \
--favicon ~/Pictures/blog_favicon.ico \
--url "https://blog.ne02ptzero.me"
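The date and list rules for ordering index.html can be sketched as follows. Dates become YYYYMMDD integer keys, which compare in chronological order. The file's mtime is used when `date` is missing or malformed, and unlisted posts are dropped. `post_date_key`, `struct index_entry` and `prepare_index` are hypothetical names, and converting the mtime in UTC via gmtime is an assumption; shayla may use local time.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* Turn a post's date into a sortable YYYYMMDD key. If date is NULL or not
 * a valid YYYY-MM-DD string, fall back to the file's last-modified time.
 * Returns -1 if both fail. */
long post_date_key(const char *date, const char *path)
{
    int y, m, d;
    char extra;
    if (date && sscanf(date, "%4d-%2d-%2d%c", &y, &m, &d, &extra) == 3 &&
        m >= 1 && m <= 12 && d >= 1 && d <= 31)
        return y * 10000L + m * 100L + d;

    struct stat st;
    if (!path || stat(path, &st) != 0)
        return -1;
    struct tm *tm = gmtime(&st.st_mtime);  /* UTC; not thread-safe */
    if (!tm)
        return -1;
    return (tm->tm_year + 1900) * 10000L + (tm->tm_mon + 1) * 100L + tm->tm_mday;
}

/* Hypothetical index entry. */
struct index_entry {
    const char *title;
    long date_key;
    int list;  /* 0 means "do not show on index.html" */
};

/* qsort comparator: newest date first. */
static int newest_first(const void *a, const void *b)
{
    long ka = ((const struct index_entry *)a)->date_key;
    long kb = ((const struct index_entry *)b)->date_key;
    return (kb > ka) - (kb < ka);
}

/* Sort entries newest first, then compact the listed ones to the front.
 * Returns how many entries belong on index.html; entries past that count
 * are left in an unspecified state. */
size_t prepare_index(struct index_entry *entries, size_t n)
{
    qsort(entries, n, sizeof *entries, newest_first);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (entries[i].list)
            entries[kept++] = entries[i];
    return kept;
}
```

The `%c` conversion in the date check rejects trailing characters. A value like `2018-08-23x` therefore falls back to the mtime instead of being half-parsed.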
You can find the sources and build instructions for shayla here.