This is will be lengthy, but comprehensive. It pulls together everything I’ve learned to build my websites with a workflow that doesn’t suck and provides the level of control and output style that I want.
I have had many aborted attempts in the past to build and maintain my websites, trying everything from Ruby on Rails to Wordpress to raw
Being a software developer, I want to build my websites the same way that I build software – from the commandline, using version control, with a text editor :)
My goals are:
- Compose from the commandline. I don’t need online/remote editing capabilities.
- Focus on content. Minimize effort spent on formatting, style, and layout.
- Static website. No dependencies on webservers, databases, CGI scripts, VMs, etc.
- Long-term support. I don’t want to convert my website to a different framework in a few years.
My websites are available in GitHub to follow along.
I’ve long admired Edward Tufte’s www.edwardtufte.com
series of books abut data visualization.
I want my websites to have a similar style, and I was not the first person to have that thought. Tufte CSS adapts Tufte’s style to the web using CSS.
I looked at many existing static-site generators (Jekyll, Hyde, Pelican, Hugo, and even Gitbook). While they each had their strengths, they also had various drawbacks:
- Proprietary layouts/formatting hooks.
- Difficult to extend to support Tufte CSS’s featuresThere is now a Tufte Jekyll theme available.
(e.g. margin notes).
- Written in a language I don’t know (in case I have to extend its functionality).
- They were all relatively new – will they still be around in 5yrs?
- Required too many dependencies to install.
Most of these are minor or irrelevant issues for most people, but I’m doing this for myself so I get to make the rules :)
I’ll be assembling a set of tools to do my own static-site generationThat’s what the rest of the page describes.
My ideal workflow is very straightforward:
- Write content in
- Build a local version of the website.
- It should look/act the same as the online version.
- Review the local website.
- Publish the website.
It doesn’t get much simplerI could extend the wrapper to include the webserver and monitoring commands, and automatically publish using a git hook…
. Below is the actual workflow, the rest of this page explains how all the parts come together to support this workflow.
# Go to my local websites `git` repo. cd ~/workplace/websites # Launch a local webserver in the background, then # open http://localhost:8000 in my browser. python3 -m http.server -d jasonpeacock.com & # Monitor the markdown content for changes and # automatically rebuild the website. rg -l . jasonpeacock.markdown | entr ./website-generate jasonpeacock.com # Edit content, save changes, and review the results when I # reload the browser tab. vim jasonpeacock.markdown/content/index.md # Publish to my webhost. ./website-publish jasonpeacock.com # Commit and push the website changes. git add -A && git commmit -m "updated website" && git push
As a software developer, I spend a lot of time typing at the commandline, that’s where I’m the most comfortable. My fingers are well-trained with shell shortcuts and VimIt’s actually Neovim.
commands; my terminals are solarizedSolarized terminal & editor colors.
and multiplexedtmux terminal multiplexer.
Thus it follows that when I’m writing and editing website content, I am using a text editor within a terminal, and my workflow is optimized for that use case.
Version Control and Backup
Everything should be backed up and versioned.
The same lesson applies to websites and their content. I’m using Git for versioning my websites and content, and GitHub for backup.
GitHub is free for personal/public use, and as the website is public it’s OK for the source to also be public. I learned a lot from digging into others’ website repositories, I don’t want to prevent others from learning from my own website repositories.
Managing Large Files
Website have more content than just text, there are images, videos, PDFs,
tar downloads, and other large files.
git does not handle large files efficiently so we cannot include those files directly in the repository.
git-lfs to push the large files to separate storageWhere are git-lfs files stored?
outside of my
git repository, and then a reference to that file’s storage location is included in the repository instead.
git-lfs is a bit of black magic, files are stored by default in the GitHub Large File Storage, wherever that may be – it’s not explained very well. There are some projects that provide alternative locations for large file storage, such as AWS S3:
There are also alternatives to using
git-lfs for managing large files:
For now, I’m using
git-lfs with the default GitHub storage. It’s configured to store all images, videos, and PDFs in GitHub’s large file storage.
For my websites, the
git-lfs configuration is
.gitattributes. An example is below.
*.mp4 filter=lfs diff=lfs merge=lfs -text *.jpg filter=lfs diff=lfs merge=lfs -text *.pdf filter=lfs diff=lfs merge=lfs -text
I use Markdown to write the content. It’s simple, widely supportedThere are some valid concerns about using Markdown, maybe another markup language should be used?
, extensible when needed (insert
HTML as required), and limited in functionality.
Why is limited functionality good? Because it keeps you focused on writing content and not layout. But strict markdown can be too limited – what about
strikeout? Or super and sub script?
Avoiding distraction from formatting is admirable, but to communicate effectively you need to apply formatting appropriately. There are many markdown flavorsAt least 34.
available that extend the original syntax.
I am using the Pandoc Markdown flavor, as I am using Pandoc to build my websites. This supports generating other, non-
HTML output formats as Pandoc will better understand the original content.
MP4 is a widely-supported video format for the web, but the native
MP4 videos from your cellphone are way too large to use – 140Mb for ~15 seconds – they need to be optimized for the web. There a number of technical concernsOptimizing MP4 Video for Fast Streaming.
involved with web-optimizing videos, I use Handbrake to do everything in one pass.
Handbrake, just select the
Vimeo Youtube 720p30 preset and you’re doneThe preset includes web optimization.
. That 140Mb file will now be <5Mb and optimized for web streaming. There are additional options to resize, reduce quality, and remove audio if you need a smaller file.
Somehow, the markdown content needs to be converted into a stylish, functional website. This is what static-site generators do, and what I need to replicate.
Instead of a site generator, I chose to use a document converter instead. The difference is that each page of content is converted into an
HTML document independently; my websites are really a collection of
The drawback is that I need to manually maintain index pages and any links between pages. Using an actual static-site generator would understand the relationship between pages and provide extended syntax to generate links, index pages, lists of “recently updated”, and other blog-like features. I feel the reduction in complexity and framework is worth this overhead as I don’t use those missing features.
One advantage of this approach is that because each page of content is independent it can be easily converted to any document format –
LaTeX, etcI haven’t actually tried converting my content to PDF yet. But theoretically, I could.
The website content is organized into separate directory trees for each website. Each directory includes the
markdown content pages as well as a
resources/ directory with the typical web resources (CSS, JS, etc) and a
pandocomatic/ directory that configures how the website is built.
jasonpeacock.markdown/ content/ index.md projects/ index.md markdown-websites/ index.md another-project/ index.md images/ my_image.jpg pandocomatic/ pandocomatic.yaml tufte.html5 postprocessors/ tidy.sh resources/ .htaccess favicon.ico css/ pandoc.css pandoc-solarized.css tufte.css tufte-extra.css et-book/ ... js/ ...
The content is converted into
HTML documents and copied into a new directory tree, with non-content files copied as-is. The contents of the
resources/ directory are copied into the root of the directory tree, while the
pandocomatic/ directory is excluded.
jasonpeacock.com/ .htaccess favicon.ico index.html projects/ index.html markdown-websites/ index.html another-project/ index.html images/ my_image.jpg css/ pandoc.css pandoc-solarized.css tufte.css tufte-extra.css et-book/ ... js/ ...
I use Pandoc to do the document conversion from
HTML formatPandoc supports a lot of input & output formats.
. I don’t want to use anything written in a programming language that I don’t knowPandoc is written in Haskell.
pandoc is a tool and not a framework – I’ll be using it within a framework.
When converting from
HTML, a template can be provided. For
HTML, this is essential as it provides the boilerplate that a webpage needs but is not included in the content page, or would be redundant to include in every content pageE.g. the copyright notice footer.
For my websites, that template is
tufte.html5, adapted from tufte-pandoc-css.
The template is too large to reproduce here; it uses the Pandoc template language to pull metadata from the content pages and generate
HTML with all the proper tags, including
<header> links for CSS and scripts. Then it injects the converted content and outputs the completed
Pandocomatic is a tool to automate the use of pandoc. With pandocomatic you can express common patterns of using pandoc for generating your documents. Applied to a directory, pandocomatic can act as a static site generator.
I use PandocomaticPandocomatic is written in Ruby, which I am familiar with.
to recursively traverse my website content directory tree and invoke
pandoc to convert all the content pages from
Pandocomatic is quite flexible through the use of
YAML templates, it’s worth reading the docs to understand everything that is possible.
For my websites, that Pandocomatic
YAML configuration is pandocomatic.yaml. An example is below.
settings: recursive: true follow-links: false skip: ['.*', 'pandocomatic.yaml'] templates: blog: glob: ['*.md'] setup:  preprocessors:  pandoc: css: - /css/pandoc-solarized.css - /css/pandoc.css - /css/tufte-extra.css - /css/tufte.css filter: pandoc-sidenote from: markdown+implicit_header_references section-divs: true template: tufte.html5 to: html5+smart toc: true postprocessors: - postprocessors/tidy.sh cleanup: 
In brief, the configuration above recursively scans for all
*.md files and invokes
pandoc with the given
css files, template, filter, etc, and converts to
html5+smart format. After conversion, the
tidy.sh post-processor is run to cleanup the
HTML document formatting.
pandocomatic, you need to provide the content, configuration, and output directories.
To simplify this, I’ve written a wrapper script
website-generate. It takes the name of the website (
jasonpeacock.com) as a parameter, then will look for a matching
jasonpeacock.markdown directory to convert the content from.
It also copies the contents of
jasonpeacock.markdown/resources/ into the root of the new website
jasonpeacock.com/ directory, resulting in a complete, static website at
To automatically rebuild the website after every edit, the
entr tool is used to watch all files in the website directory.
rg is ripgrep, a super-powered
After the website is built it needs to be reviewed. Unfortunately, merely viewing the
index.html file in your browser is not enough to load the CSS and other required resources.
The simplest approach is to use one of the built-in webservers from Python, Ruby, etc. and load the page at: http://localhost:8000/
The website should display correctly with the Tufte CSS style and all images, videos, links, etc working.
Rsync everything to the webhost. It’s good practice to only have one source of authority for anything, that includes website content.
After the website has been built from the content and reviewed it should be pushed directly to the webhost and overwrite anything that’s already there, because you know that your localObviously this is not true when working in a distributed environment with other authors, but my websites don’t have to worry about that because I am the only author.
content is the single source of authority.
rysnc command is configured to be efficient, only pushing files and attributes that have changed, and removing remote files that are no longer relevant. The webhost is already configured for passwordless-ssh.
rsync configuration is below.
To simplify this, I’ve written a wrapper script
website-publish. It takes the name of the website (
jasonpeacock.com) as a parameter and
rysnc’s the website directory to the webhost.
I’ll be filing issues and PRs for these issues that I’ve discovered. There were originally more known issues, but with some investigation I found that they were already known (and working “as designed”), or were due to mistakes by myself :) It’s possible these remaining issues are also due to my own mistakes.
Definition Lists are unsupported
I had to add support to Tufte CSS for
Code in sidenotes is too large
When using inline-literals (“code”, via backticks) in sidenotes the font size is too large.
There’s always more to do and room for improvement. Right now, everything is functional and very usable, I am happy with it.
Containerize the tools
There are a few tools and dependencies to install, as hard as I tried to minimize them. It would be awesome to have everything captured in a
Dockerfile and runnable in a Docker containerAn existing project that does this already.
Host on S3/Cloudfront
Save money by hosting on S3 and only pay for the actual bandwidth used. Currently I use a webhost, whom I have no complaints about, but it’s excessive to have a whole VM that just serves static files.
Also look into using Cloudfront to cache the static pages for even faster pageloads and lower bandwidth.
git-lfs files on S3
It’s unsettling to use
git-lfs and have files stored in an vaguely-documented “github server”. While I’m using GitHub it makes sense to also use GitHub for large file storage.
It would be trivial to copy the repo to an S3 bucket, but then I need find a way to redirect
git-lfs to also use the same S3 bucket.
git-hook to automatically publish
Publishing the website and pushing to Git are separate operations, which can lead to various out-of-sync states.
git-hook would not only automate this process to ensure everything is always in sync, it would also enforce VCS best practices, such as not committing half-completed changes to
master - a branch should be used instead. And don’t publish from branches.
Where possible, it is best to avoid
HTML and use native Pandoc Markdown to support as many output formats as possible. Some features of Tufte CSS do require
HTML, like wrapping tables and videos in
Full documentation is available from the Pandoc Markdown manual, the examples below are included for convenience or to capture notes about non-obvious behavior.
Syntax-highlighted code with line numbers.
Epigraphs/blockquotes require a wrapping
<div class="epigraph"> tag.
Note: The blank
> line between quote and footer is required. Otherwise the blockquote won’t be wrapped in
<p> tags and it will not constrain itself to the column width.
A newthought starting a new section.
Any content that appears before the first header needs to be manually wrapped in a
<section> First content in a page before a header. </section>
VideoJS is used to create an inline, streaming video playerI found VideoJS after a quick search, there may be better players but VideoJS was quick to integrate and easy to use.
header-includes attribute to the content
YAML header to load the video CSS.
Add the video player to the content where you want the video to appear.
Load the video player script at the end of the content.
Tables should always be wrapped with
<figure> tags to ensure they are sized and re-flow correctly.
A full-width table with headers.
|Col 1||Col 2|
|some content||more content|
A full-width table without headers.
|some content||more content|
Table of Contents
toc-title attribute in the
YAML header to automatically generate a table of contents at the top of the page.
- Run arbitrary commands when files change.
- Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
- GitHub is how people build software.
- HandBrake is a tool for converting video from nearly any format to a selection of modern, widely supported codecs.
- If you need to convert files from one markup format into another,
pandocis your swiss-army knife.
- Pandoc Markdown
- Pandoc understands an extended and slightly revised version of John Gruber’s Markdown syntax.
- Convert Pandoc Markdown-style footnotes into sidenotes.
- Automating the use of pandoc.
rsyncis an open source utility that provides fast incremental file transfer.
- Tufte CSS
- Tufte CSS provides tools to style web articles using the ideas demonstrated by Edward Tufte’s books and handouts.
- Tufte Pandoc CSS
- Starter files for using Pandoc Markdown with Tufte CSS.
- Video.js is an open source library for working with video on the web, also known as an HTML video player.