tadhg.com

What I want from Blogging Software

23:03 Sun 06 Oct 2013

I’ve grown increasingly unhappy with WordPress, despite the fact that it’s served me fairly faithfully for over seven years. The main reason is performance—this blog is now just too slow to load. There are definitely things I could do to tackle that, but having to do so is a sign that it’s not the right platform. The other reason is philosophical—I no longer think that a web application backed by a database is the best approach for a blog.

I’ve been thinking about writing my own—of course[1]. So first I should establish the requirements.

Text File Backend

Text files are easy to maintain, back up, and otherwise deal with. Categories and tags and lists of related posts are easier to manage with a database, but given that they should change infrequently, it’s not really defensible to run a database continually and query it live for those things.

The text file backend would have to support reStructuredText as an input format. In a sense, my current blog meets this requirement, as I write in reStructuredText and have a toolchain that handles posting to it directly from Vim. But that’s a process that should be simpler.

Flat File Delivery

When you visit most blogs (including this one), unless the page you’re visiting has been cached by some intermediary, something like the following steps occur:

  1. Your request is turned into a series of parameters by the underlying web server and handed off to the blog software.
  2. The blog software identifies the specific content you want.
  3. The blog software pulls that content out of the database.
  4. The blog software assembles a page around that content (pulling in header, footer, sidebar, etc.).
  5. The blog software pulls any further dependencies from that page (e.g. supplying the titles and links of related posts) out of the database.
  6. The blog software turns all of those components into an HTML page.
  7. Your browser receives the HTML.

Given that people visit the site far more often than I update it[2], it’s rather inefficient to do all that work every time. Using something like WP Super Cache is possible—it turns a WordPress site into a lot of flat files—but I don’t want to maintain that on top of WordPress, I want that approach to be the core of the blog platform itself.

Instead of the above steps, I want these:

  1. Your request is turned into a series of parameters by the underlying web server and handed off to the blog software.
  2. The blog software reads the page corresponding to your request off of disk.
  3. Your browser receives the HTML.
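
As a rough sketch of the second step, assuming the rendered site lives under a single directory and that directory-style URLs map to an index.html inside it (the function name and layout are hypothetical, not taken from any existing tool), the lookup could be as small as this:

```python
from pathlib import Path
from typing import Optional


def resolve_request(site_root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a pre-rendered file under site_root, or None."""
    root = site_root.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse anything that escapes the site root (e.g. "/../../etc/passwd").
    if candidate != root and root not in candidate.parents:
        return None
    # Assumption: directory-style URLs are served from their index.html.
    if candidate.is_dir():
        candidate = candidate / "index.html"
    return candidate if candidate.is_file() else None
```

In practice a web server configured to serve static files would do this work itself; the point is only that nothing beyond a path lookup is needed per request.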

Separation of Form and Content

The text content has to mix with HTML, CSS, and JavaScript at some point to produce output. I want this separation to be clear, and I also want it to be easy to change the look of the blog. This means I want support for templating (both for the HTML and the CSS) as part of the build process, and I want the build process to be easy to alter and to extend with pre- and post-processing steps.

Git Integration

This follows fairly neatly from all of the above. If the content and the templates are all flat files, it should be easy to track them in a version control system. I might want four different repositories, though:

  • The writing and other content, possibly including comments.
  • The source files for the layout.
  • The rendered flat files—so that the site itself is backed up independently of changes to the build framework, and so that restoring the site is merely a question of deploying a checkout.
  • Copies of downloaded external content as part of handling linkrot.

Comment Support

This is the first requirement that doesn’t appear to be addressed by existing blog software such as Pelican, which would be on my list of candidates. The other flat-file approaches I’ve seen rely on third-party centralized comment handling, mostly Disqus. I don’t want to give control of my comments to a third party, however, and would want easy integration with some comment service that I host myself, such as perhaps Discourse (which I haven’t looked at closely yet).

Comment support is difficult for a flat-file approach because it requires a server to be running to accept comments. It also brings with it the spam problem. I want tight integration with the blog software, too, so that older comments become part of the content the site is built around, and newer comments are pulled in via JavaScript[3]. Along with spam management comes the necessity of an interface for comment administration. While it makes sense that the comment support would be a different service, the blog software needs to be able to write to and read from that service.

I also want support for showing blog-wide recent comments.

Reasonable URLs

Numeric post IDs in URLs are not good enough anymore, and haven’t been for about a decade.

I probably also want short URL support.

404/Redirection Support

All content that’s ever been hosted on your site should be accessible via any URL that it’s ever been accessible by, unless you’ve deliberately chosen to delete that content. Changing software on your side is not a good enough excuse for breaking links.

Currently, my blog has a 404 handler that has a set of key-value pairs, where the key is the old URL (some going back a very long time) and the value is the new URL. So while earlier versions of my site used completely different software and a completely different URL scheme, those links still work. Any new platform would have to support this.

It would also have to support delivering a reasonable 404 page (something my site doesn’t actually do very well right now, but that’s my fault and isn’t because of WordPress).
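
A minimal sketch of that kind of handler, assuming the old-to-new mapping is a plain dictionary and the 404 page is itself a static file (the URLs and the /404.html path below are invented for illustration):

```python
from typing import Dict, Tuple

# Hypothetical entries; the real mapping would list every retired URL.
REDIRECTS: Dict[str, str] = {
    "/archive.php?id=42": "/2006/03/some-post/",
    "/wp/?p=1234": "/2013/10/what-i-want-from-blogging-software/",
}


def handle_missing(path: str, redirects: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, location) for a URL that has no page on disk."""
    target = redirects.get(path)
    if target is not None:
        return 301, target  # permanent redirect to the current URL
    return 404, "/404.html"  # otherwise serve a proper 404 page
```

Keeping the mapping as data rather than code means it can live in the content repository and survive a change of blog software.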

Categorization

Requesting posts by tags would need to be supported, and I would like support for things like categories, “series”, dates (day, month, year) also. This gets a little expensive for flat-file systems because they have to build HTML for each of the sets, and if you allow combinations of tags that’s more expensive again. I would be tempted to consider some kind of lightweight server to handle tag combination, but it would have to be integrated into the system. Another possible approach is to handle it entirely client-side: the blog platform provides a JavaScript object with all of the post URLs and titles and tags, and JavaScript in the browser figures out which match the request.
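
To make the tag-combination idea concrete, here is a sketch of the index such a system might build. It assumes each post is a dictionary with "url", "title", and "tags" keys, which is my own invention for the example. The same intersection logic could run in a lightweight server or be ported to JavaScript against the exported JSON:

```python
import json
from typing import Dict, Iterable, List, Set


def build_tag_index(posts: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each tag to the set of post URLs carrying it."""
    index: Dict[str, Set[str]] = {}
    for post in posts:
        for tag in post["tags"]:
            index.setdefault(tag, set()).add(post["url"])
    return index


def posts_with_all_tags(index: Dict[str, Set[str]], tags: Iterable[str]) -> List[str]:
    """Return the URLs of posts that carry every requested tag."""
    wanted = [index.get(tag, set()) for tag in tags]
    if not wanted:
        return []
    return sorted(set.intersection(*wanted))


def export_for_browser(posts: Iterable[dict]) -> str:
    """Serialize the fields that client-side filtering would need."""
    return json.dumps(
        [{"url": p["url"], "title": p["title"], "tags": sorted(p["tags"])} for p in posts]
    )
```

Because combinations are computed on demand from one small index, the build only has to render a page per single tag, not per combination.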

Metadata

The software would have to know how to read other metadata, such as timestamps, and also arbitrary things I might come up with later.
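
One way this could work, sketched under the assumption that each post begins with reStructuredText-style field lines such as ":date: 2013-10-06 23:03": read leading fields into a dictionary without caring what the keys are, so new ones need no code changes. This is a deliberate simplification, not a real reStructuredText parser, and the function name is hypothetical:

```python
import re
from typing import Dict, Tuple

# Matches a single-line field such as ":tags: blogging, software".
FIELD = re.compile(r"^:(?P<key>[^:]+):\s*(?P<value>.*)$")


def split_metadata(text: str) -> Tuple[Dict[str, str], str]:
    """Split leading ':key: value' lines from the body of a post."""
    meta: Dict[str, str] = {}
    lines = text.splitlines()
    body_start = len(lines)
    for i, line in enumerate(lines):
        match = FIELD.match(line)
        if match is None:
            body_start = i  # first non-field line ends the metadata block
            break
        meta[match.group("key").strip().lower()] = match.group("value").strip()
    body = "\n".join(lines[body_start:]).lstrip("\n")
    return meta, body
```

Values are kept as strings here; turning a timestamp into a date, or a tag list into a list, would be a separate step applied only to the keys the build knows about.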

Linkrot Detection/Prevention

Links cease being accurate with appalling frequency, and that can make some blog posts incomprehensible later on. (This seems to be even more true of YouTube clips than other content.) So I want blog software that notes what links are in posts and downloads their content as part of the blog’s assets, and later does automatic link-checking. If a link has disappeared, then it should take some action to allow the stored version to be used instead of the now-dead link.
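
A sketch of the two halves of that, using only the standard library: collecting external links from rendered HTML, and deciding which dead ones can be swapped for a stored copy. The liveness check is passed in as a function so that the network policy stays separate; the archive mapping and all names here are assumptions made for the example:

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def rewrite_dead_links(
    links: Iterable[str],
    is_alive: Callable[[str], bool],
    archive: Dict[str, str],
) -> Dict[str, str]:
    """Map each dead link that has a stored copy to its archived path."""
    return {url: archive[url] for url in links if url in archive and not is_alive(url)}
```

The resulting mapping could feed a post-processing step in the build that replaces each dead link, or annotates it with a pointer to the stored version.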

RSS

RSS support has to include options for full content in feeds, the ability to include arbitrary other content (such as links to related posts) in the RSS, and RSS feeds for the comments on individual posts.

Code Syntax Highlighting

Code snippets should have language-specific CSS applied to them.


Eventually I’ll find time to determine whether I could make Pelican, or some alternative, do the above (or enough of it), and then I’ll switch over. This blog certainly needs a technical overhaul and a redesign, and I’m not willing to do the latter until I do the former, partly because fixing terrible performance matters more than fixing subpar design.

I might still want some kind of administrative interface, but that would be a separate service, one that would write to the repositories and kick off rebuilds, rather than something coupled with the blog content. One key use of that interface would be to allow scheduling of future posts.

[1] I last did that over a decade ago, in PHP backed by MySQL. Any new project would not be written in PHP.

[2] At least, that’s my assumption…

[3] So that if the site only gets rebuilt when a new post is made, the sequence would look like this:

  1. Article A posted.
  2. Site built.
  3. Comment 1 made on article A.
  4. Whenever article A is requested, comment 1 only shows up after the page is loaded, because it’s requested by JavaScript at that point.
  5. Article B posted.
  6. Site built.
  7. Comment 1 is now part of the content for article A.
  8. Whenever article A is requested, comment 1 shows up along with the rest of the content, rather than being added by JavaScript later.
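
The sequence above amounts to splitting comments around the time of the last build. A minimal sketch, assuming each comment carries a timestamp and that the build records when it ran (the Comment type and its field names are hypothetical):

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class Comment(NamedTuple):
    author: str
    posted_at: datetime
    body: str


def split_comments(
    comments: Iterable[Comment], built_at: datetime
) -> Tuple[List[Comment], List[Comment]]:
    """Separate comments baked into the page from those fetched later."""
    ordered = sorted(comments, key=lambda c: c.posted_at)
    baked = [c for c in ordered if c.posted_at <= built_at]  # part of the static HTML
    live = [c for c in ordered if c.posted_at > built_at]    # requested by JavaScript
    return baked, live
```

The build would render the first list into the page, and the comment service would only ever need to answer for the second.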

One Response to “What I want from Blogging Software”

  1. Elena Says:

    It seems that Pelican already has a (3rd party?) plugin that manages comments almost the way you want, using Disqus but fetching old comments and including them in the pages at generation time: https://github.com/getpelican/pelican-plugins/tree/master/disqus_static
