tadhg.com

Text Advocacy

14:11 Sun 09 Sep 2012

I use plain text formats for all of my writing, and you should at least consider doing the same.

By “plain text” I mean not only a text (as opposed to binary) file format, but also something that is plainly readable when simply listing the contents of the file—that is, a format you don’t necessarily need a specific tool to read. Such formats are more flexible, more robust, more malleable, and more future-proof than more complicated alternatives.

I started out with Microsoft Word[1], which is still the tool most people think of when they want to do some writing. I became increasingly unhappy with Word as time went by: incompatibilities between different versions, difficulty tracking changes, and a desire to move away from proprietary software all pushed me away. While there are now many open-source applications that can read Word formats very well[2], my experiences with it taught me that it’s not wise to depend on a single application to read your files. Files are already fragile enough, and if you have to open yours with a particular application, that means you’re depending on the survival of both the file and the application.

So what are plain text formats? Plainest of them all is text with no markup at all: just a text file, usually with a .txt extension, and no emphasis, or bold, or links, or other features. This is reasonable for certain writing styles. It’s a little too plain for me, however, and I eventually settled on reStructuredText as my lightweight markup language. A lightweight markup language is designed to:

  • Be easy for humans to write: to emphasize text you simply surround it with asterisks, as in *emphasized*, something that people did before these lightweight markup languages existed.
  • Be easy to read in its raw form, unlike for example HTML, which isn’t that easy to read in source because of how intrusive its markup is.
  • Be easy to export to other formats, usually HTML but also others like PDF, RTF, and LaTeX.
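To make the first and third goals concrete, here is a minimal Python sketch of the export step for a single feature: turning asterisk-delimited emphasis into HTML. It is only an illustration. The function name `emphasis_to_html` is made up for this example, and it is not how reStructuredText or Markdown tools actually parse text; real converters handle far more markup and many more edge cases.

```python
import html
import re

# Hypothetical helper for illustration; not part of any real markup tool.
def emphasis_to_html(text: str) -> str:
    """Convert *emphasized* spans to <em> tags, escaping HTML specials first."""
    escaped = html.escape(text, quote=False)
    # An emphasized span starts and ends with a non-space, non-asterisk
    # character, so a bare "2 * 3 * 4" is left untouched.
    pattern = r"\*([^\s*](?:[^*\n]*[^\s*])?)\*"
    return re.sub(pattern, r"<em>\1</em>", escaped)

print(emphasis_to_html("Plain text is *future-proof* & simple."))
```

The source stays readable as written, and the same file could be fed to a different converter to produce another output format.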

The most popular of these are probably Markdown and, in a different way, the MediaWiki format, used by Wikipedia. I like reStructuredText, but they all have advantages and disadvantages, and choosing one is mostly a matter of personal preference.

I write in UTF-8, so I have access to nice typographic quotation marks and dashes, as well as many accented characters, and I highly recommend using UTF-8 for everything regardless of other format considerations.
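As a small illustration of what that means in practice, the Python sketch below encodes a few strings as UTF-8 and checks that they decode back unchanged. The helper name `utf8_sizes` and the sample strings are invented for this example. The point is only that typographic quotes and accented letters take more than one byte each but survive the round trip intact.

```python
# Hypothetical helper for illustration only.
def utf8_sizes(text: str) -> tuple:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    encoded = text.encode("utf-8")
    # Decoding the bytes must give back exactly the original string.
    assert encoded.decode("utf-8") == text
    return len(text), len(encoded)

for sample in ["plain", "café", "“quoted”"]:
    print(sample, utf8_sizes(sample))
```

Pure ASCII text uses one byte per character, so an ASCII file is already a valid UTF-8 file.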

Here are some reasons why you should consider using plain text formats:

Future-Proofing

As already mentioned, your writing is more fragile if you need not only the files but a specific application to read it. Plain text files, whether in ASCII or UTF-8, are going to be readable for a very long time. Simply moving to a new computer, even on a new operating system and/or architecture, is very unlikely to prevent you from reading your files. This is in stark contrast to binary formats, particularly proprietary ones—I have Word documents that Word will no longer read[3].

If you’re serious about your writing or consider it to be important, you should want it to be available to you in the future. Using a plain text format makes this far more likely. Furthermore, if you use a plain text format and also rely upon various other tools, the disappearance of those tools will be a nuisance but not a disaster: it will be far easier to find replacements for their functionality than it would be to find something that handles an old proprietary document format[4].

Tool Variety

This is related to the last point, but goes beyond it: there are innumerable programs out there designed to work with plain text. Vast numbers of open source applications exist to perform various weird and wonderful text-based tasks. If one doesn’t do precisely what you want, it’s far easier to make one that does for plain text than for some other format. Even if you’re not a programmer, this matters—getting someone else to write an application that does what you want will be much easier for plain text files.

This also allows for greater access to your writing in unusual situations; if I want to check something in my writing from a command line, perhaps remotely, then I can do so without much trouble—without having to worry about having a specific application available in whatever environment I’m currently in.

Separation of Form and Content

This starts out as a disadvantage for most users, perhaps because it introduces a layer of abstraction, but once grasped it’s very powerful. The “what you see is what you get” paradigm tends to foster a reliance on the specific editing tool, as well as a mixture of content and form concerns during the writing process—for example, being distracted by font selection choices, or by formatting the appearance of a bullet list.

Using a plain text format removes those distractions from the writing phase and places them in the output phase, where they properly belong. While I write everything in reStructuredText, when I share my writing with others (such as in this blog post, or at work), it’s as HTML or PDF. I still tinker with what that output looks like, but that tinkering occurs after the writing is done.

The notion of the plain text as “source” also makes it easier to see that different output formats may have different requirements, and gets away from the unhealthy notion that every published version has to look exactly how it looks in the WYSIWYG view the author is using to write it.

Eliminating form concerns makes it easier to focus on the writing itself, during both the composition and editing phases.

File Size

While this is less of a concern than it once was, given how cheap storage is now, it still matters in some instances. A Word document of a day’s morning pages is three to four times as large as its reStructuredText equivalent; when I create 365 of those per year, that difference adds up. And there are still low-bandwidth situations where that matters.

Version Control and Tracking Differences

You should be using version control (as well as a robust backup strategy). Version control systems work best with plain text files, because they were created to handle the plain text source files of software projects. They also make it easy to see differences between versions—if you’re using plain text. Seeing the changes between one version of some of my writing and another is trivial. If I’m looking at older work that’s still in Word format, however, it’s a lot harder—and what I usually do is convert the files to text first, then use the tools that work on plain text.
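For a sense of what “seeing the changes” looks like, here is a minimal Python sketch using the standard library’s `difflib` module to compare two versions of a short text. The file names `draft-v1.txt` and `draft-v2.txt` and the sample sentences are made up for the example. A version control system would normally produce this kind of output for you; this only shows the idea.

```python
import difflib

def show_changes(old: str, new: str) -> str:
    """Return a unified diff between two versions of a plain text document."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="draft-v1.txt",  # hypothetical file names, used as labels only
        tofile="draft-v2.txt",
        lineterm="",
    )
    return "\n".join(diff)

old = "I use plain text.\nYou should too.\n"
new = "I use plain text.\nYou should at least consider it.\n"
print(show_changes(old, new))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, which is exactly the view that is hard to get from a binary document format.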

Searching

Command line tools for searching through text files are mature, robust, and commonplace. Again, it’s far easier to search through large numbers of plain text files than it is through their binary equivalents.
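Dedicated command line tools are the usual choice here, but the same idea is easy to sketch in a few lines of Python, which gives some sense of how little machinery plain text needs. The function name `search_text_files`, the `*.txt` pattern, and the sample file are assumptions for this example, not a description of my actual setup.

```python
import tempfile
from pathlib import Path

# Hypothetical helper for illustration only.
def search_text_files(root, term, pattern="*.txt"):
    """Yield (path, line_number, line) for every line containing term."""
    needle = term.lower()
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():  # case-insensitive match
                    yield path, number, line.rstrip("\n")

# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text(
        "Morning pages\nPlain text wins\n", encoding="utf-8"
    )
    for path, number, line in search_text_files(tmp, "plain"):
        print(f"{path.name}:{number}: {line}")
```

Doing the same across a folder of word processor documents would first require a library that understands that particular format.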

Simplicity

Alongside all of the above, there’s also the argument of simplicity. Text file formats are smaller because they’re simpler. Why not use a highly efficient format for storing information? What use is the extraneous information stored in a Word file that makes it three to four times larger? I don’t think I’ve yet come across any need, in years of daily writing and blogging, that isn’t met by reStructuredText but would be met by word processing software. Since that’s the case, why bother with the extra bloat?

It’s highly unlikely that your writing absolutely requires support for more than various kinds of emphasis, hyperlinks, lists, footnotes, citations, simple tables, images, headings, comments, and sections—if it even requires all those. Yet word processors are hefty, complex beasts that require at least as much effort to learn as it would take to master those features in Markdown, reStructuredText, or AsciiDoc. Simplicity means easier maintenance, less overhead, and—in this case—ultimately more power and flexibility with your words.

[1] I did some work in AppleWorks and Microsoft Works before that—and yes, I did have problems getting those documents into Word format.

[2] Better than modern versions of Word can, in fact.

[3] Although OpenOffice.org can read them.

[4] Such as the ancient AppleWorks files I still have—although luckily I did convert those to Word at some point in the past and can read the Word versions.
