tadhg.com

Moving From Word Processors to reStructuredText

23:54 Sun 12 Jul 2009. Updated: 17:17 28 Dec 2009

I’ve written before about my wish for semantic word processing tools, and two years on I still haven’t found something that suits me. I think that WYMeditor has definite promise, but unfortunately the authors are aiming it at browser-to-server functionality rather than in-browser standalone functionality. This isn’t such a major obstacle for me, but it is one of the reasons why I’m hesitant to move over to using a project that hasn’t reached version 0.5 yet.

I’ve been aware of reStructuredText for a little under two years, and have used it on and off (more off than on, mainly). This morning, however, while talking to a friend about the desirability of moving away not merely from proprietary formats but from any format that isn’t text-based, I finally hit the point where I no longer want to use RTF.

RTF has been my file format for most personal and creative writing since I stopped using Microsoft Word format about seven years ago. I experimented with other formats and decided on RTF as the least-bad one. I’ve thought about alternatives more or less ever since, but never found any that worked.

This is partly due to my wanting something that supports semantic meaning while also not requiring me to use heavyweight markup. I don’t want to be distracted by the markup issues while writing, so I want something as close to plain text as possible. Furthermore, I’m simply used to the word processing environment, having written in it for decades.

I didn’t want to give up visible italics and bold, and I didn’t want to give up the feel of the serif font face and the overall “softer” environment.

However, I’ve always done my blog writing in a text editor, and since I’ve written a fair amount of fiction for the blog, I’ve gradually become more used to the text editing environment for more creative and personal work.

One of the things that kept me from making the shift away from the word processor was the hope that eventually I’d find one that was just right. It might still happen, but today it occurred to me that in the interim, it would be best for me to have my files in a rational format. That rational format is not RTF, but I’m hoping it’s reStructuredText.

The shift is both about moving from RTF to reStructuredText and about moving from a word processor to a text editor. In theory I could write reStructuredText in a word processor, but that doesn’t make much sense to me.

This also means that I’m going to gradually shift all of my files over to reStructuredText. I’ve been converting old .doc files to .rtf, and now will both move files from .doc to reStructuredText and move the converted .rtf files to reStructuredText… making me wish I’d made this decision quite some time ago, but that’s all right.

I recommend reStructuredText pretty highly. You can go from it to more or less any other format, and it supports rather a lot of functionality. I’ll probably try to write a quick primer on it at some point, as most of its documentation is aimed at programmers rather than general users, but I think anyone who deals with text could probably benefit from using it. (Well, it, and a decent text editor, and a decent version control system…)

4 Responses to “Moving From Word Processors to reStructuredText”

  1. Christopher Lee Says:

    Hi Tadhg,
    what is your pipeline from WORD to reST? I haven’t found a good tool yet. I just spent some time trying to get the Python module pyrtflib to work on some of my old RTF files, but it crashed on every single file I tried. Furthermore, it would be even better to be able to go straight from WORD .doc files to reST. Do you know a way to do this?

    By the way, if you haven’t already done so I would urge you to manage all of your writing projects under git, the distributed version control system. It is life-changing, and as they say “it’s not just for breakfast (source code) anymore!” If you are using reST, git will work beautifully, showing you exactly what changed in each “commit”, enabling you to merge different “experimental versions” of a project completely automatically, etc. etc. Ask me if you have questions…

    – Chris Lee

  2. Tadhg Says:

    Chris: Thanks for the comment! Unfortunately, my methodology for going from Word to reST involves a lot of manual steps. I use OpenOffice.org to open the Word files, then some find-and-replace to e.g. surround italics with asterisks, then copy the file and run a minor conversion script (separating paragraphs with empty lines and some other cleanup), and then manual verification.

    If I were to try to get closer to a proper automated process, I would probably use OpenOffice.org in conjunction with unoconv to go to some other format, and then from that format to reST. For example, unoconv outputs to DocBook, and there’s a DocBook to reST script. I haven’t tried it yet because installing lxml on this machine is more hassle than I’ve so far wanted to deal with, but it looks like a promising route for the future.

    I currently do all my version control with Subversion, and have considered git but am more likely to move to Mercurial, partly because I’ve used it a bit already and partly because of my bias towards Python. git versus Mercurial aside, however, I agree wholeheartedly with the version control evangelizing!
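
The "minor conversion script" mentioned in the comment above is not shown, so the following is only a sketch of what such a step might look like, not the author's actual script. It assumes the text was exported from the word processor with one paragraph per line and that italics have already been wrapped in asterisks by find-and-replace. The function name `paragraphs_to_rest` and the specific cleanup rules (collapsing runs of spaces, trimming line ends) are illustrative assumptions.

```python
import re


def paragraphs_to_rest(text: str) -> str:
    """Turn one-paragraph-per-line text into blank-line-separated paragraphs.

    Hypothetical helper: the cleanup rules here are assumptions, not the
    original script's behaviour.
    """
    paragraphs = []
    for line in text.splitlines():
        # Assumed cleanup: collapse runs of spaces/tabs and trim both ends.
        cleaned = re.sub(r"[ \t]+", " ", line).strip()
        if cleaned:  # drop empty lines so paragraphs are separated exactly once
            paragraphs.append(cleaned)
    if not paragraphs:
        return ""
    # reStructuredText separates paragraphs with a single blank line.
    return "\n\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    sample = "First paragraph with *italics*.  \nSecond paragraph.\n\n\nThird."
    print(paragraphs_to_rest(sample), end="")
```

Because indentation is significant in reStructuredText, stripping leading whitespace like this only suits plain running prose; anything with block quotes or lists would still need the manual verification step described above.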

  3. Christopher Lee Says:

    Hi Tadhg,
    thanks for pointing me to unoconv. The basic idea looks great — command-line / automated extraction of document content out of old WORD files etc. The problem is that OpenOffice’s promised support for all this appears to be totally broken, at least in the Mac version. After fooling around for some time with the pyuno library included in the OOo 3.1.0, I realized it is linked against Python 2.3 in the system framework… which hasn’t existed since Mac OS 10.3?! I have Mac OS 10.5, which only includes Python 2.5 installed in the System frameworks. OOo is supposed to ship with its own python executable that will actually work with pyuno… but again that is missing in action (no such file anywhere in the install), at least in the Mac version. So you are blocked either way. Trying to install your own Python 2.3 would probably not solve the problem, because pyuno is linked specifically against Apple’s Python2.3 framework libraries. I am downloading the latest (3.1.1) OOo release candidate but I have a sneaking suspicion it will be the same… linked against Python 2.3 libraries that existed back in 2003!

    I think everyone should use both git and mercurial, so they can access either type of repository. Like you I am a committed Pythonista, so I assumed I would use Mercurial instead of git. But I got sucked into git in one of my projects, and it’s been great. Mercurial is excellent too. So learn both! But whatever you do, move out of SVN and try out the new magic as soon as you can. You won’t regret it!

    – Chris

  4. Tadhg Says:

    Chris: I found a fairly solid toolchain.
