tadhg.com

Fun with pandoc, Vim, and email

23:52 Fri 21 Jan 2011. Updated: 01:35 22 Jan 2011

I’ve mentioned pandoc once before, and it’s again proved rather useful. I’ve been looking for more ways to use it, as I love its core principle (although I naturally wish that it focused on reStructuredText rather than Markdown) of being a comprehensive text format converter. It might at one point be the answer for getting from reST to PDF—something that the current reST tools don’t help me with because I insist on using Unicode, and XeTeX isn’t yet supported. But today pandoc helped with a different task: going from reST to plain text.

You might wonder why I would want to do that. After all, reST is very close to plain text as it is, and is almost as readable (part of its purpose is to be so). Why would I need a converter from it to plain text?

Because of email. I’ve found myself composing quite a few emails in Vim (because I do almost all composition in Vim), and then having to manually change things to accommodate the email format. The main issue is with lists and linebreaks; I write reST with infinitely wide columns, but email should be 72 columns wide. Normal linebreaking in Vim or in email clients won’t preserve the margins.

For example, a list:

  • List item one, which is really really long, ridiculously long, so long that you, the reader, might think that I, the author, am belaboring things somewhat just to support the technical point I’m making.
  • List item two, here purely to round out the numbers.

In my source, that’s two lines, one per item. Wrap it to 72 characters, however, and you get something rather ugly from Vim’s built-in wrap function:

+ List item one, which is really really long, ridiculously long, so long
that you, the reader, might think that I, the author, am belaboring
things somewhat just to support the technical point I’m making.  + List
item two, here purely to round out the numbers.

(Note that you get far better results if you use “-” as your bullet character instead of “+”, but I don’t like to be forced into using that character alone, and am pretty used to “+” at this point.)

Thunderbird’s built-in wrap function does slightly better:

+ List item one, which is really really long, ridiculously long, so
long that you, the reader, might think that I, the author, am
belaboring things somewhat just to support the technical point I’m
making.
+ List item two, here purely to round out the numbers.

pandoc -f rst -t plain, however, does rather better:

-   List item one, which is really really long, ridiculously long,
    so long that you, the reader, might think that I, the author, am
    belaboring things somewhat just to support the technical point I’m
    making.
-   List item two, here purely to round out the numbers.

That’s just what I want: wrapping that respects things like lists. Unfortunately pandoc doesn’t yet support more advanced reST features like tables and footnotes, but I don’t use those too often in emails anyway.
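For anyone curious what “wrapping that respects lists” amounts to, here is a rough sketch using Python’s standard textwrap module. It is only an approximation of the hanging-indent output shown above, not how pandoc does it internally, and the wrap_list_item name and its defaults are made up for illustration.

```python
import textwrap

def wrap_list_item(text, width=72, bullet="-   "):
    """Wrap one list item, indenting continuation lines under the bullet."""
    return textwrap.fill(
        text,
        width=width,
        initial_indent=bullet,
        subsequent_indent=" " * len(bullet),
    )

# One source line per item, as in the reST source described above.
items = [
    "List item one, which is really really long, ridiculously long, so long "
    "that you, the reader, might think that I, the author, am belaboring "
    "things somewhat just to support the technical point I'm making.",
    "List item two, here purely to round out the numbers.",
]

wrapped = "\n".join(wrap_list_item(item) for item in items)
print(wrapped)
```

Each item stays a separate block, and no line runs past 72 columns. The exact break points may differ from pandoc’s.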

The only problem I had with it is that very occasionally I do want to use emphasis in emails, *like so*. Since “*” is the emphasis character in reST, that works fine… except that pandoc strips out all emphasis when it outputs plain text, which makes sense.

That, however, I could get around in the pre-pandoc step, by escaping the asterisks so that pandoc passes them through literally. That same step also replaces Unicode characters with ASCII transliterations, since email is still ASCII if you really want to rely on it. The Python unidecode module takes care of that quite nicely: em dashes become two hyphens, curly quotation marks become straight quotation marks, and so on.
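As a minimal sketch of that pre-pandoc step, the snippet below escapes emphasis markers and then flattens a few Unicode characters to ASCII. To keep it dependency-free, to_ascii is a hypothetical stand-in for unidecode that covers only a handful of characters; the real module handles far more. The function names are made up for illustration.

```python
import re

# *text* where the character after the opening asterisk is not a space.
EMPHASIS = re.compile(r"\*([^ ])([^*]+)\*")

def escape_emphasis(text):
    """Turn *text* into \\*text\\* so the asterisks survive as literals."""
    return EMPHASIS.sub(
        lambda m: "\\*" + m.group(1) + m.group(2) + "\\*", text
    )

# Hypothetical stand-in for unidecode: only a few common characters.
ASCII_MAP = str.maketrans({
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})

def to_ascii(text):
    """Replace the mapped Unicode characters with ASCII equivalents."""
    return text.translate(ASCII_MAP)

sample = "I\u2019ve used emphasis, *like so*\u2014it survives."
prepared = to_ascii(escape_emphasis(sample))
print(prepared)
```

The escaping runs first so that the text handed to pandoc already has its asterisks protected.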

Putting it together, I have a quick Vim command that takes the current buffer, applies the above transforms, yanks the buffer to the system clipboard, and then restores the original version, so that all that’s then required is a quick switch to Thunderbird followed by paste.

The Vim script, which is written in Python, looks something like this:

def email(self):
    import re
    import vim  # Only available inside Vim's Python interface.
    from unidecode import unidecode
    t = self.get_text(whole=True)  # Get the buffer.
    regex = re.compile(r"\*([^ ])([^*]+)\*")
    r = regex.sub(r"\\*\1\2\\*", t)  # Escape emphasis: *x* becomes \*x\*.
    self.write_text(unidecode(r))  # Replace the buffer with the unidecoded text.
    vim.command(":%!pandoc -f rst -t plain")  # Replace the buffer with the output pandoc produces from it.
    vim.command('normal ggVG"*y')  # Yank the buffer into the system clipboard.
    self.write_text(t)  # Restore the original buffer.

self.get_text() and self.write_text() are Vim utility commands I’ve put together, the internals of which are a little more complex than I’d like and too much to go into here. My version has a couple of other complications, but I think the above code (with replacements for getting and writing the buffer text) should work from any Python run within a recent Vim.
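For the buffer helpers, something along these lines might serve as a starting point. It is a simplified sketch, not my actual utilities: it assumes the buffer object behaves like a list of lines (as vim.current.buffer does in Vim’s Python interface), and it takes the buffer as an argument rather than living on a class. Outside Vim, a plain list stands in for the buffer.

```python
def get_text(buffer):
    """Return the whole buffer as one newline-joined string."""
    return "\n".join(buffer[:])

def write_text(buffer, text):
    """Replace the whole buffer with the lines of the given text."""
    buffer[:] = text.split("\n")

# A plain list of lines stands in for vim.current.buffer here.
fake_buffer = ["+ first item", "+ second item"]

original = get_text(fake_buffer)           # Save the source text.
write_text(fake_buffer, "-   first item\n-   second item")
converted = get_text(fake_buffer)          # What would be yanked.
write_text(fake_buffer, original)          # Restore the source text.
```

Inside Vim, the calls would presumably become get_text(vim.current.buffer) and write_text(vim.current.buffer, ...).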

One last bonus is that pandoc appears to respect “<” and “>” as delimiters meaning “don’t break this into separate lines”, so surrounding URLs with those should work in the source document.


One Response to “Fun with pandoc, Vim, and email”

  1. David Says:

    Just stumbled across this. Instead of converting from rst to plain, try converting from rst to rst:

    pandoc -f rst -t rst

    That should get you nicely formatted lists, without losing emphasis. You might even go so far as to put something like this in your vimrc:

    setlocal equalprg=pandoc\ -f\ rst\ -t\ rst

    Then you can select a chunk of text and hit `=` to tidy it up.
