tadhg.com

Better Word Count in jEdit

22:33 Sun 19 Jul 2009

I tend to care about word count in my writing. I’ve never been paid by the word, but nevertheless, it matters to me. From time to time I write fiction where I set the word count in advance, and then I try to hit it precisely. Even when that’s not the case, I just like to know how many words there are in a piece I’m writing. For this reason, a "word count" function is completely critical to me for whatever word processor or text editor I’m using to write.

jEdit has such a feature. It’s more or less the same as the one that I’ve been using in AbiWord, and in various word processors before that. But for quite some time I’ve wanted a better word counter. Since jEdit is now my application for all writing and I can script for it in Python, it was time to make the word counter I wanted.

The problem was fairly simple: most applications only consider a few characters to be word separators, typically the line-end character, the space, and the tab. I think I’ve encountered a word processor that let you add more separators yourself, but I can’t remember which one it was (possibly an earlier version of OpenOffice.org).

What about dashes, though? The phrase "double standard—I don’t" has four words in it, but almost all applications would report that as being three words long. Similar instances include "either/or", "in&out" (granted, there really should be a space there, but it’s still not one word), and "thirty–forty minutes" (the elusive en dash makes an appearance). Save for that one word processor I can’t remember, no tool would account for those correctly.
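To make the undercount concrete, here is a minimal sketch in plain Python. The helper name `naive_word_count` is hypothetical; it only stands in for the "space, tab, line end" rule described above and is not taken from any particular application.

```python
# Naive count: treat only whitespace (space, tab, line end) as a separator.
# `naive_word_count` is an illustrative name, not any real application's API.
def naive_word_count(text):
    return len(text.split())

samples = [
    u"double standard\u2014I don\u2019t",  # em dash between "standard" and "I"
    u"either/or",
    u"in&out",
    u"thirty\u2013forty minutes",          # en dash between "thirty" and "forty"
]

for sample in samples:
    print(naive_word_count(sample))
# prints 3, 1, 1, 2 (one per line), where 4, 2, 2, 3 would be the intended counts
```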

Furthermore, the double hyphen is often used as a stand-in for the em dash, and the ellipsis is sometimes used without surrounding spaces (not a strictly correct practice, but sometimes style takes precedence). I wanted a word count that would deal correctly with all of these cases.

Here it is, as a jEdit macro written in Jython (meaning it requires the JythonInterpreter plugin):

def word_count():
    """
    jEdit macro for better word count.
    """
    from org.gjt.sp.jedit import Macros

    LINE_SEPARATORS = (
        "\r",
        "\n"
    )

    WORD_SEPARATORS = (
        " ",        # space
        "\t",       # tab
        "/",        # slash
        "&",        # ampersand
        u"\u2013",  # en dash
        u"\u2014",  # em dash
    )

    # These are only separators if they're present consecutively, e.g. -- or ..
    REPEATER_SEPARATORS = (
        "-",
        "."
    )

    # get local reference to textArea
    textArea = view.getTextArea()

    chars, words, lines = 0, 0, 0
    selection = textArea.getSelection()

    if len(selection) == 0:
        text = textArea.getText()
    else:
        text = textArea.getSelectedText()

    chars = chars + len(text)
    lines = lines + 1

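    # Go through the text character by character.
    # `word` is 1 while we are inside a word and 0 otherwise.
    word, previous_character = 0, None
    for character in text:
        is_separator = character in (LINE_SEPARATORS + WORD_SEPARATORS)
        # A hyphen or period only separates words when it follows another one.
        is_repeat = (character in REPEATER_SEPARATORS
                     and previous_character in REPEATER_SEPARATORS)
        if is_separator or is_repeat:
            # It's a separator, so any current word has ended.
            word = 0
            if character in LINE_SEPARATORS:
                lines = lines + 1
        else:
            # It's part of a word; count the word on its first character.
            if not word:
                words = words + 1
                word = 1
        previous_character = character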

    message = "Characters: %s\nWords: %s\nLines: %s" % (chars, words, lines)
    Macros.message(view, message)

if __name__ == '__main__':
    word_count()
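The counting loop can also be tried outside jEdit. The following is a sketch only: it copies the macro's separator rules into a standalone function with the hypothetical name `count_words`, drops the jEdit-specific parts (`view`, `textArea`, `Macros.message`), and returns just the word count.

```python
# Standalone sketch of the macro's word-counting rules, without jEdit.
# `count_words` is an illustrative name; the separator tuples mirror the macro.
LINE_SEPARATORS = ("\r", "\n")
WORD_SEPARATORS = (" ", "\t", "/", "&", u"\u2013", u"\u2014")
REPEATER_SEPARATORS = ("-", ".")  # separators only when consecutive


def count_words(text):
    """Return the number of words in text under the macro's rules."""
    words = 0
    in_word = False
    previous_character = None
    for character in text:
        is_separator = character in (LINE_SEPARATORS + WORD_SEPARATORS)
        is_repeat = (character in REPEATER_SEPARATORS
                     and previous_character in REPEATER_SEPARATORS)
        if is_separator or is_repeat:
            in_word = False
        elif not in_word:
            # First character of a new word.
            words += 1
            in_word = True
        previous_character = character
    return words


if __name__ == "__main__":
    print(count_words(u"double standard\u2014I don\u2019t"))  # 4
    print(count_words(u"either/or"))                          # 2
    print(count_words(u"thirty\u2013forty minutes"))          # 3
    print(count_words(u"wait--what"))                         # 2
    print(count_words(u"well-known"))                         # 1
```

One behavior this makes easy to check: a double hyphen with spaces on both sides, as in "a -- b", comes out as three words, because the first hyphen follows a space and therefore starts a new word before the second hyphen is treated as a separator.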
