tadhg.com
tadhg.com
 

Refactoring, Geeky Enjoyment, and Readability

12:16 Mon 05 Apr 2010
[, , ]

Refactoring is rewriting source code while preserving the functionality of that code. I’m currently refactoring my Python script for Subversion status, because I want to experiment with sharing code on bitbucket and it seemed like a reasonable first project to put up there.

Some people hate refactoring, but I often enjoy it. I get a kick out of figuring out how to make code “better”, although that’s often a subjective judgment. In this case, when I opened up the script to have a look at it, I immediately saw a function that I knew I wanted to refactor. I’m amused by the enjoyment I derived from making the fairly simple change.

The function gets the length of the longest filename in a list[1], returning 0 if for some reason there aren’t any filenames in that list. This was the original version:

def get_filename_width(self, files):
    width = 0
    lines = files.split("\n")
    for line in lines:
        if len(line) > width:
            width = len(line)
    return width

It expects the files variable to be a multiline string. So it assigns a value to width, then splits the string into individual lines, and goes through each line, testing its length against the value of width and making width into that value if it’s greater. After going through all the lines, it returns width.

It’s fine as a first pass, but looking at it now, it just seems clunky and too long. I rewrote it into this:

def get_filename_width(self, files):
    return max([len(line) for line in files.split(u"\n")]) if files else 0

Six lines into one. But is this too opaque?

If I’m the intended reader, clearly not, as I’m quite familiar with reading list comprehensions, and I also consider the Python 2.5+ ternary expression to be entirely legible.

If someone else is the reader? The ease with which they can break the line into chunks is the main issue, as first they have to see that the line returns the result of max() if files has some value and 0 otherwise. The next bit is seeing that max() operates on a list comprehension, and finally they have to see that the list comprehension creates a list of the lengths of the lines in files.

I can see the argument that the six-line version is clearer, but the fact that it’s six lines as opposed to one needs to be taken into account as well—more lines mean more cognitive overhead for the reader, something that is often underestimated in judging the readability of code.

I think that a reader familiar with Python would grasp the one-liner faster than the six-liner. The use of max is helpful here (whereas it might hinder a reader who doesn’t know it)—rather than having to discern what the for loop in the six-liner is doing, it’s clear that the one-liner is interested in getting the highest value. I personally think that the lack of variable assignment in the one-liner makes it easier as well, despite the familiarity that almost all readers would have with variables.

Clearly, this is a rather trivial example[2]. But the issues are relevant to writing shared code, and “shared” often applies even to personal code if you’re ever going to revisit it. While I definitely want to get something working quickly, it’s important to balance that with writing readable software. Readability requires finding the line between elegance and impenetrability, between concision and obscurity—just like writing more generally.

[1] Yet it’s called get_filename_width—because it’s concerned with the width (in columns) that the longest filename will take up in a terminal window. The purpose of the function in the larger context is related to the concept of width, while in the function itself it’s dealing with the length of individual filenames. You could argue for changing the name to get_filename_length, but I think the overall purpose of the function is more important to its name than what it’s doing internally.
[2] It took me less than 3 minutes to write the new version of the function, whereas this post discussing it has taken rather a lot longer than that…

Leave a Reply