tadhg.com
tadhg.com
 

A Little More Functional Programming

23:55 Thu 25 Feb 2010. Updated: 01:41 26 Feb 2010
[, ]

After reading Dhananjay Nene’s comment on my post about a functional style approach to the “find longest repeater” problem, I decided to follow the line from that comment and divide the program into functions for finding the longest contiguous block and then for comparing the blocks. Naturally, I wanted to do this without using any variables…

It wasn’t too hard to write a variable-free function to return a list of blocks of letters from a string, e.g. continuous_blocks("anna") produces ["a", "nn", "a"]:

def contiguous_blocks(string):
    def listbuilder(blocks, char, rest):
        if not rest:
            return add_to_blocks(char, blocks)
        return listbuilder(add_to_blocks(char, blocks), rest[0], rest[1:])

    def add_to_blocks(char, blocks):
        if blocks and char in blocks[-1]:
            return blocks[:-1] + ["".join([blocks[-1], char])]
        return blocks+[char]

    return listbuilder([], string[0], string[1:])

And then findlongestrepeater becomes:

def findlongestrepeater(string):
    def compare_blocks(previous, current):
        return current if len(current)>len(previous) else previous

    return reduce(compare_blocks, contiguous_blocks(string))

Note that this approach provides a clear use for reduce, which is on the outs in Python these days.

A crucial difference between Dhananjay Nene’s version and mine, however, is that mine returns a list and not a generator. So I decided to write a variable-free generator version… and just couldn’t do it. I don’t know what it was, but I kept getting hung up on trying to track the previous, current, and next characters, and to do that without variables while trying to skip at the right times in the for loop just seemed impossible. While writing this blog post, however, I figured it out. I had intended to put it up as a puzzle for other people, so I guess I’ll put my solution at the end. If you plan to try it, don’t scroll beyond the warning below.

There’s an easier way, too, which is simply to do this:

def contiguous_blocks(string):
    from itertools import groupby
    for k,g in groupby(string):
        yield u"".join(list(g))

I found that last night after being unable to bend my mind in the right way, because I was sure that this problem couldn’t be so rare that itertools wouldn’t have something to handle it.

However, I felt like that was cheating… so I kept trying to build my own variable-free generator.

Because of the fact that reduce isn’t going to be in Python 3, I decided I’d write a version of it that would work as its replacement for this problem. It didn’t seem too hard to come up with this:

def reducereplacement(func, generator):
    def runfunc(product, gen):
        try:
            return runfunc(func(product, gen.next()), gen)
        except StopIteration:
            return product

    return runfunc([], generator)

This only works with generators and not with lists, unlike the real reduce. There’s no reason to use this, either, given that reduce is in Python 2 and will be in the functools module for Python 3; I just wanted to try it as an experiment.

So, to restate the puzzle problem:

Write a version of contiguous_blocks that is functionally equivalent to the version above using itertools.groupby, without using the assignment operator anyhere, or anything from a module. This is a puzzle I found impossible last night but reasonably easy today. Like a lot of puzzles, looking at it in a slightly different way can make all the difference—advice which is often maddeningly useless while you’re trying to solve something.

The function, plus the version of findlongestrepeater I have above, should pass the following tests:

def test_contiguous_blocks():
    testgen = contiguous_blocks("googlee")
    assert testgen.next() == "g"
    assert testgen.next() == "oo"
    assert testgen.next() == "g"
    assert testgen.next() == "l"
    assert testgen.next() == "ee"
    assert [c for c in contiguous_blocks("google")] == ["g", "oo", "g", "l", "e"]
    assert [c for c in contiguous_blocks("googlee")] == [
        "g", "oo", "g", "l", "ee"]

def test_findlongestrepeater():
    assert findlongestrepeater("google") == "oo"
    assert findlongestrepeater("gooooogle") == "ooooo"
    assert findlongestrepeater("eeffi") == "ee"
    assert findlongestrepeater("abcdef") == "a"
    assert findlongestrepeater("eeffeeeffeeeee") == "eeeee"

WARNING, SOLUTION

Don’t read past here if you want to solve it yourself, obviously.


The key insight that helped me get it (which seems utterly obvious now, of course) is that you’re not limited to the current and next (and/or previous) characters. I’d been approaching it with logic like this:

  • Check the character to see if it’s different from the next character.
    • If it is, return it.
    • If it isn’t, pass.
  • But also (somehow) store it so that you know what the current sequence of repeated characters is, and when you next return something, return that sequence instead of just the current character.

I couldn’t figure out how to make that work, and it may be impossible to get Python to do that without using variables (although maybe I just missed the trick).

However, that wasn’t the right way to look at it. With access to the index of the current character, you can get not only the next character, but also string up to that point. That was the realization: the logic should be as follows:

  • Check the character to see if it’s different from the next character.
    • If it isn’t, pass.
    • If it is, return all the characters in the string up to that point that are the same as the character.

As I said, that seems awfully obvious now…

This is the solution I came up with:

def contiguous_blocks(string):
    def inarow(row, string):
        if not string or row[0] != string[0]:
            return row
        return inarow(u"".join([row, string[0]]), string[1:])

    for i,c in enumerate(string):
        if i >= (len(string) -1) or c != string[i+1]:
            yield inarow(c, string[:i][::-1])

So, if the character is different from the next one, or you’ve run out of string, yield the result of passing the character and the (reversed) string up to that point to inarow. Otherwise, if the character is the same as the next character—do nothing.

inarow doesn’t have to track anything but the characters until a different character shows up or you run out of string, so it returns what it has at that point if either of those happen, otherwise it calls itself with a new row of characters (the current one tacked onto the end) and the rest of the string.

The combination of find_longest_repeater and contiguous_blocks seems like a fairly elegant solution. Not only does it come in at two lines shorter than my version from Tuesday, but it also has the benefit of being more neatly divided in logical terms. (Which is something I wouldn’t have seen without the suggestion that the determination of contiguous blocks could be a discrete function.)

One Response to “A Little More Functional Programming”

  1. Barak Says:

    So I didn’t like the Scheme implementation last time — your reduce observation is great, however, and leads me to improve it to this:

    ;whee, Scheme comments start with semicolons
    ;See findlongestrepeater above, adapted for Scheme
    (define (longest-rep str)
      (let ((x (string->list str)))
        ;foldl is python's reduce. it's a functional language staple.
        (foldl compare-blocks '() (contig-blocks x))))
    
    (define (compare-blocks prev cur)
      (if (> (length prev) (length cur)) prev cur))
    
    
    ;Recursive rewrite, python in comments
    ;def contig-blocks(l):
    (define (contig-blocks l)
      ;if l[1:] == []:
      (if (null? (cdr l))
          ;return [l]
          (list l)
      ;else:
          ;r = contig-blocks(l[1:])
          (let ((r (contig-blocks (cdr l))))
          ;if l[0] == r[0][0]:
            (if (eq? (car l) (caar r))
              ;return [[l[0]] + r[0]] + r[1:]
                (cons (cons (car l) (car r)) (cdr r))
          ;else:
              ;return [l[0]] + r
                (cons (list (car l)) r)
                ))))
    

    Note that I don’t wholly agree with your rule for not using assignment operators. As I use it above, it’s merely a shorthand. Anywhere I use “r” as a variable, it could be replaced by the recursive call. And that’s what makes it functional — literally, in the mathematical sense of having one and only one returned value for the same input — time and state invariant, and zero side-effects.

Leave a Reply