tadhg.com
tadhg.com
 

Python Optimization Tips

06:34 Tue 05 Jan 2010. Updated: 09:41 14 Jan 2010
[, ]

I came across these on Hacker News recently, and think they’re worth calling out: Python Speed Performance Tips.

This comment is also worth reading.

There’s plenty of food for thought in those. They both encourage the use of list comprehensions, which I happen to like quite a lot.

On that note, I was recently asked about doing some contrived simple programming problems in list comprehensions.

  1. Find the overlapping elements in two lists:

    [item for item in list1 if item in list2]
    

    This can also be done using sets:

    set(list1).intersection(list2)
    
  2. Find the first repeating character in a string:

    [char for char in s if s.count(char)>1][0]
    

    Alec points out a far superior method in the comments:

    next(c for i,c in enumerate(s) if c in islice(s, i+1)
    
  3. Reverse a string (couldn’t do this in less than 3 lines until I remembered list slices):

    s[::-1]
    

4 Responses to “Python Optimization Tips”

  1. Alec Says:

    … but as I’ve often said, I don’t think there’s always glory in doing things in one line. :)

    For instance, your first repeating character is a guaranteed O(N^2) operation because s.count() is O(N) – for small strings this is fine but it would be dangerous to do this on a very large list.. You want to stop as soon as you find the first element, and make sure never to count() the characters leading up to the character you’re on.

    There are going to be tricks to do that in a list comprehension, but then the list comprehension gets ugly, fast, and a for loop is easier to understand.

    For instance:

    % python -m timeit -s 'import string; s = string.uppercase * 1000' '[c for c in s if s.count(c)>1][0]'
    

    10 loops, best of 3: 1.78 sec per loop

    % python -m timeit -s 'import string; from itertools import islice; s = string.uppercase * 1000' 'next(c for i,c in enumerate(s) if c in islice(s, i+1))'
    

    100000 loops, best of 3: 4.32 usec per loop

    Note that the latter is literally 412,037 times faster for a string of 26000 characters.

    But even for small strings it’s a win.

    % python -m timeit -s 'import string; s = string.uppercase * 3' '[c for c in s if s.count(c)>1][0]'
    

    10000 loops, best of 3: 53.1 usec per loop

    % python -m timeit -s 'import string; from itertools import islice; s = string.uppercase * 3' 'next(c for i,c in enumerate(s) if c in islice(s, i+1))'
    

    100000 loops, best of 3: 4.26 usec per loop

    Note that the 2nd loop is virtually the same speed no matter the length of the input string.

    In the end though, I suspect a for loop with a ‘break’ would be comparably fast to the latter, and much easier to read. (It’s not at all obvious to me that using next() and islice() is what makes this fast…)

  2. Tadhg Says:

    Thanks for the feedback! You’re completely correct about s.count(c), that approach is inefficient. I’m wondering about how much less efficient it is than the full for loop approach I had before turning it into a one-liner, though. I should time that and find out…

    Also, do we need islice()? Your approach is the right way to go, but I think it can be done as:

    python -m timeit -s 'import string; s = string.uppercase * 1000' 'next(c for i,c in enumerate(s) if c in s[i+1:])'
    

    That registers as about a second faster than the islice version on my machine.

    In general, I like the one-liners if they’re readable, and agree that they can get out of hand… however, I do enjoy the puzzle aspect of them, and trying to solve them is usually illuminating (as in this case!) even when you might ultimately reject them for production code.

  3. Tadhg Says:

    Hmm, testing that again today, maybe it was a once-off, because now my version is slower… I’m a little surprised that islice(s, i+1) is that much faster than s[i+1:], but it appears that using islice is indeed one of the main factors improving speed here.

  4. Alec Says:

    The reason islice is faster is that it is lazy. If you take s[1:] of a 20000 character string, it makes a copy of 19999 characters even if you only look at the first 5. If you use islice, it only copies/extracts the characters you look at. Neat, huh?

Leave a Reply