Python Optimization Tips
.I came across these on Hacker News recently, and think they’re worth calling out: Python Speed Performance Tips.
This comment is also worth reading.
There’s plenty of food for thought in those. They both encourage the use of list comprehensions, which I happen to like quite a lot.
On that note, I was recently asked about doing some contrived simple programming problems in list comprehensions.
-
Find the overlapping elements in two lists:
[item for item in list1 if item in list2]
This can also be done using sets:
set(list1).intersection(list2)
-
Find the first repeating character in a string:
[char for char in s if s.count(char)>1][0]Alec points out a far superior method in the comments:
next(c for i,c in enumerate(s) if c in islice(s, i+1)
-
Reverse a string (couldn’t do this in less than 3 lines until I remembered list slices):
s[::-1]
05 Jan 2010 at 10:47
… but as I’ve often said, I don’t think there’s always glory in doing things in one line. :)
For instance, your first repeating character is a guaranteed O(N^2) operation because s.count() is O(N) – for small strings this is fine but it would be dangerous to do this on a very large list.. You want to stop as soon as you find the first element, and make sure never to count() the characters leading up to the character you’re on.
There are going to be tricks to do that in a list comprehension, but then the list comprehension gets ugly, fast, and a for loop is easier to understand.
For instance:
10 loops, best of 3: 1.78 sec per loop
100000 loops, best of 3: 4.32 usec per loop
Note that the latter is literally 412,037 times faster for a string of 26000 characters.
But even for small strings it’s a win.
10000 loops, best of 3: 53.1 usec per loop
100000 loops, best of 3: 4.26 usec per loop
Note that the 2nd loop is virtually the same speed no matter the length of the input string.
In the end though, I suspect a for loop with a ‘break’ would be comparably fast to the latter, and much easier to read. (It’s not at all obvious to me that using next() and islice() is what makes this fast…)
05 Jan 2010 at 18:03
Thanks for the feedback! You’re completely correct about s.count(c), that approach is inefficient. I’m wondering about how much less efficient it is than the full for loop approach I had before turning it into a one-liner, though. I should time that and find out…
Also, do we need islice()? Your approach is the right way to go, but I think it can be done as:
That registers as about a second faster than the islice version on my machine.In general, I like the one-liners if they’re readable, and agree that they can get out of hand… however, I do enjoy the puzzle aspect of them, and trying to solve them is usually illuminating (as in this case!) even when you might ultimately reject them for production code.
06 Jan 2010 at 05:40
Hmm, testing that again today, maybe it was a once-off, because now my version is slower… I’m a little surprised that islice(s, i+1) is that much faster than s[i+1:], but it appears that using islice is indeed one of the main factors improving speed here.
09 Jan 2010 at 10:39
The reason islice is faster is that it is lazy. If you take s[1:] of a 20000 character string, it makes a copy of 19999 characters even if you only look at the first 5. If you use islice, it only copies/extracts the characters you look at. Neat, huh?