tadhg.com
tadhg.com
 

sfmagic.org Rewrite: N-Squared Problem

06:33 Thu 27 Dec 2007. Updated: 00:06 18 Jan 2008
[, , ]

It took me less than an hour to go from the finished player page to a working basic version of the standings page (yay Python/Pylons/Genshi!). However, when running all the players (193 in the data set I’m using) against the entire set of results (2787 rows in this data set) it took more than thirty seconds to produce the page.

Eventually I plan to put the standings in their own database tables after they’ve been generated, so thirty seconds on an old laptop isn’t a show-stopper, but I did think it was quite slow and decided to see if I could improve the performance at least at little.

At first I thought that running an SQL query per player was the problem, and so reworked it to run one query to get all the results followed by going through the results for each player in Python. That didn’t help.

Doing it in Python rather than SQL didn’t change the fact that it’s searching through 2787 rows 193 times.

The Python I was using was a simple for loop, and my first attempt to improve performance was to put the matched rows into another array and then remove them, so that the number of rows would decrease every time. This made performance significantly worse—while the overall set of results grew incrementally smaller over time, time through the for loop was doubled because first it would have to match against the player id to find the right rows, and then it would have to go through again to match the rows to remove them.

My final attempt got page load time down to about ten seconds. I did this by ordering the list of players by player id, ordering the results by player id, then having the for loop iterate through, break as soon as it encountered a player id that didn’t match, and then delete all the rows that did match. So for each player, the first n rows and only those rows in results would match their player id.

Ten seconds is still much longer than I would like, but it’ll be faster on the server it ends up on, and I will hopefully be generating those results only once per week and putting them into their own table(s).

This is the approach I use, with some pieces cut out for clarity:

#(results is sorted by results_playerid)
def summarize_player(player):
    playerid = player[0]
    remove_rows = []
    
    for i, row in enumerate(results):
        this_row = {}
        if row['results_playerid'] != playerid:
            break
        for col in row.keys():
            if col == 'tournaments_date':
                this_row[col] = row[col].strftime('%a %d %b %Y')
            else:
                this_row[col] = row[col]
        sorted_r.append(this_row)
        remove_rows.append(row)
    
    if len(remove_rows) > 0:
        for row in remove_rows:
            results.remove(row)
    
    #(At this point I go through sorted_r to create a summary of the player's results as in yesterday's post)
    
    return player_summary

standings = []

#(allplayers is sorted by playerid)
for player in allplayers:
    summary = summarize_player(player)
    standings.append(summary)

standings.sort(key=operator.itemgetter('points', 'max_point_percentage', 'game_win_percentage', 'tournament_wins', 'tournaments'), reverse=True)

One Response to “sfmagic.org Rewrite: N-Squared Problem”

  1. Brett Says:

    I’m confused why this is an n-squared problem. Can’t you just make an list whose size is the number of players, and stash the relevant data for each player in there? So you would sweep over the results, and when you see a result for player 37, then you stash that row into the 37th element of your list. When you’re done, you’d have a list of lists, e.g. [ [row10, row28, row33], [row3, row90], … ].

    Then you sweep over this player list and process the records pertaining to each one. At the end of the day, you’ve only looked at each row twice. (And really, some of the summarizing could probably be accumulated in the player list during the first pass.)

    If you don’t have a convenient index for each player, you could use a dictionary of lists, which would just be O(n lg m), where m is the number of players.

Leave a Reply