tadhg.com
tadhg.com
 

sfmagic.org Database Structure

15:24 Fri 11 May 2007
[, , , , , , ]

In the eight months since writing Some Plans for sfmagic.org, I’ve made little progress. This is mainly because of gating factors such as moving it into subversion and getting my database backups into subversion as well. Now that those things are in place, I need to review how it works right now to figure out how to proceed.

The site is shockingly slow, and poorly organized, and (horrors!) isn’t valid HTML. All that needs to change.

Why is it slow? It’s primarily due to one of the queries it’s doing. Overall, the purpose of the site is to keep statistics for the sfmagic Wednesday night draft group, which I’ve been part of for almost four years. It’s quite comprehensive except for not tracking head-to-head results, which I would like to add, but which would make both reporting and tracking more difficult.

The site database has four tables, one of which (rares) is used for convenience only and doesn’t really matter for the statistics tracking. The other three are players, tournaments, and results. They have the following fields:

players:

  • id
  • name
  • dci

The first two are self-explanatory; the last one is the number used by the game’s organizing body to track rating, and I don’t really use it for anything, since none of our players have expressed much interest in it.

tournaments:

  • tournament_id
  • date
  • format
  • sets
  • superset
  • bestof
  • num_players
  • type
  • seeding_type

This table holds the information about the tournaments apart from the results from those tournaments. Mostly self-explanatory, but format means whether it’s draft, constructed, or some other style of playing MTG; sets refers to which sets are legal in the tournament; superset refers to the “parent set” for the tournament, e.g. Time Spiral for Time Spiral/Planar Chaos/Future Sight drafts; bestof is the number of games played in a match (usually one or three); type is the type of tournament structure, usually either Swiss or Round-Robin; seeding_type is the method used to assign players to draft groups—none, random, scatter, or clump.

This is the entry for the draft on Wednesday night:

20070509a	2007-05-09 19:30:00	Limited: Booster Draft	TSP/PLC/FUT	TSP	3	7	Swiss	Scatter

results:

  • playerid
  • matches_won
  • matches_lost
  • matches_drawn
  • points
  • deck
  • first_pick
  • tournament_id
  • resultid
  • rank
  • draft_position
  • games_won
  • games_lost
  • games_drawn
  • decklist_url
  • report_url

This table holds the data about the individual results. Each row in this table represents the result of a single player in a single tournament. For example, my result from Wednesday night:

23	2	0	1	7	White/green	Damnation	20070509a	20070509a1	1	5	5	1	0

To get, for example, the standings for the year 2006, I ask the database for all the results associated with tournaments that occurred in 2006, grouping the results by player name. That works reasonably well.

The real sluggishness arises when I do the more complicated query necessary for our seeding rankings: each player’s last ten tournaments, aggregated, with a negative modifier for missing the last n tournaments. The modifier is the easy part, it’s that first part that strains the server, because it requires a query for each player (currently there are 193) to get that player’s last ten tournaments and to aggregate the results from them. That’s the part that slows the server to a crawl, and it will only get worse as we get more players (and keep entering new results).

That’s what I need to fix first, and my current plan is to create a new table, standings_seeding, that has the following fields:

  • name
  • player_id
  • tournaments_played
  • tournament_wins
  • points
  • matches_won
  • matches_lost
  • matches_drawn
  • games_won
  • games_lost
  • games_drawn
  • tournament_win_percentage
  • max_points_percentage
  • match_win_percentage
  • game_win_percentage
  • date

This is, its fields would be the same fields as the current results page shows in output, plus a date. So it would essentially act as a cache for the results. The improvement would be in my running the entire query for its population only once, and then only updating it on a per-player basis when a player participates in a tournament. Running eight or so queries at a time is a lot cheaper than all 193 of them.

2 Responses to “sfmagic.org Database Structure”

  1. JC Says:

    Instead of de-normalizing your data, you can simply add a column to the result table that indicates whether or not a record should be used to calculate the seeding. This column would obviously need to be updated to have only the last 10 entries marked for each player when a new record is added for that player, either with a trigger or in code. But this would be no different than your proposed table.

    No more sub-selects, no more performance problems.

    BTW, this is one area where PostgreSQL really shines over MySQL. Version 8.x has a 10X performance benefit over version 7.x in some of the more complicated queries that we run involving sub-selects. Yes, that’s ten times faster.

  2. Tadhg Says:

    Thanks for the suggestions! The new table, while denormalized, isn’t replacing the other tables, which will still be up to date, and is really acting more like a cache. Your suggestion is a good one, but doesn’t work for some of the other things I have in mind (but didn’t mention in the article), such as being able to easily search for e.g. “who has been first in the seeding standings for the most weeks?” and “who has been first in the seeding standings for the most consecutive weeks?”. Also, I want historical data to be cheaply accessible (e.g. “who was first in the seedings halfway through 2006?”).

    As for PostgreSQL, I’ll definitely consider switching to that. When I started, MySQL was really just a default for me, and I’m not attached to it in any particular way. I don’t know how hard it would be to shift over, but it’s something I’ll probably try at some point.

Leave a Reply