23:39 Thu 06 Mar 2008

Despite having worked at Metaweb for almost a year, and despite my OCD tendencies, I had avoided getting sucked in by the allure of correcting/completing/entering data in Freebase, the web frontend to our attempt at structuring all the world’s information. I had avoided it until today, that is.

But today, I sent out an invite to a bunch of people inquiring whether any of them wanted to see Thief, Michael Mann’s early-80s crime thriller, with me at the Castro on Tuesday. I included a link to the Freebase page for Thief in the email.

Looking at it myself, I noticed that Tangerine Dream did the soundtrack. Looking at them, I noticed that one of their members during the 80s was Paul Haslinger, whose name I recognized because I have the soundtrack for Underworld.

That’s when the descent really began… I noticed that the sixth track, “Red Tape”, was listed as “Red Tape – Agent Provocateur”, and all the tracks were listed as by Haslinger. I know that Agent Provocateur recorded that track, but I wasn’t sure whether or not the Recorded By property of Musical Track was the right place to enter that.

After some consultation with the people who do know, it turned out that it was. So I entered that info, and renamed it to “Red Tape”.

I suppose I could have left it there, but no. The page for topics is new, a recent change, and it’s far superior to the old version. And it sucked me in, so that I looked at the Underworld movie page and noticed a few things were missing, and decided to add them.

After doing that I noticed that only the US film rating (R) was present, while the IMDB page had ratings for a lot of countries. When I was about to enter some of those, I realized that Freebase didn’t have a lot of those other ratings (meaning, the topics for the ratings themselves, not which ones had been assigned to Underworld).

I started to add those. I found myself adding the German ones first. Here there was a pause as I attempted to use our “list pusher” tool to add a pile of them at once, with their ancillary data. That didn’t work, for a variety of technical reasons… we have a more complicated “data pusher” tool, but sadly I wasn’t able to figure out how to get that to do what I wanted, and the in-house expert on it wasn’t at their desk when I went looking. So after a fruitless detour (but one that’s made me determined to understand that tool in future), I ended up entering the German film rating data myself.

The page for the German film ratings started off missing more or less everything except the article blurb, which we grabbed from Wikipedia. That it is a film content rating system, and that it is an organization, are both relationships that I added. I also added what else it’s known as.

I added some of this while adding the film ratings that it gives out as topics themselves. One of the very simple/basic yet amazingly great things about Freebase is that if you add a relationship to thing A saying that thing B is related to it, thing B then shows that relationship as well. Sounds like it should be that way, but compare to Wikipedia, where in order to get the same effect, someone has to edit the pages for both thing A and thing B. When you start making large-scale edits, it’s no longer a trivial deal to have to support those kinds of relationships. Freebase just takes care of that, because that’s one of the things it was built to do.
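To make the "enter it once, see it from both sides" idea concrete, here is a minimal sketch in Python. It is a toy, not how Freebase is actually implemented: the `LinkStore` class and the `gives_rating` property name are made up for illustration. The point is only that a link stored once can be read from either end, so nobody has to edit two pages.

```python
class LinkStore:
    """Toy store: each link is recorded once and is visible from both ends."""

    def __init__(self):
        # Each link is a (source, property, target) triple.
        self._links = set()

    def add(self, source, prop, target):
        """Record a single link from source to target."""
        self._links.add((source, prop, target))

    def outgoing(self, topic):
        """Links where the topic is the source: (property, target) pairs."""
        return sorted((p, t) for s, p, t in self._links if s == topic)

    def incoming(self, topic):
        """Links where the topic is the target: (property, source) pairs."""
        return sorted((p, s) for s, p, t in self._links if t == topic)


store = LinkStore()
# One edit per rating, made only on the FSK side (hypothetical property name).
store.add("FSK", "gives_rating", "FSK 12")
store.add("FSK", "gives_rating", "FSK 16")

print(store.outgoing("FSK"))     # the ratings the FSK gives out
print(store.incoming("FSK 16"))  # the same link, seen from the rating's side
```

The Wikipedia-style alternative would be to store a separate note on each page, which is exactly the double bookkeeping that stops scaling once the edits get large.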

After adding the various German film ratings, and also classifying the SPIO’s relationship to the FSK and adding some details there too, I ran into a bug on the live site. I was able to fix it quickly and get the fix into a patch that was already on its way. Nice to find a bug like that while using the site as a user, though.

After that, I let myself get sucked back into film ratings, and decided that since I’d entered the German ones I might as well do the Irish film ratings too. From film ratings to the Censorship of Publications Board, from there to (I’m not joking, this was a bad time in Irish history) the Committee on Evil Literature.

A detour led me to discover that Samuel R. Delany’s Mad Man was banned in Ireland in 1997 (and is still banned). That info isn’t available in Freebase, sadly, and I’m not sure whether or not all books, films, etc. need a Censored By property (but it’s tempting).

A few more relationship edits meant that after I was done, the Committee on Evil Literature was the only sub-agency listed for the Department of Justice, Equality, and Law Reform. That’ll get rounded out eventually, but I don’t think it’s too bad an outcome for now.

From there I ended up looking at and cleaning up Kevin O’Higgins, then establishing a relationship between him and the Vice-Presidency of the Executive Council of the Irish Free State, then entering the rest of those office holders (plus dates), and then on to the Presidents of Ireland.

Okay. I stopped there, but it took an effort. And there were a few other info-entering sprees along the way that I left out.

Now, I’m starting to think about how some of this stuff should be represented a little better—for example, I think that the Cabinet, as an Irish Government Body, is necessary, and that Ministers should be members of that body. If all that data were entered and hooked up, and the politicians in question also properly represented, you could start doing more interesting queries, like: how many Irish Cabinet members were not Fianna Fáil while in office? How many were never in Fianna Fáil? Not earth-shattering info, but the kind of basic list stuff that might be interesting… and that’s also not so easy to do without a system that understands links as Freebase does.
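As a rough sketch of what those two questions look like once terms of office and party memberships are linked data, here is a small Python example. Everything in it is an assumption for illustration: the record types, the function names, and especially the data, which uses placeholder ministers and made-up years rather than real Irish politicians. It is not Freebase's query language, just the shape of the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CabinetTerm:
    person: str
    start: int  # first year in office
    end: int    # last year in office (inclusive)


@dataclass(frozen=True)
class PartyMembership:
    person: str
    party: str
    start: int  # first year of membership
    end: int    # last year of membership (inclusive)


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive year ranges share at least one year."""
    return a_start <= b_end and b_start <= a_end


def not_in_party_while_in_office(terms, memberships, party):
    """People with at least one cabinet term that overlaps no membership in party."""
    result = set()
    for term in terms:
        in_party = any(
            m.person == term.person
            and m.party == party
            and overlaps(m.start, m.end, term.start, term.end)
            for m in memberships
        )
        if not in_party:
            result.add(term.person)
    return result


def never_in_party(terms, memberships, party):
    """Cabinet members with no recorded membership in party at all."""
    members = {m.person for m in memberships if m.party == party}
    return {t.person for t in terms} - members


# Placeholder data only: invented people, parties, and years.
terms = [
    CabinetTerm("Minister A", 1990, 1994),
    CabinetTerm("Minister B", 1990, 1994),
    CabinetTerm("Minister C", 1995, 1999),
]
memberships = [
    PartyMembership("Minister A", "Fianna Fáil", 1980, 2000),
    PartyMembership("Minister B", "Fianna Fáil", 1995, 2000),  # joined after leaving office
    PartyMembership("Minister C", "Other Party", 1990, 2000),
]

outside = not_in_party_while_in_office(terms, memberships, "Fianna Fáil")
never = never_in_party(terms, memberships, "Fianna Fáil")
print(len(outside), sorted(outside))
print(len(never), sorted(never))
```

Both answers fall out of the same linked records, which is the appeal: the dates on the memberships and on the terms of office do the work, and nobody has to maintain a separate list for each question.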

Anyway, it’s clearly gotten its hooks into me, and I think a lot of that has to do with the new topic view, which makes viewing complicated data far easier, editing it far less painful, and correcting/filling it out more immediately rewarding. If you’re a data, info, or stats geek, or you have some niche area you’re into, it would be worth it to have a look.

(As for me, I’m eventually going to put an attempt at a canonical schema for MTG cards, and related info, in there—it’s a more difficult schema to get right than it might appear. Until then, I’ll probably dabble in random other data.)
