tadhg.com

Quick & Dirty Book Info Lookup

21:57 Sun 06 Mar 2011

I’m still trying to cut down on the number of books I have in my apartment. That still feels wrong, but the shift to ebooks is making it a little easier. Now I’m getting rid of books that aren’t big favorites of mine, weren’t given to me as gifts, and aren’t in the poorly-defined category of “classics I want to keep”.

Because I’m a pack rat and a data geek, I have a hard time getting rid of books if I haven’t recorded the metadata about them I want to record. Unfortunately, I’m not always diligent about noting that info as I read the books, so the majority of the books I wanted to give away or sell were books where I hadn’t done so—and I really didn’t want to go through them one by one.

Modern technology to the rescue…

Normal people might have just gone with Delicious Library, but I’m also trying not to introduce any more dependencies on proprietary software into my life, and besides, I wanted text output in a very specific format, so instead I used:

All mixed together with a Python script I put together for this purpose. It’s pretty ugly… but it did the job.

This is the workflow:

  • Edit an empty note in Catch on the N1 and hit the icon that looks like a QR code to invoke Barcode Scanner.
  • Scan a book’s ISBN; it returns to the note once the ISBN is read.
  • Repeat the above two steps with as many books as desired (I did this in groups of about 10).
  • Save the note and synchronize the Catch account.
  • Go to the computer and open Catch.com.
  • Copy the ISBNs to the clipboard.
  • Paste them into MacVim and turn them into a comma-separated list (a trivial operation, this).
  • Copy the list to the clipboard.
  • Run the Python script from the terminal, passing in the list as an argument.
  • Get formatted book data.
  • Do minor cleanup (there’s always something).
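
The comma-joining step in the middle of that list could also be scripted rather than done in an editor. A minimal sketch, assuming the note holds one scanned code per line; the function name `to_comma_list` is made up for illustration and is not part of the original script:

```python
import re


def to_comma_list(pasted):
    """Turn pasted note text into a comma-separated string of codes."""
    # Split on any run of whitespace or commas, then drop empty pieces
    # (for example, the one produced by an empty note).
    codes = [c for c in re.split(r"[\s,]+", pasted.strip()) if c]
    return ",".join(codes)


note_text = "9780306406157\n0306406152\n"
print(to_comma_list(note_text))
```

The result can then be passed to the script as its single argument.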

I wanted to use the Google Book Search Data API, but it didn’t return all the info I wanted, most notably the number of pages. After some looking around I determined that Amazon’s service did provide that information, so I used that instead; this did mean I had to sign up for a developer API key, but you can do that using an Amazon account, which I had already.

The bottlenose library is excellent for talking to the Amazon APIs, and this would have been a much longer endeavor without it. I used pyisbn to try to distinguish between ASINs and ISBNs, but it wasn’t absolutely necessary.
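
For anyone who would rather not pull in pyisbn, the ISBN-versus-ASIN guess can be approximated with the standard ISBN checksums. This is only a sketch of the idea, not what pyisbn does internally, and the names `is_valid_isbn` and `id_type` are invented for this example:

```python
def is_valid_isbn(code):
    """Return True if code passes the ISBN-10 or ISBN-13 checksum."""
    s = code.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10: weights 10 down to 1; "X" as the last character means 10.
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(s) == 13 and s.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; total must divide by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def id_type(code):
    """Guess the IdType for a lookup: anything that is not a valid ISBN is called an ASIN."""
    return "ISBN" if is_valid_isbn(code) else "ASIN"


print(id_type("9780306406157"), id_type("B000FC1PJI"))
```

This classifies purely by checksum, so a ten-digit ASIN that happens to pass the ISBN-10 check would be reported as an ISBN.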

It worked, and I’ll keep using it as I try to whittle my library down, and to make tracking of my reading easier in future.

The key part of the script is the part that does the lookup, which I’ll include to show how easy it is. Even at this size it’s already inelegant—it looks things up in the US region, assumes any errors are due to books not being found in that region, and then tries the UK afterwards; what it should do is verify the correct error, and then recursively go through all the regions in a specific order for all the arguments that generated that error. Nevertheless:

# Module-level imports this method relies on.
import json

import bottlenose
import pyisbn

def lookup(self):
    """
    Creates the connection to Amazon and stores the returned data as dicts.
    """
    amazon = bottlenose.Amazon(self.ACCESS_KEY, self.SECRET_ACCESS_KEY)
    style = "http://xml2json-xslt.googlecode.com/svn/trunk/xml2json.xslt"
    retries = []
    def az_lookup(a_id):
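        try:
            # A falsy result from validate() means the checksum failed,
            # so fall back to ASIN instead of leaving typ as False.
            typ = "ISBN" if pyisbn.validate(a_id) else "ASIN"
        except ValueError:
            # Not ISBN-shaped at all, so treat it as an ASIN.
            typ = "ASIN"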
        az_args = {
            "ItemId": a_id,
            "Style": style,
            "ResponseGroup": "Large",
            "IdType": typ
        }
        if typ=="ISBN":
            az_args["SearchIndex"] = "Books"
        raw = json.loads(amazon.ItemLookup(**az_args))
        if "Item" not in raw["ItemLookupResponse"]["Items"]:
            if "Errors" in raw["ItemLookupResponse"]["Items"]["Request"]:
                # Assume it's a non-US ISBN
                retries.append(a_id)
            return None
        book = {
            "ASIN": raw["ItemLookupResponse"]["Items"]["Item"]["ASIN"]
        }
        book.update(
            raw["ItemLookupResponse"]["Items"]["Item"]["ItemAttributes"])
        return book

    self.data = [a for a in [az_lookup(a_id) for a_id in self.idlist] if a]

    if not retries:
        return

    amazon = bottlenose.Amazon(self.ACCESS_KEY, self.SECRET_ACCESS_KEY,
                               Region="UK")
    retried = [a for a in [az_lookup(a_id) for a_id in retries[:]] if a]
    self.data = self.data + retried
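
The more thorough approach described before the listing, trying every region in a fixed order for whatever is still missing, might look roughly like this. It is a sketch under assumptions: it uses a plain loop rather than recursion, `lookup_in_region` is a hypothetical callable that returns `None` only when an item is genuinely not found (raising on any other error), and the fake catalogue stands in for real Amazon calls:

```python
def lookup_with_fallback(ids, lookup_in_region, regions=("US", "UK")):
    """Try each id in each region in order; return (found, still_missing)."""
    found = {}
    pending = list(ids)
    for region in regions:
        still_missing = []
        for a_id in pending:
            result = lookup_in_region(a_id, region)
            if result is None:
                # Not found here: carry the id over to the next region.
                still_missing.append(a_id)
            else:
                found[a_id] = result
        pending = still_missing
        if not pending:
            break
    return found, pending


# Stand-in for a real lookup: a tiny fake catalogue keyed by region.
FAKE_CATALOGUE = {"US": {"111": "US book"}, "UK": {"222": "UK book"}}


def fake_lookup(a_id, region):
    return FAKE_CATALOGUE[region].get(a_id)


found, missing = lookup_with_fallback(["111", "222", "333"], fake_lookup)
print(found, missing)
```

Adding more regions is then a matter of extending the `regions` tuple, and anything left in `missing` was not found anywhere.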

One Response to “Quick & Dirty Book Info Lookup”

  1. monsun Says:

    I have a final proof that you are crazy. ;)
