tadhg.com

Applying Metadata

23:28 Mon 09 Apr 2007. Updated: 18:56 10 Apr 2007

As I’ve been continuing the process of moving files into my Subversion repository, I’ve decided on metadata to use with each file.

Metadata is extremely useful, since it’s obviously good to know things about files that aren’t necessarily contained within the files themselves. The first piece of data that comes to mind is date, because after files are moved around a bunch, recreated in different formats, moved in and out of repositories, and so on, they don’t usually still have their original timestamps (unless you’ve been really careful to preserve those, and I wasn’t). So relying on the metadata contained in the filesystem for date (or much else) isn’t a great idea.

Beyond that, there are plenty of other useful bits of metadata. The tagging used on this blog (and in many, many, many other places) is an obvious example: you want to mark things as being in particular piles so that you can find them in those piles later.

As I’ve decided on (X)HTML as my document format, I use the meta element to store most of the metadata (obviously enough). These elements, and their contents, don’t show up in the document when it’s viewed, and are also part of a well-understood standard. And while the meta elements are now mostly ignored by search engines because of abuse, this doesn’t matter to me for my documents as I don’t care about search engine ranking and just want standards-based metadata.

In addition to these, I also use a more structured standard for metadata, the Dublin Core. This is an attempt at a comprehensive metadata standard for more or less all documents, and has wide acceptance online.

Examples of what I’m using are below, but it should be clear that documents need metadata, and that if you’re going to add metadata to documents you should really use a widely-recognized standard, as this will make life easier for you in the long run (and life easier for anyone looking for works you make public). I’m not aware of any alternatives to Dublin Core that seem as well-thought-out, or that look like they’ll have as wide acceptance, so I recommend using that standard.

An example of the standard HTML metadata from one of the files I’ve converted recently:


<meta name="abstract" content="Argument that postmodernism's declaration of the end of art is overblown." />
<meta name="author" content="Tadhg O'Higgins" />
<meta name="copyright" content="This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA." />
<meta name="description" content="Essay on postmodernism." />
<meta name="expires" content="Never" />
<meta name="keywords" content="philosophy, literature, art, postmodernism, literary theory" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

These are mostly self-explanatory, although “abstract” versus “description” isn’t spectacularly clear. The former is supposed to be longer, and I suspect that my content for it tends to be too short. The last element tells the browser that the file is HTML in text format and that the character set is UTF-8. It’s probably partly redundant, as the XML declaration on the first line of each file can also declare the encoding; the DOCTYPE on the second line identifies the document type rather than the MIME type.
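As a minimal sketch, the first two lines of an XHTML 1.0 file typically look like this. It assumes a UTF-8 file and the Strict DTD; which DTD these files actually use isn’t stated here, and Transitional and Frameset have their own identifiers.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: XHTML 1.0 Strict; swap in the Transitional or Frameset identifiers if needed -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

The encoding attribute on the first line is the part that overlaps with the charset in the Content-Type meta element.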

An example of the Dublin Core metadata from the same recently-converted file:


<meta name="DC.Title" content="The postmodern culture of parody is the end of creative art." />
<meta name="DC.Creator" content="Tadhg O'Higgins" />
<meta name="DC.Subject" content="philosophy, literature, art, postmodernism, literary theory" />
<meta name="DC.Description" content="Argument that postmodernism's declaration of the end of art is overblown." />
<meta name="DC.Publisher" content="http://tadhg.com/" />
<meta name="DC.Contributor" content="" />
<meta name="DC.Date" content="1994-01-04" />
<meta name="DC.Type" content="Text" />
<meta name="DC.Format" scheme="IMT" content="text/xhtml OR text/xml OR text/html OR application/xml (see below)" />
<meta name="DC.Identifier" content="https://svn.tadhg.com/personal/academic/ba/philosophy/2nd_year_philosophy_essay_1_postmodern_parody_end_of_creative_art.html" />
<meta name="DC.Source" content="https://svn.tadhg.com/personal/academic/ba/philosophy/2nd_year_philosophy_essay_1_postmodern_parody_end_of_creative_art.rtf" />
<meta name="DC.Language" content="eng" />
<meta name="DC.Relation" content="" />
<meta name="DC.Relation.IsFormatOf" content="https://svn.tadhg.com/personal/academic/ba/philosophy/2nd_year_philosophy_essay_1_postmodern_parody_end_of_creative_art.rtf" />
<meta name="DC.Relation.References" content="urn:isbn:0091731607" />
<meta name="DC.Coverage" content="" />
<meta name="DC.Rights" content="This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA." />
<meta name="DC.Rights.License" content="http://creativecommons.org/licenses/by-sa/3.0/" />
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />

For more information, see the Dublin Core Metadata Element Set 1.1, the DCMI Metadata Terms, and the Usage Guide, particularly the Simple HTML Examples and Qualified HTML Examples sections.

“DC.Title” is the same as the HTML title (and is probably redundant, but I suspect it won’t do any harm). I use “DC.Subject” identically to “keywords”, but in theory it’s supposed to use a controlled vocabulary.
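To make the pairing concrete, here is a sketch of the duplicated elements side by side, followed by what a controlled-vocabulary subject might look like. The last element is hypothetical: the DCTERMS.LCSH scheme label and the heading “Postmodernism” are assumptions for illustration, and any real heading should be checked against the vocabulary itself.

```html
<head>
  <!-- The title element and DC.Title carry the same text -->
  <title>The postmodern culture of parody is the end of creative art.</title>
  <meta name="DC.Title" content="The postmodern culture of parody is the end of creative art." />

  <!-- keywords and DC.Subject carry the same free-form list -->
  <meta name="keywords" content="philosophy, literature, art, postmodernism, literary theory" />
  <meta name="DC.Subject" content="philosophy, literature, art, postmodernism, literary theory" />

  <!-- Hypothetical: one subject term drawn from a controlled vocabulary -->
  <meta name="DC.Subject" scheme="DCTERMS.LCSH" content="Postmodernism" />
</head>
```

Repeating the element once per term, rather than packing a comma-separated list into one content attribute, keeps each controlled term unambiguous.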

“DC.Format” is supposed to be a valid MIME type, but I’m using text/xhtml (which isn’t a registered type) anyway because, well, files in plain text with markup are text in my view, and not application data, so I object to using application/xml and/or application/xhtml+xml. text/html is most likely what the browser will read it as… and is also what I claimed it was in the last standard meta element above. So I’m clearly being inconsistent there. text/xml is another possibility, one that I’ll have to look into. There are all kinds of tricky content-negotiation issues here that don’t matter for local files but would matter for consistency if the files were online.
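One way to remove the inconsistency, sketched here as an assumption rather than a settled choice, is to have both elements declare the same registered type. This version uses text/html, since that’s what the Content-Type element already claims:

```html
<!-- Both elements name the same registered MIME type -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="DC.Format" scheme="IMT" content="text/html" />
```

If text/xml or application/xhtml+xml turned out to be the better fit, the same rule would apply: pick one type and use it in both places.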

Now I need to add this kind of metadata to my blog entries, too…
