Related Posts Plugin and SQL Trickery

11:55 Tue 10 Feb 2009
[, , , , ]

As I said I would, I’ve made a WordPress plugin out of the ‘related posts by tag’ functionality that I use on this site.

It’s mostly very simple: all it does is look at the tags the current post has and display other posts that have overlapping tags, in order of degree of overlap.

The only tricky part was figuring out how to determine that overlap. My initial thought was that I would start with the complete set of tags for the post, then run a query that would see if any posts matched, then go through each set of the complete set minus one tag, and so on, e.g. if it were this post, I would first run a query to check whether or not any posts had the tech, WordPress, coding, and PHP tags, then check for posts with tech, WordPress, and coding; WordPress, coding, and PHP; tech, WordPress, and PHP, and tech, coding, and PHP—and then on through the sets of two, with a separate query for each one. This process would stop as soon as I hit whatever predefined limit was configured (ten posts, for example).

This wasn’t an elegant fix, but I didn’t see a way around it. However, I ran into an immediate problem, which was: how could I figure out whether a post had the tags I was checking for?

Simplifying slightly, the post/tag relationships are stored in a table like this one, where letters are tag IDs and numbers are post IDs:

tag_id post_id
A 1
A 2
B 1
C 1
B 5
C 5
C 7

So post 1 has tags A, B, and C. A also tags post 2, post 5 has B and C, etc.

With that structure and using MySQL, how do you figure out what posts have a given set of tags? You don’t know the post IDs in advance. I had trouble with this, ended up trying a number of complicated nested SELECT statements, asked a few people, and didn’t get a good answer until I asked Al Marks, who came up with a good answer fairly quickly, one which used a rather different approach—instead of asking “which tags have all three relationships?”, it assigns a point value for each match:

GROUP BY post_id

Extremely simple: each match against any of the tags is worth a point; only return anything with more than zero points.

Moreover, this approach means that the whole looping I was going to have to do in PHP could be eliminated, and MySQL could return the right ordering as well:

SELECT *, SUM(CASE WHEN tag_id in (A, B, C) THEN 1 END) AS matchnum FROM table
GROUP BY post_id
HAVING SUM(CASE WHEN tag_id in (A, B, C) THEN 1 END) > 0
ORDER BY matchnum DESC, date DESC;

(I know there’s no date in the example table, but I didn’t want to complicate the above code with joins.)

That’s it; that does more or less all of the difficult work. The select statement in my widget looks like this:

SELECT $wpdb->posts.*, SUM(CASE WHEN $wpdb->term_relationships.term_taxonomy_id in (" . $comma_separated_tags_to_match . ") THEN 1 END) AS matchnum from $wpdb->posts
LEFT JOIN $wpdb->term_relationships ON ($wpdb->posts.ID = $wpdb->term_relationships.object_id)
WHERE $wpdb->posts.post_status = 'publish'
AND $wpdb->posts.post_type = 'post'
AND $wpdb->posts.ID != " . $post_id . "
GROUP BY $wpdb->term_relationships.object_id
HAVING SUM(CASE WHEN $wpdb->term_relationships.term_taxonomy_id in (" . $comma_separated_tags_to_match . ") THEN 1 END) > 0
ORDER BY matchnum DESC, $wpdb->posts.post_date " . $options['dateorder'] . "
LIMIT " . $options['fetchlimit'] . ";

I’m that SQL pattern will prove handy in future, as well—thanks Al!

Apart from that, the only real obstacles concerned trying to make classes work with a WordPress structure that doesn’t seem to expect to deal with them. To get around this I ended up creating some functions that did nothing but wrap calls into class functions.

4 Responses to “Related Posts Plugin and SQL Trickery”

  1. Richard Says:

    Tadhg, this is very neat plugin and really captures the importance of relationship in post content. I like the way you have used wp core features like tags to do this, as opposed to other related posts plugins that look at analysing content.

    Two points I want to ask you to do with presentation:

    1- can you make a template so that the plugin can display a list in a page (say after the post)?
    2- showing the first image as a thumbnail, then the Post title then a content snippet would be nice.

    I run a magazine site and one thing I need is to catch reader attention visually and these points would help. Many thanks…

  2. Shota Says:

    I have been looking for this widget so long.
    so, thanks for the great widget.

    I have a question, can I get rid of the date next to he title of each post?

  3. Tadhg Says:

    Shota: I just updated the plugin, and the new version includes a field (in the widget control) that lets you format the date. If you leave this field blank, it won’t display the date at all.

  4. Tadhg Says:

    Richard: I just updated the plugin, and the new version includes the ability to show the list of related posts at the end of a page, and/or at the end of a post’s entry in the RSS feed, in addition to or instead of the widget in the sidebar. See for details.

Leave a Reply