Speeding Up Tags


A while back, I added a tag cloud to my blog. The idea was to replace categories with tags, a much more flexible system. I bent the Movable Type (MT) keywords and search to my purpose. One thing I did to make that work was modify the MT search script to search keywords exclusively when it's called with the SearchElement=keywords option.

My next task, which I describe here, had three goals:

  1. Make something with a prettier URL
  2. Add RSS for tags
  3. Speed things up

The last point was important if I wanted this to work at any kind of scale. MT's search feature isn't speedy by any means, and there's no way I could add RSS, with all the beauty of pull delivery and its repeated polling, without solving goal 3 as well.

There was a pretty simple answer to all of these goals: wrap the search function with some code to make it all work. I wrote a small function called tags in Perl and installed it with the following declaration in the httpd.conf file:

  <Location /tags>
    SetHandler cgi-script
    Options +ExecCGI
  </Location>

With this in place, I could parse the tag from the path info to get URLs that look like this:

http://www.windley.com/tags/blogging
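To make the path parsing concrete, here is a minimal sketch in Python (the actual wrapper is Perl, so this illustrates the idea and is not the code itself). It assumes the web server hands the script the usual CGI variables PATH_INFO and QUERY_STRING; the function name parse_request and the default flavor "html" are my own inventions for the example.

```python
from urllib.parse import parse_qs


def parse_request(path_info, query_string=""):
    """Return (tag, flavor) from CGI-style PATH_INFO and QUERY_STRING.

    For /tags/blogging the server sets PATH_INFO to "/blogging".
    The "html" default flavor is an assumption made for this sketch.
    """
    # PATH_INFO normally arrives already URL-decoded, so only trim slashes.
    tag = path_info.strip("/")
    # parse_qs maps each parameter name to a list of values.
    params = parse_qs(query_string)
    flavor = params.get("flavor", ["html"])[0]
    return tag, flavor
```

In a real CGI script the two arguments would come from os.environ.get("PATH_INFO", "") and os.environ.get("QUERY_STRING", "").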

The wrapper just builds the old, crufty search URL and then calls it with a GET. Why use a GET? Loose coupling. I've found that I sometimes end up moving things around later, and if everything is based on HTTP, it just keeps working.
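The build-and-GET step might look roughly like the following Python sketch. Only SearchElement=keywords comes from my setup as described above; the script location in SEARCH_SCRIPT and the "search" parameter name are assumptions for illustration and would need to match a real MT install.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical location of the MT search script; adjust for a real install.
SEARCH_SCRIPT = "http://www.example.com/cgi-bin/mt/mt-search.cgi"


def build_search_url(tag, base=SEARCH_SCRIPT):
    """Build the long search URL that the pretty /tags/<tag> URL hides."""
    # "search" as the query parameter name is an assumption;
    # SearchElement=keywords restricts the search to keywords.
    query = urlencode({"search": tag, "SearchElement": "keywords"})
    return f"{base}?{query}"


def fetch(url, timeout=30):
    """Issue a plain HTTP GET and return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Because the wrapper only knows a URL, the search script can move to another path or host and the only thing that changes is SEARCH_SCRIPT.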

I found a template from Niall Kennedy that creates RSS for MT searches, but because MT assumes everything is HTML, it returned the wrong content type. The wrapper solved this problem as well, since I could return any content type I wanted. Here's the RSS feed for that same tag:

http://www.windley.com/tags/blogging?flavor=rss
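Since the wrapper writes its own response headers, picking the content type is a small lookup. Here is a hedged Python sketch of that idea; the CONTENT_TYPES table, the cgi_response name, and the fallback to HTML for unknown flavors are all assumptions made for the example.

```python
# Flavor-to-content-type table; the entries are illustrative.
CONTENT_TYPES = {
    "html": "text/html",
    "rss": "application/rss+xml",
}


def cgi_response(body, flavor="html"):
    """Prefix the body (bytes) with a CGI Content-Type header."""
    # Unknown flavors fall back to HTML in this sketch.
    content_type = CONTENT_TYPES.get(flavor, CONTENT_TYPES["html"])
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    return header + body
```

A CGI script would write the returned bytes to sys.stdout.buffer.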

RSS is just another flavor on the tag. I like that. Now, tags are syndicated using RSS. That's important because I use category RSS feeds to drive some of the other things I do. For example, each Monday I automatically grab all my posts on IT Conversations from the last week to include in the IT Conversations newsletter. Having the RSS feed makes that simple.

The speed problem is solved by having the wrapper cache results. If a file exists for the tag and it's younger than a configurable maximum age (currently 6 hours), then the contents of the file are returned. If not, the wrapper performs a search. Whether the search was for HTML or RSS, the result is stored in a file for reuse. This makes most repeat searches very fast and puts an acceptable load on the processor.
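The caching logic can be sketched in a few lines of Python. This is an illustration of the approach, not the Perl code: the cache directory, the file naming scheme, and the names cache_path and cached_search are assumptions, and search stands for whatever function actually runs the MT search.

```python
import os
import time

CACHE_DIR = "/tmp/tag-cache"    # hypothetical cache location
MAX_AGE_SECONDS = 6 * 60 * 60   # six hours, the setting mentioned above


def _safe(name):
    # Replace anything that could escape the cache directory.
    # Simplification: distinct tags can map to the same file name.
    return "".join(c if c.isalnum() or c in "-_" else "_" for c in name)


def cache_path(tag, flavor, cache_dir=CACHE_DIR):
    """One file per (tag, flavor) pair, e.g. blogging.rss."""
    return os.path.join(cache_dir, f"{_safe(tag)}.{_safe(flavor)}")


def cached_search(tag, flavor, search, cache_dir=CACHE_DIR,
                  max_age=MAX_AGE_SECONDS, now=None):
    """Return a fresh cached result, or run search() and store its bytes."""
    path = cache_path(tag, flavor, cache_dir)
    now = time.time() if now is None else now
    try:
        # Fresh enough: serve the file without touching MT at all.
        if now - os.path.getmtime(path) < max_age:
            with open(path, "rb") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet, fall through to a real search
    body = search(tag, flavor)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temp file and rename so readers never see a partial file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(body)
    os.replace(tmp, path)
    return body
```

Keying the file on both tag and flavor keeps the HTML and RSS results from overwriting each other, and the age check uses the file's modification time so no separate bookkeeping is needed.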

The code is available for you to look at. Let me know if you use it or improve it.