Creating XHTML Blog Entries with Emacs and HTMLTidy


Jon Udell has been experimenting with XHTML content in his blog and RSS feeds for some time now and has gotten some interesting results. This line of inquiry interests me as well, but I haven't had time to play much with the actual processing. What I have started doing, however, is ensuring, as much as possible, that the content of my posts is proper XHTML. Here's how I do that.

First, I write all my posts in Emacs before I post them using Radio. I gave up on Radio's built-in browser-based editor long ago, and since I've been using Emacs since the mid-80s, I find it refreshing to have a real editor at my disposal. It's a little more overhead to fire up Emacs and then cut and paste the text into Radio, but not much. The HTML Helper mode in Emacs helps with tags and so on, so the result is a net increase in productivity. All in all, I think I'm better off in Emacs.

As I write, I try to produce proper XHTML, but it's easy to make mistakes, so I use a tool inside Emacs to help: HTMLTidy. HTMLTidy is a neat little program that will tidy up your HTML and convert it to valid HTML 3.2, XHTML, or whatever you need. I have the following in my .emacs file:

(global-set-key "\C-xt" 'tidy-region)
(setq shell-command-default-error-buffer "tidy-errors") ; define error buffer
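(defun tidy-region ()
  "Run Tidy HTML parser on current region."
  (interactive)
  ;; region-beginning and region-end return the bounds in order,
  ;; whichever side of point the mark is on.
  (let ((start (region-beginning))
        (end (region-end))
        (command "/usr/local/bin/tidy -config /Users/pjw/config/tidyconfig.txt -asxhtml"))
    ;; The two t arguments insert the output in the current buffer
    ;; in place of the region; stderr goes to the error buffer.
    (shell-command-on-region start end command t t
                             shell-command-default-error-buffer)))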

This isn't original to me. I found it with Google, but I can't remember where now. At any rate, all it does is run the region through HTMLTidy and replace the region with HTMLTidy's output. The contents of the tidyconfig.txt file are:

markup: true
tidy-mark: false
clean: false
show-body-only: true
gnu-emacs: true
output-xhtml: true

This isn't all I need to do, of course. I'd like to clean up my templates so that they use CSS instead of nested tables, but that hasn't been high on my priority list yet--too many short-deadline projects plus a book project will do that.

What I'd really like to do is produce everything I write in XHTML, but that's not practical for a variety of reasons, including the need for change control and notification a la Word for group-edited work. Still, I hate that I write a lot of content in Word that's format-locked. Maybe WordML will help, but I'm not holding my breath.