Last Saturday I needed to convert some HTML (which I'd read into Scheme as a string) into valid XML so I could store it in Sleepycat's DbXml database. HTML Tidy is a great way to do this, so I put together a small, single-function PLT Scheme extension that wraps the Tidy library.
The library's easy to build and use. Here's an example:
(require (lib "tidy.ss" "tidy"))
(define bad_string "<p>Foo!<ul><li>first<li>second")
(display (tidy:string bad_string))

<p>Foo!</p>
<ul>
<li>first</li>
<li>second</li>
</ul>
Over the last week I've run several hundred HTML snippets pulled from RSS feeds through the function, and it's worked great.