tidy - A Scheme Extension for Tidy

tidy is an extension for PLT-Scheme that provides a single-function interface to the HTML Tidy library. Tidy checks and corrects HTML so that it is valid. As implemented, the extension provides a single function, tidy:string, which takes a string of malformed HTML and returns a string of valid XHTML.

Building the Extension

  • Download and install HTML Tidy. Tidy ships as both a console program and a library; this extension uses the library.
  • Download and unpack the extension tarball.
  • Compile the tidy.c file using the following command:
    mzc --cc ++ccf -I/usr/local/plt/include tidy.c
    
  • Link the resulting .o file into a dynamic library using the following command:
    mzc --ld tidy.dylib ++ldf -L/usr/local/plt/lib/ ++ldf -ltidy tidy.o
    
    Note that this is what I use on OS X. Most systems use .so instead of .dylib as the extension for the dynamic library.
  • Copy the dynamic library to your collects folder.
    cp tidy.dylib ${collects}/tidy/compiled/native/ppc-macosx/tidy.dylib
    
    ${collects} is the location of your collects folder. If you're not on OS X, substitute your platform's directory name for ppc-macosx.
  • Test the extension by running:
    mzscheme -f tidy-test.scm
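
The compile and link steps above can be sketched as a small script that picks the library suffix for the current system. This is only a sketch: it prints the commands rather than running them, the PLT_HOME variable and the lib_suffix and build_commands helpers are hypothetical names introduced here, and the rule "Darwin gets .dylib, everything else gets .so" is an assumption based on the note about OS X.

```shell
#!/bin/sh
# Sketch only: prints the build commands from the steps above instead of running them.
# PLT_HOME and the helper names are assumptions, not part of the extension itself.

PLT_HOME="${PLT_HOME:-/usr/local/plt}"

# Map a `uname -s` style system name to a dynamic-library suffix.
# Assumption: OS X (Darwin) uses .dylib and every other system uses .so.
lib_suffix() {
  case "$1" in
    Darwin) echo "dylib" ;;
    *)      echo "so" ;;
  esac
}

# Print the compile and link commands for the given system name.
build_commands() {
  lib="tidy.$(lib_suffix "$1")"
  echo "mzc --cc ++ccf -I${PLT_HOME}/include tidy.c"
  echo "mzc --ld ${lib} ++ldf -L${PLT_HOME}/lib/ ++ldf -ltidy tidy.o"
}

build_commands "$(uname -s)"
```

Review the printed commands before running them yourself. If PLT-Scheme is installed somewhere other than /usr/local/plt, set PLT_HOME first.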
    

Limitations

I've only implemented a single function. Other functions that tidy files, ports, etc. could be implemented.

I've hardwired all the options that I like into the function. An interface for specifying Tidy options from within Scheme would be nice.

Documentation

Here's an example of using the tidy library:

(require (lib "tidy.ss" "tidy"))

(define bad_string "<p>Foo!<ul><li>first<li>second")

(display (tidy:string bad_string))

This displays:

<p>Foo!</p>
<ul>
<li>first</li>
<li>second</li>
</ul>


Last Modified: Friday, 17-Jun-2005 11:14:39 MDT