So, we released the new LugRadio site and it ran really slowly. It’s built on PyBlosxom, and it’s pretty complicated in parts: some pages include other pages by reading them off the filesystem, some include server-parsed versions of those pages (a CGI can’t output SSIs, so I had to do it myself with urllib), and so on. It was dead slow. So I was trying to think of the best way to solve this; I thought about using PyBlosxom’s static rendering or running PyBlosxom under mod_python (no link provided, as the HOWTO that’s out there is out of date and wrong; apparently there’s stuff in CVS which does it too, but I don’t want to run CVS, and it relies on WSGI, which I don’t understand either).
And then Matt mailed me and said “can we use caching to speed the site up?” And I thought: why am I trying to do this a really complex way? Since the site doesn’t do anything interactive (it does build pages dynamically, but the resultant HTML for that page is the same every time), why don’t I just spider the lot and serve the resultant saved static HTML? So that’s what I did, with the command:
wget --no-host-directories \
--directory-prefix=/var/www/lugradio.org/static \
-p -E --mirror http://secretlocation.lugradio.org/
So the previously existing PyBlosxom version now lives at http://secretlocation.lugradio.org/ and www.lugradio.org points at the statically saved HTML in /var/www/lugradio.org/static. And, because it’s all plain HTML files, it works like blazes. The wget command is cronned to run hourly. That’s it.
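A crontab entry for that hourly run might look roughly like the following. This is only a sketch: running at minute 0 and adding --quiet (so cron does not mail wget's progress output every hour) are assumptions, not details taken from the actual setup.

```shell
# m h dom mon dow  command
# Assumed schedule: on the hour, every hour. --quiet is an added assumption.
0 * * * * wget --quiet --no-host-directories --directory-prefix=/var/www/lugradio.org/static -p -E --mirror http://secretlocation.lugradio.org/
```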
I’m a bit worried that I’m overlooking simple and good solutions in order to be more complex. That’s not only a bad thing, it’s exactly what I complain at other people about: the KISS principle says not to overcomplicate a solution. I need a slapping for not doing it the simple way. I’ll be installing complicated “frameworks” and so on next. If that happens, shoot me.
Seems simple enough. You could use funky caching if you wanted. Just have the page generator set up as your 404 handler, and have it save the page to disk in the right location as it runs. The cache gets built up as people visit the various pages, and you can update the site by simply deleting all the files.
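The 404-handler idea can be sketched in shell, to match the wget command above. This is a minimal illustration under stated assumptions, not PyBlosxom code: DOCROOT, GENERATOR and cache_miss are hypothetical names, and it assumes the web server is set up to call the handler with the requested path whenever no file exists for it.

```shell
#!/bin/sh
# Hypothetical sketch of "the 404 handler fills the cache".
# DOCROOT   - directory the web server serves static files from (assumed)
# GENERATOR - command that prints the HTML for a given path (assumed)

# cache_miss REQUEST_PATH
# Render the page, save it under DOCROOT so the next request is served as a
# plain file, and print the page for the current request.
cache_miss() {
    request_path=$1
    case $request_path in
        *..*) return 1 ;;                 # refuse paths that could escape DOCROOT
    esac
    file_path=$request_path
    case $file_path in
        */) file_path="${file_path}index.html" ;;   # directory-style URLs
    esac
    target="$DOCROOT$file_path"
    mkdir -p "$(dirname "$target")" || return 1
    tmp="$target.tmp.$$"
    # Write to a temporary file first so a half-rendered page is never served.
    if "$GENERATOR" "$request_path" > "$tmp"; then
        mv "$tmp" "$target"
        cat "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Updating the site is then a matter of emptying the cache, for example:
#   find "$DOCROOT" -type f -name '*.html' -delete
```

The trade-off against the hourly wget mirror is that pages are only rendered when somebody first asks for them, and changes appear as soon as the cached files are deleted rather than at the next cron run.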