The wooden spoon of overcomplexity

So, we released the new LugRadio site and it ran really slowly. Now, it’s built on PyBlosxom, and it’s pretty complicated in parts; some pages include other pages by reading them off the filesystem, some include server-parsed versions of those pages; a CGI can’t output SSIs, so I had to do it myself with urllib; etc etc etc. It was dead slow. So I was trying to think of the best way to solve this; I thought about using PyBlosxom’s static rendering or running PyBlosxom under mod_python (no link provided as the HOWTO that’s out there is out of date and wrong; apparently there’s stuff in CVS which does it too, but I don’t want to run CVS and it relies on WSGI which I don’t understand either).
And then Matt mailed me and said “can we use caching to speed the site up?” And I thought: why am I trying to do this a really complex way? Since the site doesn’t do anything interactive (it does build pages dynamically, but the resultant HTML for that page is the same every time), why don’t I just spider the lot and serve the resultant saved static HTML? So that’s what I did, with the command

wget --no-host-directories \
--directory-prefix=/var/www/lugradio.org/static \
-p -E --mirror http://secretlocation.lugradio.org/

so the previously existing PyBlosxom version now lives at http://secretlocation.lugradio.org/ and www.lugradio.org points at the statically saved HTML in /var/www/lugradio.org/static. And, because it’s all plain HTML files, it works like blazes. The wget command is cronned to run hourly. That’s it.
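
For anyone wanting to copy the idea, here is a minimal sketch of how the hourly job might be wrapped up. The script name (mirror-site.sh), its install path, the --run switch, the WGET override and the extra --quiet flag (so cron doesn't mail the whole transfer log every hour) are all assumptions for illustration, not what is actually on the server; only the wget options, URL and target directory come from the command above.

```shell
#!/bin/sh
# mirror-site.sh: hypothetical wrapper around the wget command in the post.
# An assumed crontab entry to run it on the hour:
#   0 * * * * /usr/local/bin/mirror-site.sh --run

SOURCE_URL="http://secretlocation.lugradio.org/"
TARGET_DIR="/var/www/lugradio.org/static"
WGET="${WGET:-wget}"   # overridable, e.g. WGET=echo for a dry run

# Spider the dynamic site and save it as static HTML under TARGET_DIR.
mirror_site() {
    "$WGET" --quiet --no-host-directories \
        --directory-prefix="$TARGET_DIR" \
        -p -E --mirror "$SOURCE_URL"
}

# Only fetch when explicitly asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    mirror_site
fi
```

Running it as `WGET=echo ./mirror-site.sh --run` prints the command line instead of fetching anything, which is a cheap way to check the arguments before letting cron loose on it.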

I’m a bit worried that I’m overlooking simple and good solutions in order to be more complex. That’s not only not a good thing, it’s exactly what I complain at other people about; overcomplicating a solution is not a good thing, and the KISS principle dictates not doing it. I need a slapping for not doing it the simple way. I’ll be installing complicated “frameworks” and so on next. If that happens, shoot me.

6 comments.

  1. Seems simple enough. You could use funky caching if you wanted. Just have the page generator set up as your 404 handler, and have it save the page to disk in the right location as it runs. The cache gets built up as people visit the various pages, and you can update the site by simply deleting all the files.

  2. Jim: I thought about that. It’s made a bit awkward by the URL rewriting and so on, but it’d be doable. Having the whole site fetched with wget was a lot quicker to implement, though; there are only about 100 pages in total on the site, so it’s not that big a deal.

  3. BTW, if you use Apache 2 your CGI scripts can output SSIs (using an ssi OutputFilter).

  4. Ian: I know, but then we have to move every site on the server to Apache 2, or we have to do all that mod_proxy stuff to have them both running, and both of those sound like they have an amazing potential for breaking stuff. Life is too short. Moreover, the server runs Debian stable and Apache 2 isn’t in it :)

  5. Take a look at running Squid in reverse proxy mode in front of your site. With a bit of tweaking of the caching headers coming from your app, Squid will automatically handle all the caching for you. Squid is insanely fast: about three times faster at serving cached pages than Apache is at serving HTML. Highly recommended.

  6. Darren: to be honest, Apache serving HTML is more than fast enough. I imagine that Squid would work faster, but I just don’t really need the speed increase any more, so no extra work is justified :)
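
The “funky caching” idea from the first comment can be sketched roughly as below. This is only an illustration under several assumptions: that Apache is told to use the script as its 404 handler (something like `ErrorDocument 404 /cgi-bin/build-page.cgi`), that the originally requested path arrives in the REDIRECT_URL environment variable, and that some command can print the HTML for a given path. The `render-page` generator and the script name are hypothetical, and the sketch ignores the URL rewriting mentioned in the second comment.

```shell
#!/bin/sh
# build-page.cgi: hypothetical 404 handler implementing "funky caching".
# On a cache miss, generate the page, save it where Apache will find it
# next time, and serve it for this request.

DOCROOT="${DOCROOT:-/var/www/lugradio.org/static}"
# Hypothetical command that prints the HTML for a URL path on stdout.
GENERATOR="${GENERATOR:-/usr/local/bin/render-page}"

# Map a request path to a file under DOCROOT; directories get index.html.
cache_file_for() {
    case "$1" in
        */) printf '%s%sindex.html\n' "$DOCROOT" "$1" ;;
        *)  printf '%s%s\n' "$DOCROOT" "$1" ;;
    esac
}

handle_miss() {
    path="$1"
    # Refuse paths that try to climb out of the document root.
    case "$path" in
        *..*) printf 'Status: 400 Bad Request\r\n\r\n'; return 1 ;;
    esac
    file=$(cache_file_for "$path")
    tmp="$file.tmp.$$"
    mkdir -p "$(dirname "$file")"
    # Write to a temporary name and rename, so nobody is served half a page.
    if "$GENERATOR" "$path" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        printf 'Status: 404 Not Found\r\n\r\n'
        return 1
    fi
    # Override the 404 status, since the page now exists.
    printf 'Status: 200 OK\r\nContent-Type: text/html\r\n\r\n'
    cat "$file"
}

# Apache sets REDIRECT_URL when the script runs as an error document.
if [ -n "${REDIRECT_URL:-}" ]; then
    handle_miss "$REDIRECT_URL"
fi
```

As the comment says, the cache fills up as people visit pages, and refreshing the site is just a matter of deleting the saved files so the next request regenerates them.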