This weekend I’ve been playing with Sofa, jchris’s CouchDB blog
software. I like CouchDB, and I’m increasingly starting to think that
although Wordpress is pretty good, it’s the wrong path; it’s serious
evolution running on a dead-end evolutionary path. Like an
electric-powered go-faster-striped penny farthing. So a move away is
something worth investigating. Sofa is pretty feature-poor by comparison
with Wordpress, but I actually don’t care about almost all of the
Wordpress features anyway. Sofa’s not too hard to set up, but then I
work with CouchDB all the time anyway. It’d likely be harder for others.
Importing posts from Wordpress took the longest to do, because I had to
write a script to do it, and overcome a Wordpress problem. When
exporting posts from Wordpress (under Tools > Export) the downloaded
export XML file ended up being truncated because the export script took
too long to run (I’ve been posting here for seven years; it’s built up.)
So I added php_value max_execution_time 600 to my Wordpress .htaccess
file and that made it work. After that, and after installing
Sofa to CouchDB with CouchApp (as per Sofa instructions), a script:
This script is totally, totally hardcoded to do the things that I want it to do, so you can’t just run it, you’ll need to change it. It also doesn’t recover from errors much, either. Anyway, to run the script:
curl -X DELETE http://sil:couchdb@localhost:5984/blogdb # delete existing DB
couchapp push . http://sil:couchdb@127.0.0.1:5984/blogdb # install Sofa
python wordpress-import.py wordpress.2009-09-12.xml # import all Wordpress posts
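For a rough idea of the shape of such an import, here is a minimal sketch; it is not the script above. It only turns the export XML into documents and leaves the upload out. The namespace URIs assume a WXR 1.0 export, the document field names (type, title, html, created_at) are my guess at what Sofa expects, and the sample post is made up. The _id and created_at formats are chosen to line up with the proxy rules further down.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed namespaces: these match WXR 1.0 exports; other Wordpress versions
# use a different version number in the wp namespace.
NS = {
    "wp": "http://wordpress.org/export/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

def posts_from_wxr(xml_text):
    """Yield one Sofa-style document (a dict) per published Wordpress post."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        # Skip pages, attachments, drafts and so on.
        if item.findtext("wp:post_type", namespaces=NS) != "post":
            continue
        if item.findtext("wp:status", namespaces=NS) != "publish":
            continue
        posted = datetime.strptime(
            item.findtext("wp:post_date", namespaces=NS), "%Y-%m-%d %H:%M:%S")
        slug = item.findtext("wp:post_name", namespaces=NS)
        yield {
            # ID format assumed to be year-month-day-slug.
            "_id": posted.strftime("%Y-%m-%d-") + slug,
            "type": "post",          # hypothetical field names from here on
            "title": item.findtext("title"),
            "html": item.findtext("content:encoded", namespaces=NS),
            "created_at": posted.strftime("%Y/%m/%d %H:%M:%S"),
        }

def bulk_docs_body(docs):
    """Build the JSON body for a POST to CouchDB's /blogdb/_bulk_docs."""
    return json.dumps({"docs": list(docs)})

# Made-up sample export containing a single post.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.0/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Slug name</title>
      <wp:post_name>slug-name</wp:post_name>
      <wp:post_date>2009-09-13 10:30:00</wp:post_date>
      <wp:post_type>post</wp:post_type>
      <wp:status>publish</wp:status>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(bulk_docs_body(posts_from_wxr(SAMPLE)))
```

Sending that body to the database with a Content-Type of application/json stores all the posts in one request, which is kinder than one PUT per post when there are seven years of them.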
Then, of course, there’s a problem. If you look at jchris’s blog,
the URLs are all horrible: the index page is
http://jchrisa.net/drl/_design/sofa/_list/index/recent-posts?descending=true&limit=5,
and an individual post is
http://jchrisa.net/drl/_design/sofa/_show/post/Book-progress. I want
my nice Wordpress URLs: / for the index, /2009/09/13/slug-name for
a post. This is, of course, doable by putting CouchDB behind some sort
of proxy. Some people would use nginx for this; since I have Apache
running on our server anyway, I went with mod_rewrite and mod_proxy:
RewriteEngine on
# Front page
RewriteRule ^$ http://127.0.0.1:5984/blogdb/_design/sofa/_list/index/recent-posts?descending=true&limit=5 [P]
# assets
RewriteRule assets/script/(.*) http://127.0.0.1:5984/_utils/script/$1 [P]
RewriteRule assets/(.*) http://127.0.0.1:5984/blogdb/_design/sofa/$1 [P]
# feed
RewriteRule feed/atom http://127.0.0.1:5984/blogdb/_design/sofa/_list/index/recent-posts?descending=true&limit=5&format=atom [P]
# /2009
RewriteRule ^([0-9][0-9][0-9][0-9])/?$ http://127.0.0.1:5984/blogdb/_design/sofa/_list/index/recent-posts?descending=true&limit=500&startkey="$1/12/32"&endkey="$1/01/00" [P]
# /2009/08
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/?$ http://127.0.0.1:5984/blogdb/_design/sofa/_list/index/recent-posts?descending=true&limit=500&startkey="$1/$2/32"&endkey="$1/$2/00" [P]
# /2009/08/12
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([0-9][0-9])/?$ http://127.0.0.1:5984/blogdb/_design/sofa/_list/index/recent-posts?descending=true&limit=500&startkey="$1/$2/$3 24"&endkey="$1/$2/$3 00" [P]
# /2009/08/12/post.html
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([0-9][0-9])/post\.html$ http://127.0.0.1:5984/blogdb/_design/sofa/_show/post/post.html [P]
# /2009/08/12/slug
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([0-9][0-9])/?(.*)$ http://127.0.0.1:5984/blogdb/_design/sofa/_show/post/$1-$2-$3-$4 [P]
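To make the mapping easier to follow, here is the same idea as a small Python sketch. It covers only the front page, the date archives, and individual posts (not assets, the feed, or post.html), and the function name couch_target is just something I made up for illustration; Apache doesn’t work this way internally.

```python
import re

COUCH = "http://127.0.0.1:5984"
SOFA = COUCH + "/blogdb/_design/sofa"
INDEX = SOFA + "/_list/index/recent-posts?descending=true"

def couch_target(path):
    """Map a pretty path (no leading slash) to the CouchDB URL to proxy to.

    Returns None for paths that none of the sketched rules cover.
    """
    # Front page: the five most recent posts.
    if path == "":
        return INDEX + "&limit=5"

    # /2009, /2009/08 and /2009/08/12 archives.
    m = re.fullmatch(r"(\d{4})(?:/(\d\d))?(?:/(\d\d))?/?", path)
    if m:
        year, month, day = m.groups()
        if day:
            hi, lo = f"{year}/{month}/{day} 24", f"{year}/{month}/{day} 00"
        elif month:
            hi, lo = f"{year}/{month}/32", f"{year}/{month}/00"
        else:
            hi, lo = f"{year}/12/32", f"{year}/01/00"
        # descending=true walks the view backwards, so the start key is
        # the later date and the end key is the earlier one.
        return f'{INDEX}&limit=500&startkey="{hi}"&endkey="{lo}"'

    # /2009/08/12/slug becomes the document ID 2009-08-12-slug.
    m = re.fullmatch(r"(\d{4})/(\d\d)/(\d\d)/(.+)", path)
    if m:
        return SOFA + "/_show/post/" + "-".join(m.groups())

    return None

if __name__ == "__main__":
    for p in ["", "2009", "2009/08", "2009/08/12", "2009/09/13/slug-name"]:
        print(repr(p), "->", couch_target(p))
```

The impossible dates like 12/32 and hour 24 are deliberate: they are just strings that sort after every real date in the period, so the range catches everything in it.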
All well and good. But…then we hit the problem, and it’s not a very
resolvable problem. You see, Sofa writes out HTML, and it writes out
links and so on in the nasty Sofa format. So, you can poke Sofa’s JS
files to write them out differently, which I’ve done (edit indexPath,
feedPath, and post.link in sofa/lists/index.js), but changing
things like the URL that comments are fetched from is harder. You see,
the Couch JavaScript files assume they are running on Couch, and can
therefore do things like inspect the current URL to work out which
database they’re in; using one of jchris’s URLs, for example, in
http://jchrisa.net/drl/_design/sofa/_show/post/Book-progress the
database name is drl, the design document name is sofa, and the post ID
is Book-progress. If you’ve rewritten your URL to
/2009/09/13/book-progress, that doesn’t work.
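To show what goes wrong, here is a sketch of that kind of URL inspection, in Python rather than JavaScript. It is not the actual Couch code, and couch_context is a name I invented; it just assumes the path looks like /database/_design/designdoc/...

```python
from urllib.parse import urlparse

def couch_context(url):
    """Guess the database, design doc and doc ID from a Couch-style URL.

    Returns None when the path does not look like /db/_design/ddoc/...
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[1] != "_design":
        return None
    return {
        "db": parts[0],
        "design": parts[2],
        # Treat the last path segment as the doc ID, if there is one
        # beyond the design document name.
        "id": parts[-1] if len(parts) > 3 else None,
    }

if __name__ == "__main__":
    print(couch_context(
        "http://jchrisa.net/drl/_design/sofa/_show/post/Book-progress"))
    print(couch_context("http://example.org/2009/09/13/book-progress"))
```

The first call finds drl, sofa, and Book-progress; the second returns None, because once the URL has been rewritten there is nothing left in it that names the database or the design document.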
Fixing this is awkward, because view URLs are calculated by
jquery.couch.js, which is a stock part of Couch, and so a fix would
involve forking rather a lot of Sofa. Apparently there’s work going on
by the Couch team to do native URL rewrites inside Couch itself. My
migration to Sofa will likely have to wait until then, unless I write my
own CouchDB blogging engine, which is more than a weekend’s job I fear.
Shame. At least now I don’t have to worry about how I get CouchDB 0.10
running on ancient Debian…