On the LugRadio site we publish an RSS feed of episodes, and each episode includes an enclosure tag which references the downloadable MP3 for that episode. So, one snippet from the feed looks like:
<item>
  <title>Bars with coconut in</title>
  <link>http://www.lugradio.org/episodes/31</link>
  <description>Jono Bacon, Stuart Langridge (Aq), Matt Revell, and <span title="On Call Bald" style="border-bottom: 1px dotted #ccc">Ade Bradshaw</span> talk about Linux and whatever else comes along, including: </p>
    <ul>
      <li>An interview with Yannick and Carlos from Nokia about the Nokia Internet Tablet and the company's approach to open-source software</li>
      <li>Bounties for writing code: are they a good idea?</li>
      <li>Ian Brown, head of FIPR, on ID cards in the UK, and whether they should happen or not</li>
      <li>Samba: should we be inventing our own open protocols rather than chasing the tail-lights of closed competitors?</li>
    </ul>
  </description>
  <enclosure url="http://lug.mtu.edu.nyud.net:8090/lugradio/lugradio-s02e19-040705-high.mp3?podcast" type="audio/mpeg" length="17659904" />
</item>
The problem is this: if the enclosure URL changes, people’s podcast clients and RSS aggregators download stuff again. How can I avoid this happening?
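To make the failure mode concrete, here is a minimal sketch of a client that remembers what it has fetched by enclosure URL. This is an assumption about how such clients behave, not the code of any real podcast client; `enclosures_to_download`, `seen_urls`, and the `mirror-a.example` / `mirror-b.example` hosts are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def enclosures_to_download(feed_xml, seen_urls):
    """Return enclosure URLs that this client has not fetched yet.

    Assumption: the client identifies an episode purely by its
    enclosure URL, so a changed URL looks like a brand new episode.
    """
    root = ET.fromstring(feed_xml)
    new_urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        if url and url not in seen_urls:
            new_urls.append(url)
            seen_urls.add(url)
    return new_urls


# A cut-down feed with a placeholder for the enclosure URL.
FEED = """<rss><channel><item>
<title>Bars with coconut in</title>
<enclosure url="{url}" type="audio/mpeg" length="17659904" />
</item></channel></rss>"""

URL_A = "http://mirror-a.example/lugradio-s02e19-high.mp3"  # hypothetical
URL_B = "http://mirror-b.example/lugradio-s02e19-high.mp3"  # hypothetical

seen = set()
first_poll = enclosures_to_download(FEED.format(url=URL_A), seen)
repeat_poll = enclosures_to_download(FEED.format(url=URL_A), seen)
# Same episode, but the mirror (and therefore the URL) has changed.
after_move = enclosures_to_download(FEED.format(url=URL_B), seen)
```

With a client like this, `repeat_poll` is empty, but `after_move` contains the new URL, so the same episode gets downloaded a second time.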
Several suggested solutions that won’t work:
1. Never change the URL
Can’t do that. The URL points to a mirrored copy of the episode’s mp3 file. If that mirror goes offline (our mirrors are run by volunteers without payment), we have to change the URL to point to another.
2. Use a redirect
Lots of people say “just make the URL be http://www.lugradio.org/mirrors/season2/episode19/mp3 and have that be a CGI script that redirects to a mirror”. That would be great if podcast clients were compliant HTTP clients and followed redirects. In practice, they are not and do not. This means that if we implemented the redirect anyway, we’d be secure in our integrity, but lots of people couldn’t download the show the way they want. Knowing that we are right and they are wrong is cold comfort when we’re annoying our listeners, so we can’t use redirects.
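For reference, the redirect script people have in mind would look roughly like the sketch below. It is written as a small WSGI application rather than a raw CGI script, and the `MIRRORS` table, the mirror hostnames, and the `pick_mirror` helper are hypothetical names invented for the example; only the `/mirrors/season2/episode19/mp3` path shape comes from the suggestion itself.

```python
# Hypothetical mirror table; the hostnames are placeholders.
MIRRORS = {
    ("season2", "episode19"): [
        "http://mirror-a.example/lugradio-s02e19-high.mp3",
        "http://mirror-b.example/lugradio-s02e19-high.mp3",
    ],
}


def pick_mirror(season, episode, offline=()):
    """Return the first mirror URL not known to be offline, or None."""
    for url in MIRRORS.get((season, episode), []):
        if url not in offline:
            return url
    return None


def redirect_app(environ, start_response):
    """WSGI app handling paths like /mirrors/season2/episode19/mp3."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    target = None
    if len(parts) == 4 and parts[0] == "mirrors" and parts[3] == "mp3":
        target = pick_mirror(parts[1], parts[2])
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No mirror available\n"]
    # The feed URL never changes; only the Location header does.
    start_response("302 Found", [("Location", target)])
    return [b""]
```

Something like `wsgiref.handlers.CGIHandler().run(redirect_app)` would let this run as a CGI script. The feed would then only ever contain the stable lugradio.org URL, which is exactly why it is frustrating that clients which ignore the `Location` header rule it out.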
3. Coralize the podcast enclosure URLs
Use the Coral distribution network so that we don’t put too much pressure on the archive that the feed’s enclosure URLs point at. We’re already doing this, but if that archive goes away, the Coralized URL won’t point to anything, and we’ll have to change the URL. The Coral people are very cool, but they won’t cache all our mp3 files indefinitely.
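For anyone who hasn’t seen one, a Coralized URL is just the original URL with `.nyud.net:8090` appended to the hostname, as in the enclosure in the feed snippet. A minimal sketch of that rewrite follows; the `coralize` function name is my own, and the un-Coralized `lug.mtu.edu` URL in the usage note is inferred by stripping the suffix from the feed URL rather than taken from anywhere else.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net:8090"


def coralize(url):
    """Append the Coral suffix to the hostname of a plain HTTP URL.

    URLs that already carry a port or userinfo (which includes URLs
    that are already Coralized) are returned unchanged; this sketch
    makes no attempt to handle them.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not host or parts.netloc.lower() != host:
        return url
    return urlunsplit(
        (parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment)
    )
```

Calling `coralize("http://lug.mtu.edu/lugradio/lugradio-s02e19-040705-high.mp3?podcast")` gives the enclosure URL shown in the feed. Note that the mirror’s hostname is still baked into the result, which is why Coralizing doesn’t help when the mirror itself disappears.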
4. Set up an archive that never goes away
Essentially, this is a suggestion that goes with suggestion 1: make sure that old mirror URLs don’t break by setting up the One Canonical Archive that the podcast feed points at. The problem is that this one archive gets hit pretty hard for bandwidth, because every podcast client downloads from it. This means that it has to be an archive with enough bandwidth to cover the initial download spike when an episode is released. We could use archive.org, and we do upload episodes there, but their upload process is long and laborious and would delay an episode’s release by quite some time, which we’d rather not do.
5. Do a ‘redirect’ by streaming the data through our URL
No-one’s suggested this, but I’ve thought of it. We could put the URL from the “redirect” suggestion above in the feed, but instead of having that URL redirect to a mirror, that URL points to a CGI script which downloads the data from a mirror and streams it out on the fly to the consumer. I don’t want to do this because it puts a horrific bandwidth requirement on the lugradio.org machine; every downloaded byte of a LugRadio episode will go through that machine, and we can’t afford the bandwidth for that.
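A minimal sketch of the relay loop such a script would need, plus the bandwidth arithmetic that rules it out. The `stream_through` and `relay_cost_bytes` names and the download count in the usage note are hypothetical; the episode size is the `length` from the enclosure in the feed snippet.

```python
import io

CHUNK_SIZE = 64 * 1024        # relay in 64 KiB pieces
EPISODE_BYTES = 17659904      # enclosure length from the feed snippet


def stream_through(upstream, downstream, chunk_size=CHUNK_SIZE):
    """Copy upstream to downstream chunk by chunk; return bytes relayed."""
    relayed = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        downstream.write(chunk)
        relayed += len(chunk)
    return relayed


def relay_cost_bytes(downloads, episode_bytes=EPISODE_BYTES):
    """Bytes that pass through the relaying machine for one episode."""
    return downloads * episode_bytes


# Stand-ins for the mirror response and the listener's connection.
mirror_response = io.BytesIO(b"\x00" * 200000)
listener = io.BytesIO()
relayed = stream_through(mirror_response, listener)
```

In a real script, `upstream` would be something like the response from `urllib.request.urlopen` on the mirror URL and `downstream` the script’s output stream. The cost is the point, though: for a purely illustrative 1,000 downloads of this episode, `relay_cost_bytes(1000)` is 17,659,904,000 bytes, all of it through the lugradio.org machine.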
At the moment, we Coralize URLs, and we’re trying to set up a Canonical Archive based on a very useful donation (about which more in a few days). But is there a better way? I can’t help but think that there should be a better way around this; something with tag URIs or guids or something. Remember that we need something which works with the podcast clients that currently exist; I don’t want to hear “clients should support XXXX and if they don’t then you should ignore them”, because we don’t have the luxury of doing that, sadly.
Any help will be greatly, greatly appreciated; I’ve been wrestling with this problem for months now and I’m still not sure how to solve it. Thanks!