How I set up a Twitter archive with Tweetback

Twitter currently has problems. Well, one specific problem, which is the bloke who bought it. My solution to this problem has been to move to Mastodon (@sil@mastodon.social if you want to do the same), but I’ve invested fifteen years of my life providing twitter.com with free content so I don’t really want it to go away. Since there’s a chance that the whole site might vanish, or that it continues on its current journey until I don’t even want my name associated with it any more, it makes sense to have a backup. And obviously, I don’t want all that lovely writing to disappear from the web (how would you all cope without me complaining about some random pub’s music in 2011?!), so I wanted to have that backup published somewhere I control… by which I mean my own website.

So, it would be nice to be able to download a list of all my tweets, and then turn that into some sort of website so it’s all still available and published by me.

Fortunately, Zach Leatherman came to save us by building a tool, Tweetback, which does a lot of the heavy lifting on this. Nice one, that man. Here I’ll describe how I used Tweetback to set up my own personal Twitter archive. This is unavoidably a bit of a developer-ish process, involving the Terminal and so on; if you’re not at least a little comfortable with doing that, this might not be for you.

Step 1: get a backup from Twitter

This part is mandatory. Twitter graciously permit you to download a big list of all the tweets you’ve given them over the years, and you’ll need it for this. As they describe in their help page, go to your Twitter account settings and choose Your account > Download an archive of your data. You’ll have to confirm your identity and then say Request data. They then go away and start constructing an archive of all your Twitter stuff. This can take a couple of days; they send you an email when it’s done, and you can follow the link in that email to download a zip file. This is your Twitter backup; it contains all your tweets (and some other stuff). Stash it somewhere; you’ll need a file from it shortly.

Step 2: get the Tweetback code

You’ll need both Node.js and git installed to do this. If you don’t have Node.js, go to nodejs.org and follow their instructions for how to download and install it for your computer. (This process can be fiddly; sorry about that. I suspect that most people reading this will already have Node installed, but if you don’t, hopefully you can manage it.) You’ll also need git installed: GitHub have some instructions on how to install git or GitHub Desktop, which should explain how to do this stuff if you don’t already have it set up.

Now, you need to clone the Tweetback repository from GitHub. On the command line, this looks like git clone https://github.com/tweetback/tweetback.git; if you’re using GitHub Desktop, follow their instructions to clone a repository. This should give you the Tweetback code, in a folder on your computer. Make a note of where that folder is.

Step 3: install the Tweetback code

Open a Terminal on your machine and cd into the Tweetback folder, wherever you put it. Now, run npm install to install all of Tweetback’s dependencies. Since you have node.js installed from above, this ought to just work. If it doesn’t… you get to debug a bit. Sorry about that. This should end up looking something like this:

$ npm install
npm WARN deprecated @npmcli/move-file@1.1.2: This functionality has been moved to @npmcli/fs

added 347 packages, and audited 348 packages in 30s

52 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Step 4: configure Tweetback with your tweet archive

From here, you’re following Tweetback’s own README instructions: they’re online at https://github.com/tweetback/tweetback#usage and are also in the README file in your current directory.

Open up the zip file you downloaded from Twitter, and get the data/tweets.js file from it. Put that in the database folder in your Tweetback folder, then edit that file to change window.YTD.tweet.part0 on the first line to module.exports, as the README says. This means that your database/tweets.js file will now have the first couple of lines look like this:

module.exports = [
  {
    "tweet" : {

Now, run npm run import. This will go through your tweets.js file and load it all into a database, so it can be more easily read later on. You only need to do this step once. This will print a bunch of lines that look like { existingRecordsFound: 0, missingTweets: 122 }, and then a bunch of lines that look like Finished count { count: 116 }, and then it’ll finish. This should be relatively quick, but if you’ve got a lot of tweets (I have 68,000!) then it might take a little while. Get yourself a cup of tea and a couple of biscuits and it’ll be done when you’ve poured it.
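The first-line edit to database/tweets.js (which has to happen before npm run import) can also be scripted, if you’d rather not open a file that size in an editor. This is a minimal sketch, not part of Tweetback: convertTweetsHeader is a name made up here, and it assumes the file begins with window.YTD.tweet.part0 exactly as described above.

```javascript
// Hypothetical helper (not part of Tweetback): swaps the browser-style
// assignment at the start of Twitter's tweets.js for a Node.js export.
// Assumes the file begins with "window.YTD.tweet.part0", as described above.
function convertTweetsHeader(source) {
  const prefix = /^\s*window\.YTD\.tweet\.part0\s*=/;
  if (!prefix.test(source)) {
    throw new Error("tweets.js does not start with window.YTD.tweet.part0");
  }
  // Only the leading assignment is touched; the tweet data is left alone.
  return source.replace(prefix, "module.exports =");
}

// Tiny stand-in for a real archive file.
const sample = 'window.YTD.tweet.part0 = [\n  {\n    "tweet" : {}\n  }\n]';
console.log(convertTweetsHeader(sample).split("\n")[0]);
```

To apply it to the real file you could read database/tweets.js with fs.readFileSync, pass the contents through the function, and write the result back with fs.writeFileSync.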

Step 5: configure a subdirectory (optional)

If you’re setting up your own (sub)domain for your Twitter archive, so that it’ll be at the root of the website (https://twitter.example.com or whatever), then you can skip this step. However, if you’re going to put your archive in its own directory, so it’s not at the root (as I did, at kryogenix.org/twitter), then you need to tell your setup about that.

To do this, edit the file eleventy.config.js, and at the end, before the closing }, add a new line, so the end of the file looks like this:

    eleventyConfig.addPlugin(EleventyHtmlBasePlugin);
    return {pathPrefix: "/twitter/"}
};

The string "/twitter/" should be whatever you want the path to the root of your Twitter archive to be, so if you’re going to put it at mywebsite.example.com/my-twitter-archive, set pathPrefix to be "/my-twitter-archive/". This is only a path, not a full URL; you do not need to fill in the domain where you’ll be hosting it.
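To see why only the path matters, here is a rough illustration of how a prefix combines with a page’s path. It is not Eleventy’s actual implementation; withPathPrefix is a hypothetical function written for this post, and the page paths in the usage are made up.

```javascript
// Illustration only: NOT Eleventy's real code. Shows how a path prefix
// such as "/twitter/" ends up in front of every page path in the archive.
function withPathPrefix(pathPrefix, pagePath) {
  // Split on slashes and drop empty pieces, so stray or missing slashes
  // in either argument do not matter.
  const parts = (pathPrefix + "/" + pagePath).split("/").filter(Boolean);
  const joined = "/" + parts.join("/");
  // Keep a trailing slash if the page path had one (or was empty).
  const wantsSlash = pagePath === "" || pagePath.endsWith("/");
  return wantsSlash && joined !== "/" ? joined + "/" : joined;
}

// Made-up page paths, purely for demonstration.
console.log(withPathPrefix("/twitter/", "/some-page/"));
console.log(withPathPrefix("/my-twitter-archive", "index.html"));
```

The domain never appears: whichever host you copy the generated files to, the links inside them only need to know the directory they live under.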

Step 6: add metadata

As the Tweetback README describes, edit the file _data/metadata.js. You’ll want to change three values in here: username, homeLabel, and homeURL.

username is your Twitter username. Mine is sil: yours isn’t. Don’t include the @ at the beginning.

homeLabel is the thing that appears in the top corner of your Twitter archive once generated; it will be a link to your own homepage. (Note: not the homepage of this Twitter archive! This will be the text of a link which takes you out of the Twitter archive and to your own home.)

homeURL is the full URL to your homepage. (This is “https://kryogenix.org/” for me, for example. It is the URL that homeLabel links to.)
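Those three values are easy to get slightly wrong, so here is a small sanity check, sketched under assumptions. checkMetadata is a hypothetical helper, not something Tweetback provides, and the key names follow the spelling used in this post; check your copy of _data/metadata.js for the exact spelling. The homeLabel value in the example is made up.

```javascript
// Hypothetical sanity check for the three values described above.
// Not part of Tweetback; key names follow this post's spelling.
function checkMetadata(metadata) {
  const problems = [];
  if (!metadata.username || metadata.username.startsWith("@")) {
    problems.push("username should be set, without the leading @");
  }
  if (!metadata.homeLabel) {
    problems.push("homeLabel should be the text of the link to your homepage");
  }
  try {
    // new URL() throws unless it is given a full, absolute URL.
    new URL(metadata.homeURL);
  } catch (err) {
    problems.push("homeURL should be a full URL, including https://");
  }
  return problems;
}

// Deliberately wrong values: a leading @ and a URL with no scheme.
console.log(checkMetadata({
  username: "@sil",
  homeLabel: "kryogenix.org",
  homeURL: "kryogenix.org",
}));
```

An empty array back means all three values look plausible.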

Step 7: (drum roll, please!) build the site

OK. Now you’ve done all the setup. This step actually takes all of that and builds a website from all your tweets.

Run npm run build.

If you’ve got a lot of tweets, this can take a long time. It took me a couple of hours, I think, the first time I ran it. Subsequent runs take a lot less time (a couple of minutes for me, maybe even shorter for you if you’re less mouthy on Twitter), but the first run takes ages because it has to fetch all the images for all the tweets you’ve ever written. You’ll want a second cup of tea here, and perhaps dinner.

It should look something like this:

$ npm run build

> tweetback@1.0.0 build
> npx @11ty/eleventy --quiet

[11ty] Copied 1868 files / Wrote 68158 files in 248.58 seconds (3.6ms each, v2.0.0-canary.18)

You may get errors in here about being unable to fetch URLs (Image request error Bad response for https://pbs.twimg.com/media/C1VJJUVXEAE3VGE.jpg (404): Not Found and the like); this is because some Tweets link to images that aren’t there any more. There’s not a lot you can do about this, but it doesn’t stop the rest of the site building.
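If you want a list of which images have gone, the error lines can be picked out of the build output. A minimal sketch, assuming the messages keep the shape quoted above; missingImageUrls is a hypothetical helper and the regular expression is a guess based on that single example.

```javascript
// Hypothetical helper: collects the URLs from "Image request error" lines
// that report a 404. The pattern is based only on the example message above.
function missingImageUrls(buildOutput) {
  const pattern = /Image request error Bad response for (\S+) \(404\)/g;
  const urls = new Set(); // a Set, so repeated errors list each URL once
  for (const match of buildOutput.matchAll(pattern)) {
    urls.add(match[1]);
  }
  return [...urls];
}

const exampleOutput = [
  "Image request error Bad response for https://pbs.twimg.com/media/C1VJJUVXEAE3VGE.jpg (404): Not Found",
  "[11ty] Copied 1868 files / Wrote 68158 files in 248.58 seconds (3.6ms each, v2.0.0-canary.18)",
].join("\n");
console.log(missingImageUrls(exampleOutput));
```

You would feed it whatever npm run build printed, for instance by redirecting the output to a file first.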

Once this is all done, you should have a directory called _site, which is a website containing your Twitter archive! Hooray! Now you publish that directory, however you choose: copy it up to your website, push it to GitHub Pages or Netlify or whatever. You only need the contents of the _site directory; that’s your whole Twitter archive website, completely self-contained; all the other stuff is only used for generating the archive website, not for running it once it’s generated.

Step 8: updating the site with newer tweets (optional)

If you’re still using Twitter, you may post more Tweets after your downloadable archive was generated. If so, it’d be nice to update the archive with the contents of those tweets without having to request a full archive from Twitter and wait two days. Fortunately, this is possible. Unfortunately, you gotta do some hoop-jumping to get it.

You see, to do this, you need access to the Twitter API. In the old days, people built websites with an API because they wanted to encourage others to interact with that website programmatically as well as in a browser: you built an ecosystem, right? But Twitter are not like that; they don’t really want you to interact with their stuff unless they like what you’re doing. So you have to apply for permission to be a Twitter developer in order to use the API.

To do this, as the Tweetback README says, you will need a Twitter bearer token. To get one of those, you need to be a Twitter developer, and to be that, you have to fill in a bunch of forms and ask for permission and be manually reviewed. Twitter’s documentation explains about bearer tokens, and explains that you need to sign up for a Twitter developer account to get them. Go ahead and do that. This is an annoying process where they ask a bunch of questions about what you plan to do with the Twitter API, and then you wait until someone manually reviews your answers and decides whether to grant you access or not, and possibly makes you clarify your answers to questions. I have no good suggestions here; go through the process and wait. Sorry.

Once you are a Twitter developer, create an app, and then get its bearer token. You only get this once, so be sure to make a note of it. In a clear allusion to the delight that this whole process brings to users, it probably will begin by screaming AAAAAAAAAAAAAAA and then look like a bunch of incomprehensible gibberish.

Now to pull in new data, run:

TWITTER_BEARER_TOKEN=AAAAAAAAAAAAAAAAAAq3874nh93q npm run fetch-new-data

(substituting in the value of your token, of course, which will be longer.)

This will fetch any tweets that aren’t in the database yet because you posted them after the archive was generated. Then run npm run build again to rebuild the _site directory, and re-publish it all.

I personally run these steps (fetch-new-data, then build, then publish) daily in a cron job, which runs a script with contents (approximately):

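#!/bin/bash
# Run from the directory containing this script (the Tweetback folder).
cd "$(dirname "$0")" || exit 1

echo "Begin publish at $(date)"

# Fetch any tweets posted since the last run.
echo Updating Twitter archive
echo ========================
TWITTER_BEARER_TOKEN=AAAAAAAAAAAAAA9mh8j9808jhey9w34cvj3g3 npm run fetch-new-data 2>&1

# Rebuild the static site into _site.
echo Updating site from archive
echo ==========================
npm run build 2>&1

# Copy the generated site up to the web server.
echo Publishing site
echo ===============
rsync -e "ssh" -az _site/ kryogenix.org:public_html/twitter 2>&1

echo "Finish publish at $(date)"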

but how you publish and rebuild, and how often you do that, is of course up to you.
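The important property of a script like this is the order: fetch, then build, then publish. One thing you might want on top is to stop as soon as a step fails, so a broken build never gets published. Here’s a sketch of that logic in JavaScript, with the actual command-running passed in as a function so the example stands alone. runSteps and fakeRun are hypothetical names; in real use the runner would wrap something like child_process.spawnSync and return the exit status.

```javascript
// Hypothetical sketch: run steps in order and stop at the first failure.
// `run` is any function that takes a command string and returns its exit code.
function runSteps(commands, run) {
  const completed = [];
  for (const command of commands) {
    const exitCode = run(command);
    if (exitCode !== 0) {
      // Later steps (such as publishing) are never attempted.
      return { ok: false, failed: command, completed };
    }
    completed.push(command);
  }
  return { ok: true, failed: null, completed };
}

// Stand-in runner that pretends the build step fails.
const fakeRun = (command) => (command === "npm run build" ? 1 : 0);
const result = runSteps(
  ["npm run fetch-new-data", "npm run build", "rsync (publish step)"],
  fakeRun
);
console.log(result.ok, result.failed, result.completed);
```

Whether that’s worth the bother over a dozen lines of bash is, again, up to you.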

Step 9: improve the archive (optional, but good)

What Tweetback actually does is use your Twitter backup to build an 11ty static website. (This is not all that surprising, since 11ty is also Zach’s static site builder.) This means that if you’re into 11ty you could make the archive better and more comprehensive by adding stuff. There are already some neat graphs of most popular tweets, most recent tweets, the emoji you use a lot (sigh) and so on; if you find things that you wish that your Twitter archive contained, file an issue with Tweetback, or better still write the change and submit it back so everyone gets it!

Step 10: add yourself to the list of people using the archive (optional, but you know you wanna)

Go to tweetback/tweetback-canonical and add yourself to the mapping.js file. What’s neat about this is that that file is used by tweetback itself. This means that if someone else with a Tweetback archive has a tweet which links to one of your Tweets, now their archive will link to your archive directly instead! It’s not just a bunch of separate sites, it’s a bunch of sites all of which are connected! Lots of connections between sites without any central authority! We could call this a collection of connections. Or a pile of connections. Or… a web!
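To make the idea concrete, here is roughly what that link rewriting amounts to. This is an illustration, not Tweetback’s code, and it rests on two assumptions that you should check against the tweetback-canonical repository: that the mapping goes from a Twitter username to the base URL of that person’s archive, and that an archived tweet lives at the base URL followed by the tweet’s ID. archiveLink, exampleMapping and the tweet IDs are all made up for the example.

```javascript
// Illustration only. ASSUMPTIONS: the mapping is username -> archive base
// URL, and an archived tweet lives at <base><tweet id>/. Check the
// tweetback-canonical repository for the real format before relying on this.
const exampleMapping = {
  sil: "https://kryogenix.org/twitter/",
};

function archiveLink(tweetUrl, mapping) {
  const match = /^https?:\/\/(?:www\.)?twitter\.com\/([^/]+)\/status\/(\d+)/.exec(tweetUrl);
  if (!match) return tweetUrl; // not a tweet link: leave it alone
  const username = match[1].toLowerCase();
  const id = match[2];
  // Unknown author: keep pointing at twitter.com.
  if (!Object.prototype.hasOwnProperty.call(mapping, username)) return tweetUrl;
  return `${mapping[username]}${id}/`;
}

// Made-up tweet IDs, purely for demonstration.
console.log(archiveLink("https://twitter.com/sil/status/1234567890", exampleMapping));
console.log(archiveLink("https://twitter.com/someone_else/status/42", exampleMapping));
```

The more people are in the mapping, the fewer links fall through to twitter.com.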

That’s a good idea. Someone should do something with that concept.

Step 11: big hugs for Zach

You may, or may not, want to get off Twitter. Maybe you’re looking to get as far away as possible; maybe you just don’t want to lose the years of investment you’ve put in. But it’s never a bad thing to have your data under your control when you can. Tweetback helps make that happen. Cheers to Zach and the other contributors for creating it, so the rest of us didn’t have to. Tell them thank you.

