Downloading Twitter Archive

 
Written By Sanjir Habib On May-27th, 2021
Twitter has a "Download all your tweets" link. After submitting the request, you need to wait for a day, and then the archive stays available for download for one week. But here's the catch.

1 GB Archive?

This might come as a surprise when tweets are limited to 280 characters. You would expect the archive to be no more than a megabyte, and then you get hit with this.

Can't download easily

- No resume support during the download, possibly because the zip is delivered by a script.
- Worse, it will send the file for a minute or two and then cut your download prematurely. Good luck if you don't have a gigabit link to download it fast enough.
- Again, note "no resume". Once the download breaks, you have to start from the beginning, only to get cut off again.

Downloading from a hosted server

Hosted servers have pipes fat enough to download it in minutes. But... you will run into an authentication error, as you probably aren't using a real, JavaScript-enabled web browser there.

Downloading the archive using curl

First get the cookies file from your desktop browser:

- Install this Chrome extension on your browser that will help you fetch your cookies.
- Next, go to Twitter and authenticate everything.
- Start downloading the zip archive and let it get cut off. By now you have the cookies.
- Click on the extension's icon in your browser and save your cookies to disk.
- Upload the saved cookies file (twitter.com_cookies.txt in the command below) to your web server.
- Copy the URL of the failed download from Chrome's download page.
- ssh into your server and run curl to fetch your zip archive from Twitter:
curl -b twitter.com_cookies.txt -c twitter.com_cookies.txt 'https://ton.twitter.com/i/ton/data/archives/6814845/twitter-2021-05-23-mylongid.zip' -L > twitterarchive.zip
The -b switch makes curl read cookies from the file, -c makes it write any updated cookies back to it, and -L follows redirects. You can pipe the output through the pv command before the redirect to see progress.
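
Here is a rough sketch of how the cookie file check and the pv variant could look on the server. The function names count_cookies and fetch_archive are made up for illustration, and the script assumes the extension saved the cookies in the Netscape cookies.txt format (seven tab-separated fields per cookie), which is the format curl reads with -b.

```shell
#!/bin/sh
# count_cookies FILE DOMAIN (hypothetical helper): print how many cookie
# lines in a Netscape-format cookie file belong to DOMAIN.
count_cookies() {
    awk -F '\t' -v domain="$2" '
        { sub(/^#HttpOnly_/, "") }   # curl marks HttpOnly cookies with this prefix
        /^#/ || NF != 7 { next }     # skip comments, blanks and malformed lines
        index($1, domain) { n++ }    # first field is the cookie domain
        END { print n + 0 }
    ' "$1"
}

# fetch_archive URL COOKIE_FILE OUTPUT (hypothetical helper): the same curl
# call as above, with pv placed before the redirect to show progress.
#   -sS  hide the curl progress meter (pv shows one) but keep error messages
#   -f   write nothing on an HTTP error instead of saving an error page
fetch_archive() {
    curl -sS -f -b "$2" -c "$2" -L "$1" | pv > "$3"
}

# Example usage, with ARCHIVE_URL set to the URL copied from Chrome:
#   count_cookies twitter.com_cookies.txt twitter.com
#   fetch_archive "$ARCHIVE_URL" twitter.com_cookies.txt twitterarchive.zip
```

If count_cookies prints 0, the upload probably went wrong or the cookies were saved for another site, and curl is likely to hit the authentication error again.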

What does the gigabyte of archive contain?

Every video you retweeted, which is what makes the archive so large.
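
To see for yourself where the space goes, you can sort the zip listing by size. This is only a sketch: largest_entries is a made-up helper name, and it assumes the Info-ZIP unzip tool is installed and prints its usual "length, date, time, name" listing.

```shell
#!/bin/sh
# largest_entries [N] (hypothetical helper): read an `unzip -l` listing on
# stdin and print the N largest entries (default 10), biggest first.
largest_entries() {
    # Keep only file rows: a numeric size followed by a date such as
    # 2021-05-23 or 05-23-2021. This drops the header and the totals row.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+-[0-9]+-[0-9]+$/' |
        sort -k1,1nr |
        head -n "${1:-10}"
}

# Example usage:
#   unzip -l twitterarchive.zip | largest_entries 10
```

The first column of the output is the uncompressed size in bytes, so the videos should float to the top.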