Twitter, as you probably know, is the huge micro-blogging service that lets users connect through short, 140-character tweets. Recently I’ve seen some stats putting the number of tweets (Twitter messages) at about 400 million a day. That is an immense amount of information. Imagine what kind of system Twitter must be running to keep up with all this data.
Let’s do some math here. If I’m not mistaken, Twitter probably stores tweets as UTF-8 text in a VARCHAR field of a database like MySQL. Calculating the exact size is tricky, since tweets vary enormously, so I’m just going to say that each tweet takes up at most 561 bytes. That is 4 bytes (the most UTF-8 ever needs for a single character) multiplied by 140 characters, assuming every tweet uses all 140, plus 1 byte to store the length of the text in the database. Strictly speaking, MySQL uses a 1-byte length prefix only when a column’s values can never exceed 255 bytes and 2 bytes otherwise, so a column that can hold 560 bytes would need 2, but I’ll stick with 1 byte here.
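A minimal Python sketch of that worst-case estimate follows. It encodes the post’s assumptions (140 characters, 4 bytes per character, a 1-byte length prefix) as constants; the constant and function names are hypothetical illustrations, not anything from Twitter’s real schema.

```python
# Worst-case bytes for one tweet under this post's assumptions:
# 140 characters, each needing UTF-8's maximum of 4 bytes, plus a 1-byte length prefix.
MAX_CHARS = 140
MAX_UTF8_BYTES_PER_CHAR = 4  # UTF-8 uses 1 to 4 bytes per code point
LENGTH_PREFIX_BYTES = 1      # assumption carried over from the text


def worst_case_tweet_bytes(chars: int = MAX_CHARS) -> int:
    """Return the byte size of a tweet of `chars` characters where every character is 4 bytes."""
    return chars * MAX_UTF8_BYTES_PER_CHAR + LENGTH_PREFIX_BYTES


# Characters outside the Basic Multilingual Plane, such as emoji, really do take 4 bytes.
emoji_tweet = "\U0001F600" * MAX_CHARS  # 140 grinning-face emoji
payload = emoji_tweet.encode("utf-8")

print(len(emoji_tweet))            # 140 characters
print(len(payload))                # 560 bytes of text
print(worst_case_tweet_bytes())    # 561 bytes including the length prefix
```

Swapping the emoji for plain ASCII letters drops the payload to 140 bytes, since UTF-8 stores each ASCII character in a single byte.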
Now, 561 bytes may sound extremely small… well, it is. Unless you’re still stuck in the days when 2 KB of RAM was reserved for data centers, you probably have something like a 1 TB hard drive these days. Who am I kidding: even your flash drive probably holds at least 16 GB. If it has less, consider upgrading. Wait, you still use flash drives?!
However, when we take these humble 561 bytes and multiply them by 400 million, we are left with 224.4 billion bytes. That may sound like an immense amount, and it is, but bytes don’t take up that much space on today’s computers. To put this into perspective, we are talking about 224.4 GB. So maybe you have enough room on your computer for this hypothetical day of tweets.
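The daily figure is a single multiplication, but a short sketch makes the units explicit. It assumes decimal units (1 GB = 10^9 bytes), which is what the 224.4 GB figure uses; operating systems often report binary gibibytes (1 GiB = 2^30 bytes) instead, so the last line shows that view too. The helper names are hypothetical.

```python
BYTES_PER_TWEET = 561          # the post's worst-case size per tweet
TWEETS_PER_DAY = 400_000_000   # the reported daily tweet volume


def to_decimal_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


daily_bytes = BYTES_PER_TWEET * TWEETS_PER_DAY
print(daily_bytes)                    # 224400000000
print(to_decimal_gb(daily_bytes))     # 224.4
print(round(to_gib(daily_bytes), 1))  # 209.0
```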
The real question: would you run out of space after a week? A month? A year? Let’s run through our numbers a bit more. At 224.4 GB a day, a week (7 days) comes to 1,570.8 GB, or about 1.57 TB of data. I have quite a bit of storage space in my computer, but I still don’t think I could realistically hold that much. Storing roughly 1.6 TB calls for a 2 TB hard drive (you need less than that, but you don’t see many 1.7 TB drives).
So you could store a week of tweets by spending roughly 150 dollars on a solid 2 TB hard drive. A month (30 days), though, adds up to about 6.7 TB of tweets, which takes four of those drives and leaves you roughly 600 dollars poorer. That is enough space to store about 13 years of uninterrupted MP3 music at 128 kbps.
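Here is a rough Python sketch of the week and month projections, including how many 2 TB drives each would need. The $150 price per drive comes from the post, the 30-day month is my simplification, and rounding up to whole drives is the obvious rule since you can’t buy part of one; `storage_plan` is a hypothetical helper name.

```python
import math

DAILY_GB = 224.4        # decimal GB of tweets per day (worst case)
DRIVE_TB = 2            # the post's 2 TB hard drive
DRIVE_PRICE_USD = 150   # the post's rough price for one such drive


def storage_plan(days: int) -> tuple[float, int, int]:
    """Return (terabytes needed, number of 2 TB drives, rough cost in dollars)."""
    tb_needed = DAILY_GB * days / 1000        # 1 TB = 1,000 GB (decimal)
    drives = math.ceil(tb_needed / DRIVE_TB)  # round up: no fractional drives
    return tb_needed, drives, drives * DRIVE_PRICE_USD


for label, days in [("week", 7), ("month", 30)]:
    tb, drives, cost = storage_plan(days)
    print(f"{label}: {tb:.2f} TB -> {drives} x {DRIVE_TB} TB drives, about ${cost}")
# week: 1.57 TB -> 1 x 2 TB drives, about $150
# month: 6.73 TB -> 4 x 2 TB drives, about $600
```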
Insane, right? Then you get to a full year of this, and you realize how much data these little 140-character tweets could potentially take up: 365 days at 224.4 GB comes to about 81.9 TB. That is kind of mind-boggling, since we are only counting the rough space of the tweets, not the actual site, images, user profile data, and so on. By the estimate I’ve seen, it is nearly half the size of the surface web (the visible portion). Don’t fret, though: I’m sure tweets don’t really take up that much space, thanks to the magic of data compression and the fact that nowhere near every tweet is exactly 140 characters long or uses UTF-8 encoding to its maximum.
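The yearly total follows the same pattern, multiplied out over 365 days. This sketch uses the post’s worst-case numbers and decimal terabytes (1 TB = 10^12 bytes); `yearly_tb` is a hypothetical helper you can feed a smaller average size per tweet.

```python
BYTES_PER_TWEET = 561          # the post's worst-case size per tweet
TWEETS_PER_DAY = 400_000_000   # the reported daily tweet volume
DAYS_PER_YEAR = 365


def yearly_tb(bytes_per_tweet: int = BYTES_PER_TWEET,
              tweets_per_day: int = TWEETS_PER_DAY) -> float:
    """Decimal terabytes stored in one year at a fixed size per tweet."""
    return bytes_per_tweet * tweets_per_day * DAYS_PER_YEAR / 10**12


print(round(yearly_tb(), 1))   # 81.9
```

Passing a smaller size, for example `yearly_tb(bytes_per_tweet=100)`, shows how quickly the total shrinks (to about 14.6 TB); 100 bytes is an arbitrary illustration, not a measured average.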
Interestingly enough, if you shaved each tweet from 561 bytes to 560 bytes, you would save 400 million bytes a day, or about 146 GB of data per year. So you can see just how much a single byte matters when you are working with 140 characters at this scale. Think about that the next time you are frustrated with Twitter’s 140-character limit.
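A quick check of that one-byte comparison, using the same 400-million-tweets-a-day and 365-day assumptions (`yearly_bytes` is a hypothetical helper):

```python
TWEETS_PER_DAY = 400_000_000
DAYS_PER_YEAR = 365


def yearly_bytes(bytes_per_tweet: int) -> int:
    """Total bytes stored in a year at a fixed size per tweet."""
    return bytes_per_tweet * TWEETS_PER_DAY * DAYS_PER_YEAR


saved = yearly_bytes(561) - yearly_bytes(560)
print(saved)            # 146000000000 bytes
print(saved / 10**9)    # 146.0 decimal GB per year
```

Every byte trimmed from each tweet is worth the same 146 GB a year, so the savings scale linearly with the bytes you cut.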