The amount of data Facebook collects from its nearly one billion users is astounding.
The highlights from GigaOM's "Facebook is collecting your data — 500 terabytes a day":
- 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
- 2.7 billion Likes per day
- 300 million photos uploaded per day
- 100+ petabytes of disk space in one of Facebook’s largest Hadoop (HDFS) clusters
- 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
- 70,000 queries executed on these databases per day
- 500+ terabytes of new data ingested into the databases every day
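To get a feel for the scale, here is a quick back-of-the-envelope sketch that turns the figures above into per-day, per-user, and per-second rates. The decimal units (1 TB = 10^12 bytes), the idea that the 30-minute Hive scan rate holds around the clock, and all of the variable names are my own assumptions, not anything Facebook or GigaOM stated. The "100+" and "500+" figures are treated as lower bounds.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the list above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400

ingest_per_day = 500 * TB        # "500+ terabytes" per day, taken as a lower bound
scanned_per_30_min = 105 * TB    # Hive scan volume every 30 minutes
cluster_size = 100 * PB          # "100+ petabytes", taken as a lower bound
users = 950_000_000
photos_per_day = 300_000_000
queries_per_day = 70_000


def derived_rates():
    """Return rough rates derived from the quoted figures."""
    # Assumption: the 30-minute scan rate is sustained for all 48 half-hours.
    scanned_per_day = scanned_per_30_min * 48
    return {
        "scanned_pb_per_day": scanned_per_day / PB,
        "ingest_kb_per_user_per_day": ingest_per_day / users / 1000,
        # Scale comparison only: the ingested data is not necessarily
        # written to that one cluster.
        "days_of_ingest_to_match_cluster": cluster_size / ingest_per_day,
        "photos_per_second": photos_per_day / SECONDS_PER_DAY,
        "queries_per_second": queries_per_day / SECONDS_PER_DAY,
    }


for name, value in derived_rates().items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, Hive scans roughly 5 petabytes a day, new data works out to a bit over half a megabyte per user per day, and about 200 days of ingest would match the size of that one 100-petabyte cluster.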
Facebook is a single location for 950 million users, and it’s a public company. So this type of data can be tracked, providing perspectives that have never been achieved for the Internet at large.
Interesting stuff and scary given that one company controls it all.
The scarier part is that in the not-too-distant future, someone will comment, “One of Facebook’s clusters ONLY held 100 petabytes of data?” Perhaps the commenter’s phone/communication device won’t have 100 petabytes of storage capacity, but that will happen someday.
Storage capacity is one of those mind-blowing topics. I remember how stoked I was to get a 1GB hard drive back in 1997. My TV probably rocks a bigger disk now, plus it’s solid state. Mind = blown.