Recently, I discovered that Facebook lets you download the data it has saved about you, including every single message you’ve ever sent or received. So naturally, I downloaded my messages and got to work. The data go as far back as March 16th, 2010 (at 9:55pm, to be exact), the day I sent my first Facebook message.
What’s in the data?
Even though I started using Facebook Messenger in 2010, my activity didn’t really pick up until a few years in.
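A monthly tally like the one behind this kind of chart can be sketched in a few lines. This is only an illustration, not the code used for this post: it assumes the messages have already been parsed into (timestamp, sender) pairs, and the sample records and the `monthly_counts` helper are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp, sender) pairs, not real export data.
messages = [
    (datetime(2010, 3, 16, 21, 55), "me"),
    (datetime(2010, 3, 20, 10, 0), "friend_a"),
    (datetime(2012, 6, 1, 9, 30), "me"),
    (datetime(2012, 6, 2, 18, 45), "me"),
]

def monthly_counts(records, sender):
    """Count the messages sent by `sender` in each (year, month)."""
    counts = Counter()
    for timestamp, who in records:
        if who == sender:
            counts[(timestamp.year, timestamp.month)] += 1
    # Sort by (year, month) so the result is ready to plot in time order.
    return dict(sorted(counts.items()))

sent_per_month = monthly_counts(messages, "me")
```

Passing a friend’s name instead of "me" gives the same tally for a single individual.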
Tracking the number of messages I’ve sent each month is pretty cool, but it only scratches the surface of what’s possible with these data. I can also take a look at my messaging rates with any individual, assuming they’ve sent me at least one message. Here’s an example from my good friend Alina, whom I met in August 2016:
Life is full of change; relationships evolve constantly, people come and go, and best friends often turn into ‘old friends’ as we move from one life chapter to another. This became abundantly clear to me when I looked up the ten friends who have sent me the most messages on Facebook and plotted each individual’s messaging patterns together.
The two individuals taking the #1 and #2 spots are outliers by any definition; they’ve sent me 27,593 and 22,904 messages respectively, each more than the next eight friends combined (friends ranked #3-10 have sent me a total of 22,424 messages). Because my top two friends dominated the field, let’s omit them to get a better look at the rest of the bunch.
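For anyone curious how a ranking like this might be produced, here is a minimal sketch. It assumes a plain list with one sender name per received message; the names, the counts, and the `top_senders` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical data: one entry per message received, holding the sender's name.
received_from = ["a"] * 5 + ["b"] * 4 + ["c"] * 2 + ["d"] * 1

def top_senders(sender_list, n):
    """Return the n most frequent senders as (name, count) pairs."""
    return Counter(sender_list).most_common(n)

ranking = top_senders(received_from, 4)

# Compare the runner-up against everyone ranked below the top two.
runner_up_count = ranking[1][1]
rest_total = sum(count for _, count in ranking[2:])
runner_up_beats_rest = runner_up_count > rest_total
```

Dropping the first two entries of `ranking` is all it takes to leave the outliers out of a plot.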
Here we can much more clearly see the messaging patterns of the other eight friends. So what all is going on here?
When Alina learned that she had sent me the most messages in January of 2017, she openly wondered what we had been talking about. A great question, and thankfully, one I can answer!
To determine the stand-out topics of a month’s worth of messages, I used tf-idf (which stands for term frequency-inverse document frequency, but we’ll stick with the abbreviation). For each month, I grouped all of Alina’s messages into one long text document and looked for words that appeared often. But here’s the kicker: tf-idf compares each document against the others in the collection, so if a word was used often in one month but was also common in other months, it wouldn’t stand out as a relevant conversation topic. Therefore, for a word to score highly on the tf-idf measure, it needs to have been used a lot in one month and not often in other months.
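Here is a small sketch of that idea. It uses one common tf-idf variant (a word’s share of the month’s words, multiplied by the log of the number of months divided by the number of months containing the word); the exact weighting used for this post may differ, and the sample documents and helper names below are made up.

```python
import math
import re
from collections import Counter

# Hypothetical monthly documents: all of one person's messages joined per month.
monthly_docs = {
    "2017-01": "tacos trivia tacos homework",
    "2017-02": "homework consent consent meeting",
    "2017-03": "homework race race race",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(docs):
    """Score every word in every document with tf * log(N / df)."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(term_counts)
    # Document frequency: in how many months does each word appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    scores = {}
    for name, counts in term_counts.items():
        total = sum(counts.values())
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(monthly_docs)
top_january = max(scores["2017-01"], key=scores["2017-01"].get)
```

In this toy data, "homework" shows up in every month, so its score is zero everywhere, while "tacos" stands out for January because it is frequent there and absent elsewhere.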
Calculating tf-idf on myself is particularly interesting because it includes every message I sent, regardless of who the recipient was. This therefore gives us a decent idea of what was going on in my life at the time or what I found to be important. Let’s take a look at what I was up to in the most recent full year, 2017.
This seems like an odd assortment at first glance, but every seemingly random word has a solid explanation…for the most part.
In February, much of my free time was devoted to my work in the student government (otherwise known as ‘DCGA’) and to promoting sexual consent.
April was a busy month! I hosted a trivia game, ate a lot of tacos, and advertised a very fun race to my friends.
In May I moved to LA for a summer internship and spent all of my hard-earned money on Uber.
Throughout June and July, I used Facebook Messenger to collect data for an article about social networks. Because I was sending the same long script to nearly 500 students, my Messenger usage skyrocketed during those months.
In August, I was frustrated by a major plot hole in the season 1 finale of The Flash (SPOILERS BELOW):
Throughout September, my senior seminar class hosted a public deliberation. I used Facebook to advertise the event and to ask for potential discussion topics.
And in October, I did some research on the Teletubbies and shared my findings with anyone who would listen. No joke.
Rather than plotting monthly counts, I can keep a running tally of total messages and use it to create a cumulative frequency plot. This shows how many messages had been sent in total by any given month, and the slope of the plotted line reveals the rate at which that total is growing: steep stretches indicate more messaging than average, and shallow stretches indicate less. For example, look at how the rate of messaging from a close friend of mine increased once I moved back to the same city as him and declined after we graduated from high school and left for college.
Now that you’ve seen that example, check out how my cumulative message count has grown over time. Since the big bump in Facebook usage in 2012, it appears to have grown fairly consistently.
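The running tally described above is simple to compute from monthly counts. The sketch below uses invented numbers, not my real data, and shows why the slope of the cumulative line is just the monthly count in disguise.

```python
from itertools import accumulate

# Hypothetical monthly message counts in chronological order.
monthly = [("2012-01", 10), ("2012-02", 40), ("2012-03", 5)]

labels = [month for month, _ in monthly]
counts = [count for _, count in monthly]

# Running total of messages sent by the end of each month.
running_total = list(accumulate(counts))

# The rise between consecutive points on the cumulative line equals that
# month's count, so steep segments mark busy months and shallow ones quiet months.
steps = [later - earlier for earlier, later in zip(running_total, running_total[1:])]
```

Plotting `running_total` against `labels` gives the cumulative frequency curve.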