Published on 10th August 2018

Web Scraping 11,678 FiveThirtyEight Articles

According to their website, “FiveThirtyEight is a data journalism organization” and “uses statistical analysis – hard numbers – to tell compelling stories.” I’ve always thought of the blog as reporting mostly on politics, with the occasional detour into other topics like sports. That impression may be biased, though, considering I’m a massive political junkie. This became abundantly clear when I heard a more athletic friend of mine describe it as “a sports blog, with some political stuff too.” So is FiveThirtyEight a politics blog with some sports on the side or a sports blog with some politics on the side?

To answer this question once and for all, I scraped every single article the website has published. In addition to scraping the text of every article, I also collected each article’s publication date and the section it’s listed in.
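For readers curious what collecting those three fields might look like, here is a minimal sketch using only Python's standard library. It is not the code behind this post: the `section-link` class, the `<time datetime>` tag, and the sample HTML are hypothetical placeholders, and a real page would need its own selectors.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Pulls the section, publication date, and body text out of one page.

    Assumption: the tag and class names checked below are made up for
    illustration and do not describe FiveThirtyEight's actual markup.
    """

    def __init__(self):
        super().__init__()
        self.section = None
        self.published = None
        self.paragraphs = []
        self._in_section = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time" and attrs.get("datetime"):
            # Publication date stored as an ISO string in the attribute.
            self.published = attrs["datetime"]
        elif tag == "a" and attrs.get("class") == "section-link":
            self._in_section = True
        elif tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_section = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_section:
            self.section = data.strip()
        elif self._in_paragraph:
            # Text inside nested tags (links, emphasis) is kept as well.
            self.paragraphs[-1] += data


def parse_article(html):
    """Return a dict with the three fields collected for each article."""
    parser = ArticleParser()
    parser.feed(html)
    text = "\n".join(p.strip() for p in parser.paragraphs)
    return {"section": parser.section, "published": parser.published, "text": text}


# Made-up page fragment, used only to exercise the parser.
SAMPLE_HTML = """
<article>
  <a class="section-link" href="/politics/">Politics</a>
  <time datetime="2018-08-01">Aug. 1, 2018</time>
  <p>First paragraph with <a href="#">a link</a> inside.</p>
  <p>Second paragraph.</p>
</article>
"""

result = parse_article(SAMPLE_HTML)
```

In practice each page would first be downloaded (politely, with rate limiting) and then passed through something like `parse_article`, giving one record per article.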

To date, FiveThirtyEight has published 5,906 politics articles and only 2,376 sports articles, making politics look like the winner at first glance. That’s not an entirely fair comparison, however, considering they didn’t start writing consistently about sports until 2014, giving politics more than a two-year head start. So has FiveThirtyEight’s focus shifted since sports arrived on the scene?

The two sections have been neck-and-neck in terms of published content recently, although politics and sports have both enjoyed periods of website domination. ‘Culture’ also gets an honorable mention, although that section is inflated by the daily ‘Significant Digits’ column…more on that later.
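The month-by-month comparison described above comes down to counting articles per section per month. A minimal sketch, assuming each scraped article has been reduced to a hypothetical `(iso_date, section)` pair; the records below are invented purely to show the shape of the data:

```python
from collections import Counter


def monthly_counts(articles):
    """Count articles per (YYYY-MM, section) from (iso_date, section) pairs."""
    return Counter((iso_date[:7], section) for iso_date, section in articles)


def leading_section(counts, month):
    """Return the section with the most articles in a month, or None."""
    in_month = {s: n for (m, s), n in counts.items() if m == month}
    return max(in_month, key=in_month.get) if in_month else None


# Invented records for illustration only, not real article data.
sample = [
    ("2018-06-03", "politics"),
    ("2018-06-10", "sports"),
    ("2018-06-21", "sports"),
    ("2018-07-02", "politics"),
]
counts = monthly_counts(sample)
```

Plotting `counts` as one line per section over time would give the kind of picture discussed here.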

Close readers may have noticed that FiveThirtyEight’s political articles use less data visualization and statistical analysis than they once did. I’d be lying if I said I don’t begin every article by scrolling through to look for cool graphs, a habit that has increasingly led to disappointment. FiveThirtyEight claims to use “hard numbers to tell compelling stories,” but just how many hard numbers do they use? To get a sense of this, I measured how many statistics were used on average each month (as a percentage of total words in the articles).
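One way such a measure could be computed is sketched below. The definitions are assumptions, not necessarily the ones used for this post: a "statistic" is taken to be any whitespace-separated token containing a digit, and the monthly figure is the mean of the per-article percentages.

```python
import re
from collections import defaultdict

# Assumption: any token with at least one digit ("42", "3.5%", "$1,200")
# counts as a statistic. Spelled-out numbers are not counted.
HAS_DIGIT = re.compile(r"\d")


def statistic_share(text):
    """Percentage of whitespace-separated tokens that contain a digit."""
    words = text.split()
    if not words:
        return 0.0
    numeric = sum(1 for word in words if HAS_DIGIT.search(word))
    return 100.0 * numeric / len(words)


def monthly_statistic_share(articles):
    """Average statistic_share per month from (iso_date, text) pairs."""
    by_month = defaultdict(list)
    for iso_date, text in articles:
        by_month[iso_date[:7]].append(statistic_share(text))
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


# Invented texts for illustration only.
sample = [
    ("2018-01-05", "one 2 three 4"),
    ("2018-01-20", "no numbers here now"),
    ("2018-02-01", "10 20 thirty forty"),
]
shares = monthly_statistic_share(sample)
```

Pooling all words in a month before dividing, rather than averaging per article, is an equally defensible choice and gives longer articles more weight.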

FiveThirtyEight began in 2008 purely as a forecasting blog, with Nate Silver using statistical models to predict the outcomes of various elections, so it should be no surprise that this was an era of peak statistical reporting. We saw fewer ‘hard numbers’ after the blog partnered with The New York Times; this was potentially due to new staff members, professional editing, and attempts to more closely resemble the media giant’s reporting. Statistical reporting peaked once again when FiveThirtyEight was acquired by ESPN, but then declined steeply over time.

Meanwhile, fans of the sports section may have noticed the opposite trend: FiveThirtyEight’s sports articles have gradually been incorporating more and more statistics. In fact, leading up to April 2018, when ABC News acquired the blog, the sports section was consistently home to some of FiveThirtyEight’s heaviest data work. ABC News has made it abundantly clear that they are interested in FiveThirtyEight for its political reporting, and while they’ve reassured readers that “data-driven sports coverage” will continue, one has to wonder: is the data focus about to shift from sports to politics?

After running these analyses, I now have a lot more to say when someone asks me what FiveThirtyEight is. But wouldn’t it be great if I could add even more detail? What if I could replace the usual “they write about sports and politics” with a list of the top 20 topics they’ve written about? I’d get invited to even more social gatherings! So without further ado…

There you have it! From now on, when someone asks you what FiveThirtyEight is, you can finally answer with confidence: it’s a semi-data-journalism organization that writes somewhat equally about sports and politics and commonly reports on the 2016 election, the Trump Administration, basketball, football, baseball, health care, movies, Congress, and upcoming elections. Not bad, huh?


Check out the code on GitHub

2 Comments


Ünver Çiftçi

Posted on Oct. 27, 2018, 5:40 a.m.

Hi, Oliver. How did you scrape the data, please? Thanks. Unver



Posted on Oct. 29, 2018, 8:24 p.m.

Hey Unver! Here's the code I used to scrape the data:

Let me know if you still have questions after checking it out.

