How PredictHQ’s predicted viewership for Live TV Events works

Dr. Xuxu Wang
Chief Data Officer

By Dr Xuxu Wang and Tania Tian

PredictHQ’s new feature, Live TV Events, accurately predicts the viewership for major televised sports games across seven leagues so companies can forecast how these games will drive purchases of food, beverages, delivery orders, and retail goods. It’s a radically new kind of demand intelligence.

Why does predicting viewership matter?

Live broadcast sports events have a significant impact on restaurants, retail, delivery, and food and beverage providers. Recent research into the impact of sports on buying behavior this year revealed:

  • 76% of sports fans consider ordering food as part of their NFL plans, with 31% ordering in for most games and 17% ordering in for every game.

  • Most families spend more than $30 on food and drinks to enjoy during a standard meal.

  • 89% of sports fans make purchasing decisions inspired by sport, such as apparel or entertaining (e.g., televisions or outdoor furniture), with 58% making several purchases per season.

As the stay-at-home economy becomes more important, tracking the exact impact of sports games on demand gives companies an advantage: they can have the right amount of inventory and staff ready to meet demand without excessive waste.

Previously, pinpointing the impact of a particular game, or even a cluster of games, on your demand was challenging. Many of our customers, especially our large quick-serve retailers, have been grappling with this for years. The existing TV viewership figures are only available after the game and are organized by designated market areas (DMAs), which makes them unusable in demand forecasting models. Our data science team architected this from scratch to provide accurate predicted viewership with county-by-county granularity for use in demand forecasting, inventory management and staffing decisions.

How does PredictHQ know the predictions are accurate?

This is the million-dollar question for every prediction, so we wanted to address it first. The image at the top of this post shows how our predictions stack up against post-game data from other providers.

We look at two main factors:

  1. At the aggregated national level, our predicted viewership for games that have already occurred is within a close margin of the historical viewership reported by post-broadcast estimation models. As you can see from the graph above, our predictions are similar to external post-game sources such as NBC’s post-game viewership data.

  2. We also track whether our viewership figures correlate with the expected impact on demand for our key customers.

Tracking both elements is critical to creating forecast-grade data. Accurate predictions that don’t relate to customer transactional data would indicate that we weren’t identifying relevant information for smarter demand forecasting, which is what our demand intelligence is all about.
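
The two checks above can be sketched in a few lines of Python. This is a minimal illustration rather than PredictHQ’s actual validation pipeline: the choice of mean absolute percentage error and Pearson correlation as metrics, the function names, and all of the numbers are assumptions made up for the example.

```python
from math import sqrt


def mape(predicted, actual):
    """Mean absolute percentage error of predictions against post-game figures."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT PredictHQ or customer data.
predicted_viewers = [10.0, 20.0, 15.0, 25.0]     # millions, predicted pre-game
post_game_viewers = [11.0, 19.0, 15.0, 26.0]     # millions, reported post-game
customer_orders = [110.0, 190.0, 160.0, 250.0]   # hypothetical orders per game

# Check 1: how close are predictions to post-game figures at the national level?
national_error = mape(predicted_viewers, post_game_viewers)

# Check 2: do predicted viewers move together with customer demand?
demand_correlation = pearson(predicted_viewers, customer_orders)

print(f"MAPE: {national_error:.1%}, correlation: {demand_correlation:.2f}")
```

A low error on the first check combined with a weak correlation on the second would be the warning sign described above: accurate predictions that do not relate to transactions.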

At this stage, we have focused on predicting the viewership for live broadcasts of seven sports leagues:

  • NFL

  • NBA

  • MLB

  • NHL

  • MLS

  • NCAA Football

  • NCAA Men’s Basketball

The data also includes the top 100 sports games by viewership, which include golf tournaments and boxing matches. We will continue to expand our sports coverage in the US and into Europe, as well as explore other key television events. While we can’t share all of our methodology publicly, we have covered some of the most frequently asked questions below.

What factors impact the viewership of a game?

Our model uses more than 20 distinct features to pinpoint the expected viewership of a game. Throughout our testing with a handful of major customers, we were able to identify some significant impacts. These include:

  • Game day and time: Weekend evening games usually attract more viewers (and drive more fast food orders) than weekend afternoon or weeknight games. For example, for one of our early access customers, an afternoon NFL game contributes a ~5% increase in sales, whereas an evening game drives a ~13% increase.

  • Stage of the game: Finals will obviously be viewed more than regular games, so our models identify the impact of each type of game on viewership.

  • Team popularity: There is a wide range of fan base sizes. For example, according to WSN, in 2020 the Jacksonville Jaguars had 1.8 million supporters while the Dallas Cowboys had 16 million. When two more popular teams are playing, viewership increases, and our data shows the same pattern for fast food orders.

  • Team performance: We found that a team performing well can increase viewership by 8% to 19% throughout a season.

  • Uncertainty of match-up: A game between two teams that are both performing well will draw substantially more viewers than a game where everyone assumes they already know the outcome. While sports fans instinctively understand this, discovering the exact impact and building it into the model’s intelligence was a key step.

  • Game location: In counties where a home team is playing, viewership is likely to surge by ~10% for an NFL game. The effect is even more pronounced for baseball, which frequently saw a 20+% surge when a home team was playing.

These are only some of the high-impact features, and they focus only on the game-by-game level. But as any sports fan knows, sports games aren’t broadcast into a vacuum. Games airing at the same time, or even on the same day or weekend, will impact viewership considerably as well:

  • Competing postseason NFL broadcast (vs. no competing postseason NFL broadcast) for an MLB regular season broadcast: 14% decrease in predicted viewership for the MLB game.

  • Competing postseason NBA broadcast (vs. no competing postseason NBA broadcast) for an MLB regular season broadcast: 4% decrease in predicted viewership for the MLB game.
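
To make the arithmetic concrete, here is a minimal Python sketch that applies relative effects like the ones quoted above to a baseline figure. The baseline of 100,000 viewers and the function name are hypothetical, and combining several effects by multiplying them is an assumption made for illustration; the actual model does not necessarily combine features this way.

```python
def adjust_viewership(baseline, effects):
    """Apply relative effects (e.g. +0.10 or -0.14) to a baseline, one after another.

    ASSUMPTION: effects are combined multiplicatively. This is an illustrative
    simplification, not a description of how the production model works.
    """
    adjusted = float(baseline)
    for change in effects:
        if change <= -1.0:
            raise ValueError("an effect cannot remove 100% or more of viewers")
        adjusted *= 1.0 + change
    return adjusted


# Relative effects quoted in the article (approximate figures).
HOME_TEAM_NFL = 0.10               # ~10% surge in the home team's counties
COMPETING_NFL_POSTSEASON = -0.14   # effect on an MLB regular season game
COMPETING_NBA_POSTSEASON = -0.04   # effect on an MLB regular season game

# Hypothetical baseline: predicted county viewers for an MLB regular season game.
mlb_baseline = 100_000

# One competing postseason NFL broadcast: roughly 86,000 viewers.
with_nfl = adjust_viewership(mlb_baseline, [COMPETING_NFL_POSTSEASON])

# Both competing broadcasts under the multiplicative assumption: roughly 82,560.
with_both = adjust_viewership(
    mlb_baseline, [COMPETING_NFL_POSTSEASON, COMPETING_NBA_POSTSEASON]
)

print(round(with_nfl), round(with_both))
```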

How does PredictHQ predict TV viewership? 

Predicting TV viewership was a unique technical challenge. No other company does it, and it was further complicated by the imprecise nature of TV viewership measurement. Other players in the market use historical figures that are estimated based on a sample of viewership.

Originally, we attempted to build our model based on previous external TV viewership data, but found that impossible for a number of reasons. This upended our original plan to build a series of supervised machine learning models. We instead built our models using a customized version of a probability framework that our Chief Data Officer, Dr Xuxu Wang, developed. This was a highly novel way of predicting viewership.

Our Live TV viewership prediction model is based on a unique probability framework. We leverage our entities and knowledge graph to determine viewership estimates by county. Here’s a quick overview of the model:

  • A non-parametric mixture probability model that estimates the weights of the discrete mixing distribution using maximum likelihood estimation.

  • The probability of viewers is estimated given features such as team popularity, performance, game time and the competitiveness or uncertainty of the outcome.

  • The probability is applied to an estimated sports fan cap value of a live TV sports event for a given league at its specific season round stage, to derive the estimated number of viewers. 

  • Raw features are transformed using parametric and non-parametric regression approaches to capture non-linear relationships. For example, we use a half-normal distribution to describe the decay of a team’s performance from raw standings (see formula below), where x is the raw team standing at the league and season level.

[Formula image: half-normal decay of team performance, where x is the raw team standing]
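
As a rough sketch of the last two ideas in the overview, the Python below uses the standard half-normal density, f(x; sigma) = sqrt(2) / (sigma * sqrt(pi)) * exp(-x^2 / (2 * sigma^2)) for x >= 0, to turn a raw standing into a decaying performance score, and then multiplies a viewing probability by a fan cap. Several things here are assumptions for illustration only: the scale sigma, treating standing 1 as the best team, rescaling the density so the top team scores 1.0, and every function name and number. The model’s actual parameterization is not public.

```python
from math import exp, pi, sqrt


def half_normal_pdf(x, sigma):
    """Standard half-normal density with scale sigma; zero for negative x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < 0:
        return 0.0
    return sqrt(2.0) / (sigma * sqrt(pi)) * exp(-(x * x) / (2.0 * sigma * sigma))


def performance_score(standing, sigma):
    """Map a raw standing (1 = best, ASSUMED) to a score in (0, 1].

    The density is rescaled by its value at x = 0 so the top-ranked team
    scores exactly 1.0 and lower-ranked teams decay smoothly towards 0.
    """
    if standing < 1:
        raise ValueError("standing must be 1 or greater")
    x = standing - 1  # shift so the best team sits at x = 0
    return half_normal_pdf(x, sigma) / half_normal_pdf(0.0, sigma)


def estimated_viewers(view_probability, fan_cap):
    """Apply a viewing probability to an estimated fan cap for the event."""
    if not 0.0 <= view_probability <= 1.0:
        raise ValueError("view_probability must be between 0 and 1")
    return view_probability * fan_cap


# Hypothetical values: sigma, the probability, and the fan cap are made up.
SIGMA = 8.0
for standing in (1, 5, 10, 20):
    print(standing, round(performance_score(standing, SIGMA), 3))

print(estimated_viewers(0.25, 1_000_000))
```

A larger sigma makes the score decay more slowly, so mid-table teams keep more of their weight; in practice such a parameter would be fitted rather than chosen by hand.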

Whatever model we developed, we knew it would rely heavily on our entities system. PredictHQ doesn’t just track events such as concerts, expos and severe weather. We also track venues, sports teams, performers and much more. This gives critical insight into an event’s predicted attendance. For example, the Spark Arena in Auckland, New Zealand hosts both Beyoncé (who sells out the seats) and many local bands (which do not).

We have more than 55 million entities in our system and have been tracking many of them for years. One of our starting assumptions that proved accurate was that, because higher-performing teams drew larger audiences to the stands prior to the pandemic, they would also draw more TV viewers. That’s just one of the elements our model factors in.

Overall, our data science team has found this to be a daunting and fascinating challenge. Seeing data scientists from our customer base introduce Live TV Events data into their models and see substantial reductions in forecasting errors worth hundreds of millions of dollars has made it all the more satisfying.

If you would like to try the data out yourself or have more questions, get in touch with us here. This data is featurized and can be introduced into your forecasting models as easily as any of our other event categories.