Nov. 28, 2017 by Denis Vilar
How AI Platforms Will Disrupt The Digital Media Industry
This article was first published in French national newspaper La Tribune.
Over the past decade, we’ve seen massive adoption of Data Management Platforms, which are, in short, databases where publishers centralize their user data, be it first-party or third-party. The DMP was a first step in the data revolution, but its main purpose was limited to advertising: aggregating user cookies, identifying segments and improving ad targeting.
Now, with the surge of Artificial Intelligence and the plethora of new applications it creates, this trend far exceeds advertising, and the next big thing is likely to be Artificial Intelligence Platforms. AI indeed opens a wide array of opportunities that cover the entire content-publishing chain:
Showing the right content to the right person at the right time. This is not new; it has been around since 2005 (we called it “Web 2.0”, remember?), but recent advances in AI make recommender systems an order of magnitude more efficient.
We now see more and more AI-supported tools that help you produce content in a (semi-)automated way. The quality and brand image of such content may be questionable today, but these tools will only improve over time, and within the next few years the quality will likely be on a par with professional production.
Predicting what content to produce based on past data. This is what Netflix does to decide which shows to produce, or how Buzzfeed determines which articles to write. AI and data science help you steer your content production strategy in many ways: What topics and verticals should you invest in? What’s the optimal video duration? What format works best? Which segments of your audience are you missing?
Showing the right ad to the right person. Again, not new; it has been the focus of the ad-tech industry for the past decade, but the more data you include and the more sophisticated the models are, the better your results become.
Auditing and safety. Spam, fraud, bot and brand-safety detection systems all rely on AI, and they actually benefit from progress in medical research, since the same algorithms that are used to detect diseases can often be applied.
But here’s the catch: AI really shines when it has massive amounts of data to play with, from as many sources as possible:
• Content: metadata, tags, categories, etc.
• Visitors: demographics, interests, etc.
• Consumption: page views, video views, time spent, etc.
• Context: device, location, time of day, etc.
• External data: current live events, hot topics on social media, weather conditions, etc.
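To make the idea of cross-source aggregation concrete, here is a minimal sketch of what a unified event record combining the five sources above might look like. The class and field names are illustrative assumptions, not an actual Pulpix schema:

```python
from dataclasses import dataclass, field

# Hypothetical unified record: one video-engagement event carrying
# signals from all five data sources listed above.
@dataclass
class EngagementEvent:
    # Content
    content_id: str
    tags: list
    # Visitors
    user_id: str
    interests: list
    # Consumption
    watch_seconds: float
    # Context
    device: str
    hour_of_day: int
    # External data
    trending_topics: list = field(default_factory=list)

event = EngagementEvent(
    content_id="v42", tags=["fashion"],
    user_id="u1", interests=["fashion", "travel"],
    watch_seconds=37.5,
    device="mobile", hour_of_day=8,
    trending_topics=["#FashionWeekParis"],
)
```

The point of a single record type like this is that downstream models see every signal at once, instead of each team’s tool keeping its own slice.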
As leading data scientist Andrew Ng explains, an AI system can be seen as a rocket ship: in order to land on the moon you need an extremely powerful rocket engine and a massive amount of fuel. Computers provide the engine; their performance has improved significantly over the past decade, but today they are becoming a commodity: with cloud providers, anyone can set up hundreds of powerful AI-ready servers in a few clicks.
Data is the fuel that lets your AI system really take off, and today it has become the real asset. Companies strive to accumulate as much data as possible and protect it by all means. The GAFAs understood this very early and have become true data fortresses. This is what makes them so powerful.
The problem is that, at traditional media companies, the applications and data streams mentioned earlier are managed by different teams using tools that don’t communicate with each other: content strategy is handled by publishing directors, content production by editors and video makers, personalization by the product team, and ad targeting by ad ops. These are closed systems; no one has the full picture.
By contrast, when it aggregates data points from all sources, AI is able to identify and leverage very subtle correlations. Let’s take the example of a multi-layer recommendation engine:
In this example you have the input data on the left, the layers that may have been activated in the middle, and the output recommendation on the right. By combining “mobile” and “Wednesday 8 am EST”, the system may infer that the user is commuting, and in turn infer that they are more likely to be interested in short-form content. By combining “Manhattan”, “Female” and the user history, it can infer an interest in “Fashion”, which, combined with the Twitter trending topic #FashionWeekParis, becomes an even stronger signal. In short, the system evaluates millions of combinations and sub-combinations of all data points to finally compute its best output: the content with the highest probability of being watched.
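The signal-combining logic described above can be caricatured in a few lines. This is a toy illustration, not an actual recommender: the hand-crafted feature crossings stand in for what a learned hidden layer would do, and the weights are invented for the example:

```python
import math

def watch_probability(signals):
    """Toy logistic score over two hand-crafted signal crossings."""
    # "Commuting" inferred from device + time of day
    commuting = (signals.get("device") == "mobile"
                 and signals.get("hour") in (7, 8, 9))
    # Interest in fashion reinforced by a live social-media trend
    fashion_now = ("fashion" in signals.get("interests", [])
                   and "#FashionWeekParis" in signals.get("trending", []))
    score = -1.0                       # bias: most content goes unwatched
    score += 1.5 if commuting else 0.0   # commuters favor short-form
    score += 2.0 if fashion_now else 0.0 # trend-backed interest is strong
    return 1 / (1 + math.exp(-score))    # squash to a probability

p = watch_probability({
    "device": "mobile", "hour": 8,
    "interests": ["fashion"], "trending": ["#FashionWeekParis"],
})
```

A real multi-layer engine learns these crossings and weights from data rather than hard-coding them, but the mechanics are the same: raw signals combine into higher-level ones, which combine into a final watch probability.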
As you can see, such a sophisticated system is only possible if all teams can share their data in a centralized database, which is responsible for ingesting, cleaning and storing the data, then leveraging AI to refine it and running domain-specific algorithms that solve specific business needs. What few people realize is that 80% of the effort will be spent building the data pipeline (aggregating, formatting and cleaning diverse data, scaling the system), and only 20% designing and running the actual AI models.
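A minimal sketch of that “80%” plumbing: ingesting raw rows from several teams’ tools, normalizing them into one schema, and dropping malformed records before any model ever sees them. The field names are illustrative assumptions:

```python
def clean(raw_rows):
    """Normalize heterogeneous rows into one schema, dropping bad ones."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({
                "user_id": str(row["user_id"]).strip().lower(),
                "content_id": str(row["content_id"]),
                "watch_seconds": max(0.0, float(row["watch_seconds"])),
            })
        except (KeyError, TypeError, ValueError):
            continue  # malformed row from an upstream tool: skip it
    return cleaned

rows = clean([
    {"user_id": " U1 ", "content_id": "v42", "watch_seconds": "37.5"},
    {"user_id": "u2"},  # missing fields, gets dropped
])
```

Unglamorous as it looks, this kind of normalization and validation is where most of the engineering time goes; the models only work because the fuel reaching them is consistent.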
Publishers and vendors that apply this convergence strategy can unlock new AI-based solutions to their specific problems that are an order of magnitude more performant than existing ones, and thereby achieve skyrocketing growth.
Pulpix Inc. is an AI-powered video platform backed by Y Combinator, analyzing billions of data points on video consumption and helping publishers drive engagement and revenue.