Two-dimensional PCA projection. The axes are abstract combinations of all the "features" (i.e. data points) in the system, computed so that the two dimensions preserve as much information as possible.
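To make the projection above concrete, here is a minimal sketch of a two-component PCA using NumPy. The random matrix stands in for real feature vectors, and `project_2d` is a hypothetical helper name used only for illustration; this is a textbook PCA, not a description of the production pipeline.

```python
import numpy as np

def project_2d(features: np.ndarray) -> np.ndarray:
    """Project each row of `features` onto its first two principal components."""
    # PCA works on mean-centered data.
    centered = features - features.mean(axis=0)
    # Rows of `vt` are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Assumption: 50 placeholder videos with 8 abstract features each.
rng = np.random.default_rng(0)
video_features = rng.normal(size=(50, 8))
points = project_2d(video_features)
print(points.shape)
```

Each row of `points` is one video positioned on the two axes of the plot; the first axis always carries at least as much variance as the second.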
Our system aggregates as many data points as possible in order to gain deep insights on the content and on the audience.
The system retrains itself every hour to update the algorithm weights.
Our technology works in all scenarios and edge cases: cold start, large catalogs, small audiences, missing metadata, etc.
Get access to your stats in real-time and gain insight into video views, time spent, revenue, ad impressions, locations, devices and more.
Customize the look, frequency and type of recommendations according to your needs.
Make sure you reach your goals by highlighting your branded videos. Our smart tool allows you to set business and editorial rules.
Data collection and processing happen in real time, so the recommendation model always uses the most recent data.
We collect events related to user behavior on your website:
- Which video the user watched
- Real-time behavior during video playback: scrolls, seeks, video visibility, and video completion rate
- Clicks on recommendations and widgets
- Ad play and completion
Each event is linked to anonymous metadata about the user, such as browser fingerprint, user agent and IP. We use this to uniquely identify a given user. We can then compute each user’s detailed watch history.
This data is then used by our algorithms to understand users’ tastes and consumption patterns.
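As an illustration of how a per-user watch history can be derived from the events listed above, here is a minimal sketch. The `Event` fields, the event kind names, and the `watch_history` helper are assumptions made for this example and do not reflect the actual event schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user_id: str      # hypothetical anonymous identifier
    video_id: str
    kind: str         # hypothetical kinds: "play", "seek", "complete", "reco_click", "ad_play"
    timestamp: float  # seconds since epoch

def watch_history(events):
    """Return {user_id: [video_id, ...]} in chronological order, one entry per play."""
    history = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "play":
            history[event.user_id].append(event.video_id)
    return dict(history)

events = [
    Event("u1", "v2", "play", 20.0),
    Event("u1", "v1", "play", 10.0),
    Event("u1", "v1", "complete", 15.0),
    Event("u2", "v1", "play", 12.0),
]
history = watch_history(events)
print(history)
```

Events may arrive out of order, so the sketch sorts by timestamp before grouping them by user.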
We store the data on Amazon servers in the U.S. Our servers are replicated and backed up to prevent data loss and outages.
Our script works with all video players (VideoJS, JW Player, Ooyala, Brightcove, YouTube, Dailymotion, Kaltura, ThePlatform, etc.).
Every time a user watches a video, our API checks whether that video is already indexed. If it is not, our backend fetches and processes it to extract relevant information from the metadata and from the video itself, then indexes it in our databases.
If you have content RSS/XML feeds or APIs with more metadata, we can also use those for better results.
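The check-then-index flow described above can be sketched as follows. `VideoIndex`, `ensure_indexed`, and `fake_fetch` are hypothetical names, and the in-memory dictionary is only a stand-in for a real database; the actual API and storage layer are not shown here.

```python
class VideoIndex:
    """Index a video the first time it is seen, then reuse the stored record."""

    def __init__(self, fetch_and_process):
        # Hypothetical callable: video URL -> dict of extracted information.
        self._fetch_and_process = fetch_and_process
        self._records = {}  # stand-in for a real database

    def ensure_indexed(self, video_url):
        if video_url not in self._records:
            # First view of this video: fetch, process, and store it.
            self._records[video_url] = self._fetch_and_process(video_url)
        return self._records[video_url]

calls = []

def fake_fetch(url):
    """Placeholder for the real fetching and processing step."""
    calls.append(url)
    return {"url": url, "title": "placeholder title"}

index = VideoIndex(fake_fetch)
first = index.ensure_indexed("https://example.com/videos/1")
second = index.ensure_indexed("https://example.com/videos/1")
print(len(calls))
```

The second view of the same video is served from the index, so the costly fetch-and-process step runs only once per video.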
We don't use any demographic information about users, only their navigation patterns. The first time we see a user, we generate a fingerprint to uniquely identify them across their sessions and activity on your sites. This fingerprint is based on a mix of IP, user agent, cookies, etc. and is hashed. If you have stricter guidelines, we could also rely on your own user identifier. Content-based recommendations are completely user-agnostic, as they rely only on the content.
Yes, this would be helpful, as it would provide better identification of users across sessions and across devices, leading to a better knowledge of their behavior and interests. Please reach out to discuss this integration; we will offer a solution depending on your technical stack.
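A hashed fingerprint of this kind could look like the sketch below. The choice of SHA-256, the salt value, the field separator, and the exact set of inputs are all assumptions for illustration; the text above only states that the fingerprint mixes several signals and is hashed.

```python
import hashlib

def fingerprint(ip: str, user_agent: str, cookie_id: str, salt: str = "example-salt") -> str:
    """Return a hex digest that identifies a visitor without storing the raw inputs."""
    # Assumption: the signals are joined with "|" and hashed together with a salt.
    raw = "|".join([salt, ip, user_agent, cookie_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

visitor_a = fingerprint("203.0.113.7", "Mozilla/5.0", "cookie-123")
visitor_a_again = fingerprint("203.0.113.7", "Mozilla/5.0", "cookie-123")
visitor_b = fingerprint("203.0.113.8", "Mozilla/5.0", "cookie-456")
print(visitor_a == visitor_a_again, visitor_a == visitor_b)
```

The same inputs always produce the same identifier, which is what allows a returning visitor to be matched across sessions.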
We use cutting-edge Artificial Intelligence algorithms that generate many abstract features for each video. Features can be seen as high-level topics such as “basketball” or “movies”, but are much more subtle and don’t necessarily have a human meaning. These features are generated both from the video data (title, description, speech-to-text, …) and user behavior (similar users watch similar videos) and allow the engine to characterize the video, i.e. understand what the video is about. The engine can then leverage these features to recommend the best video to each user.
Traditionally, recommender systems fall into two categories:
- Content-based: recommend videos similar to the one being watched (regardless of the user)
- User-based: recommend videos that match the user’s tastes (regardless of the context)
Our proprietary algorithm is hybrid since it not only mixes both approaches by leveraging embeddings for videos and visitors, but also allows us to incorporate other information, such as video recency or popularity. Therefore, it maximizes engagement by incorporating contextual and personalized recommendations.
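To illustrate the idea of a hybrid score, here is a deliberately simplified sketch that blends content similarity, user affinity, recency, and popularity with fixed weights. The weights, the linear blend, and all names (`hybrid_score`, `cosine`, the toy vectors) are assumptions for this example; the proprietary algorithm itself is not shown.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(candidate, current_video, user, recency, popularity,
                 weights=(0.4, 0.4, 0.1, 0.1)):
    """Weighted blend of content similarity, user affinity, recency and popularity."""
    w_content, w_user, w_recency, w_popularity = weights  # hypothetical weights
    return (w_content * cosine(candidate, current_video)   # content-based part
            + w_user * cosine(candidate, user)             # user-based part
            + w_recency * recency
            + w_popularity * popularity)

current_video = np.array([1.0, 0.0])  # embedding of the video being watched
user = np.array([0.0, 1.0])           # embedding of the visitor
candidates = {
    "a": np.array([1.0, 0.0]),   # close to the current video only
    "b": np.array([1.0, 1.0]),   # close to both the video and the visitor
    "c": np.array([-1.0, 0.0]),  # opposite of the current video
}
scores = {name: hybrid_score(vec, current_video, user, recency=0.5, popularity=0.5)
          for name, vec in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the candidate that matches both the current video and the visitor ranks first, which is the behavior a hybrid approach is meant to produce.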
We use three different algorithms that we combine into our proprietary hybrid recommender system:
- Word embeddings, based on extracted text (title, description, speech-to-text), which helps us characterize the videos. This state-of-the-art approach is more powerful than traditional semantics/NLP methods such as TF-IDF, because it can understand the meaning of words (synonyms or semantically close words).
- Collaborative filtering, which is supervised by users’ watch history, but learns video and user features in an unsupervised fashion (there is no feedback on which video should have which features). It characterizes both video and visitor behavior, by producing joint feature vectors.
- A supervised classification algorithm that takes all the available information at recommendation time (currently watched video, all user and video features, collaborative and word-based, recency, popularity, …) and makes a decision in real time.
If we see each feature as a topic, then our embeddings can be seen as fuzzy clustering: each feature indicates how likely the video is to belong to a given topic.
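As a rough picture of how collaborative filtering can learn joint user and video feature vectors from watch history alone, here is a minimal matrix factorization sketch trained by gradient descent. The toy watch matrix, the hyperparameters, and the `factorize` helper are assumptions for illustration, and treating unwatched videos as zeros is a simplification; this is a textbook variant, not the production model.

```python
import numpy as np

def factorize(watched, n_features=2, steps=500, lr=0.05, reg=0.01, seed=0):
    """Learn user and video vectors whose dot products approximate `watched`."""
    rng = np.random.default_rng(seed)
    n_users, n_videos = watched.shape
    users = rng.normal(scale=0.1, size=(n_users, n_features))
    videos = rng.normal(scale=0.1, size=(n_videos, n_features))
    for _ in range(steps):
        error = watched - users @ videos.T  # reconstruction error
        # Descent directions for squared error with L2 regularization.
        users_step = error @ videos - reg * users
        videos_step = error.T @ users - reg * videos
        users += lr * users_step
        videos += lr * videos_step
    return users, videos

# Rows are users, columns are videos; 1 means "watched" (toy data).
watched = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

users, videos = factorize(watched)
scores = users @ videos.T  # predicted affinity of every user for every video
mse = float(np.mean((watched - scores) ** 2))
print(round(mse, 4))
```

No one tells the model what each feature means: the vectors emerge only from which users watched which videos, and a high score for an unwatched video is a candidate recommendation.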
Right now we do video-to-video, video-to-article, and article-to-video recommendations. We don't do article-to-article, as we specialize in video. This is something we may reconsider and evaluate depending on your needs.
The user data we collect is only used by our internal algorithms to gain effective insights into user behavior and video content. We keep it safe and apply a strict policy of non-disclosure to third parties. At the moment, this data remains abstract. However, we plan to make this data actionable for you, for instance by plugging it into your DMP to give you information about your users’ interests. Additionally, analytics and aggregated usage metrics are accessible in your dashboard.
There are two sides to this question:
- For content classification, we typically use the IAB taxonomy, but can also adapt to your custom taxonomy. To determine it, we use NLP on metadata and speech-to-text, with image analysis coming soon.
- For users' interests, our machine learning algorithm automatically determines a set of abstract features that are specific to each website. These features can be seen as clusters of users that share the same interests, but in a much finer and more subtle way than generic IAB categories, for example.
We use state-of-the-art distributed big data technologies that can scale to any volume of traffic: Kafka, Cassandra, Spark, Elasticsearch, Redshift, PostgreSQL, etc. The system is aggressively cached (with Redis and CDNs) and optimized. As a result, we can serve recommendations in less than 100 ms.