Panem et circenses can be literally translated as “bread and circuses”. This phrase, coined by Juvenal, a once well-known Roman poet, is simple but meaningful: the audience should be given the product it desires. Though the idea has been known for centuries, it wasn’t always possible to act on it. There was a time gap between spotting a behavioural pattern and adjusting the “feed” (services, products etc.) to it, and often there was no way to capture this kind of data at all, as if it didn’t exist. With the rise of modern big data recommendation systems, it is much easier to detect audience preferences and help create almost bespoke content.
Let’s take the video streaming industry as an example, with its main player, Netflix. The California-based OTT (over-the-top) content provider was set up in 1997 (more than 20 years ago!) and operated as a DVD-by-mail rental store charging a monthly subscription fee, which was a novelty at a time when the pay-per-rent model dominated. In a little over 20 years, Netflix has redefined its business from content provider to content creator as well, gathering an audience of over 140 million subscribers worldwide.
You may be asking yourself how a former DVD rental company now stands behind such well-known titles as House of Cards, Narcos or Orange is the New Black. Was it a roll of the dice?
Before I answer, let’s take a look at the general context of the video streaming industry. Sandvine, a global network intelligence solutions provider, states that as of the first half of 2018, video was responsible for almost 58% of the total unencrypted downstream volume of traffic on the internet, while Netflix accounted for roughly a quarter of it (c. 15% of the total traffic), making it the single biggest source of web traffic. No doubt, such a volume of data is impressive for an average Internet user and challenging for the big data industry. These huge amounts of data are produced not only by the movie streaming itself, but also by the actions associated with it: timestamps, geolocation, user preferences and so on. This is actionable intelligence for ML and AI that can produce useful insights and great user recommendations.
Now we can come back to our initial question. Netflix is not only a video streaming provider. In the last couple of years, thanks to big data solutions that spot behavioural patterns and drive user recommendations, it has also become a content creator. And this data-driven approach has contributed significantly to the company’s considerably better performance since 2013.
2013 can be treated as a red-letter date in the history of Netflix. The company released its first five productions of its own: three Netflix Originals (Orange is the New Black, House of Cards and Hemlock Grove) and two animated series (Ever After High and Turbo FAST). By 2018 the number of in-house releases had gone up to 700(!). Even though not all of these productions have become marvellously successful, it was Netflix that started the trend of video streaming providers creating their own TV series, later followed by HBO with its HBO GO brand and Amazon with Amazon Prime.
Netflix has always used data to decide how to present movies, which shows to license and where. Now that expertise is extended to first-run content (vide the 700 releases). This is supported by some great examples of an “out-of-the-box” mindset. Netflix decided to purchase the rights to the Prison Break TV series in Poland because the show had recently been extremely popular on pirate streaming sites there, and this decision is expected to contribute considerably to the growth in the number of Polish subscribers.
But today let’s focus on the House of Cards case in order to explain Netflix’s approach to enhancing the user experience. In March 2011 Netflix decided to purchase the rights to air the title, outbidding traditional cable networks like HBO and AMC. Though the move was considered unexpected, it was driven by the outcome of predictive modelling. Thanks to a collaborative filtering model (a model based on the assumption that users who behaved alike in the past share similar tastes and opinions), Netflix managed to identify that subscribers who watched the original British version of the series were also keen to watch movies with Kevin Spacey or directed by David Fincher. That gave the initial background to create (or engineer) a model TV series that was born to succeed.
Before the release, Netflix also ran a series of trailer tests to validate its approach. There were 10 different trailers, shown to different audience groups to match their preferences better. After the premiere, following further user feedback, the content was shaped somewhat to please the audience more. Take a look at how the show changed: starting with season 2 of House of Cards, the presence of children was reduced compared to the first season. You also won’t see any suffering animals after the pilot episode.
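The collaborative filtering idea behind the House of Cards decision can be illustrated with a small sketch. This is not Netflix’s actual model, which is not described here. It assumes the simplest user-based variant with cosine similarity, and the viewers, the viewing data and the function names are all invented for illustration.

```python
from math import sqrt

# Toy data, invented for illustration: 1 means the viewer watched and liked the title.
ratings = {
    "viewer_a": {"House of Cards (UK)": 1, "Se7en": 1, "American Beauty": 1},
    "viewer_b": {"House of Cards (UK)": 1, "Se7en": 1},
    "viewer_c": {"Turbo FAST": 1, "Ever After High": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[title] * b[title] for title in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank unseen titles by the similarity-weighted ratings of other viewers."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, other_ratings)
        if similarity <= 0:
            continue  # viewers with no overlap do not influence the result
        for title, value in other_ratings.items():
            if title not in seen:
                scores[title] = scores.get(title, 0.0) + similarity * value
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("viewer_b", ratings))  # ['American Beauty']
```

In this toy data viewer_b overlaps only with viewer_a, so the one title viewer_a liked and viewer_b has not yet seen is suggested. A production system would work with far more signals than a handful of likes.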
This is Netflix’s advantage and also a kind of interpretation of the article’s title, panem et circenses. By learning from its users (here comes Machine Learning, ML), the streaming service is able to optimize its content in order to engage more people by delivering shows that are exactly what people want. (To all Netflix users: do you remember the system asking you, after your first log-in, to choose three titles that you like most so that it could provide you with the best service? This is it!) Based on the shows you have viewed and/or liked, the company continuously gathers information for its suggestions in order to drive a better user experience.
No doubt, the big data contribution to the House of Cards TV series is a textbook example of how any business, regardless of industry or sector, can reshape its approach by turning loads of data into smart data and benefit from it. The HoC licence purchase was a sensible, deliberate acquisition backed by predictive modelling outcomes. Netflix executives knew, based on the viewing habits of its 33 million users (back in 2012/2013), that a combination of a political drama, Kevin Spacey as an actor and David Fincher as a director would make a winning project. Though this style of decision-making may recall the plot of the movie Moneyball, where Oakland Athletics general manager Billy Beane decided to rebuild the team based on individual players’ statistics, the outlook of the Netflix business as well as the user experience proves that science-based decisions can positively impact a business and please the audience.