
News Recommendation: the challenging area in building recommendation systems

Remember our whitepaper “Guide to Recommendation Systems. Implementation of Machine Learning in Business” from the middle of last year? Its author, our data scientist Michal Stawikowski, did an excellent job of giving a cross-sectional overview of the issues related to recommender systems. The paper analyzes the topic from the business side and also dives into the technical details. It presents an example of a four-step recommender system, in which results are successively retrieved, filtered, scored and sorted. You can also find out what QuickStart ML Blueprints are and how they can help data scientists and engineers build recommendation systems. Download the whitepaper here.


Personalised news recommendation systems

Today I would like to focus on a specific issue, namely news recommendation. With the rapid development of artificial intelligence, new solutions based on, for example, GPT-4 or diffusion models have appeared in recent months to improve the effectiveness of recommendation engines. However, solutions based on older techniques such as TF-IDF, word2vec or Bag-of-Words are still leading the way.

As a recap, below is a breakdown of the most important approaches to building recommendation engines.

[Figure: breakdown of the main approaches to building recommendation systems]

To create a news recommendation engine, we can actually use any of the above approaches, depending on what our business objective and technological capabilities are. However, the news area is characterized by a particular sensitivity to the context of the news.

Traditional recommendation systems recommend articles according to how similar they are to articles the user was previously interested in. Similarity is typically measured as the distance between two pieces of text: a small distance indicates high similarity, while a large distance indicates low similarity. However, people's preferences depend on several other factors, including context and recent social media trends. For example, a story about the latest transfers of one football club may be of no interest to a fan of another team, and the same story can become irrelevant almost immediately if the transfer does not materialise after all.

News recommendation systems therefore face particular challenges: articles change quickly, data about readers is limited, and the relevance of an article is highly context-dependent. As a result, there is growing interest in personalised news recommendation systems that provide users with articles matching their preferences and interests. One approach to building such systems is to use contextual information, since users' reading preferences and habits can vary with their location, the time of day and other factors. With contextual information, a news recommendation system can personalise recommendations for each user, taking into account their current situation.

Context and trends can be captured from users in several ways, for example by analysing the content of the articles users click on, tracking users' social media activity, using collaborative filtering to identify similar users based on their click behaviour, and using contextual information such as time of day, location, device and user profile to personalise recommendations.
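To make the "distance between two pieces of text" idea concrete, here is a minimal sketch, assuming plain Bag-of-Words vectors and cosine distance (1 minus cosine similarity) as the distance measure. The function names `bag_of_words`, `cosine_similarity` and `recommend_similar` and the sample headlines are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count word occurrences (a minimal Bag-of-Words vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(read_text: str, candidates: dict, k: int = 2) -> list:
    """Return the IDs of the k candidates closest to an article the user has read.

    Distance is 1 - cosine similarity, so a small distance means high similarity.
    """
    read_vec = bag_of_words(read_text)
    distance = {
        article_id: 1.0 - cosine_similarity(read_vec, bag_of_words(text))
        for article_id, text in candidates.items()
    }
    return sorted(distance, key=distance.get)[:k]


candidates = {
    "a1": "Football club confirms summer transfer of striker",
    "a2": "Central bank raises interest rates again",
    "a3": "Striker transfer talks between two football clubs collapse",
}
print(recommend_similar("Football club agrees transfer fee for striker", candidates))
# prints ['a1', 'a3']
```

Note that "a3" is ranked second even though it reports that the transfer talks collapsed: a purely textual distance has no notion of context, which is exactly the weakness described above.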

Below you can find a classification of features used for news recommendation systems:

[Table: classification of features used in news recommendation systems]

Taking these issues into account, the target solution should be a hybrid model that combines content with user behaviour and preferences.
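One simple way to build such a hybrid is a weighted blend of a content-based score and a collaborative-filtering score. The sketch below assumes both models already output scores in [0, 1]; the weight `alpha`, the function name `hybrid_scores` and the example scores are hypothetical. Falling back to the content score for articles with no click history is one possible design choice for handling cold start, not the only one.

```python
def hybrid_scores(content: dict, collaborative: dict, alpha: float = 0.5) -> dict:
    """Blend content-based and collaborative-filtering scores for each candidate article.

    score = alpha * content + (1 - alpha) * collaborative.
    Articles with no collaborative score yet (cold start: no clicks recorded)
    fall back to their content-based score alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scores = {}
    for article_id, content_score in content.items():
        if article_id in collaborative:
            scores[article_id] = alpha * content_score + (1 - alpha) * collaborative[article_id]
        else:
            scores[article_id] = content_score
    return scores


# Hypothetical scores in [0, 1] produced by two separate models.
content = {"old_story": 0.6, "fresh_story": 0.8}
collaborative = {"old_story": 0.9}  # the fresh story has no click history yet
scores = hybrid_scores(content, collaborative, alpha=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # prints ['fresh_story', 'old_story']
```

In practice `alpha` would be tuned on historical data, or replaced by a model that learns how to combine the two signals.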

News modeling

A key element in building personalized news recommendation methods is news modeling. In this step, it is necessary to understand the content and capture the individual characteristics of each article. Many approaches can be used for this purpose, and they fall into two main groups: feature-based methods and deep learning-based methods.

Feature-based methods use features prepared by the data scientist to represent news articles. These features are designed to capture different aspects of news content and context. In many collaborative filtering-based methods, articles are represented by news IDs. However, this approach suffers from the 'cold start' problem: new articles are constantly being published and old ones quickly become obsolete, so many news IDs have little or no coverage in the training set. Because ID-based news modeling has these limitations, additional techniques are often used to describe news content statistically. One of them is Term Frequency-Inverse Document Frequency (TF-IDF), which extracts features from news texts. Topic features are also common: techniques such as Latent Dirichlet Allocation (LDA) extract topics from news titles, summaries and main content. In addition, factors such as news popularity, frequency, sentiment and bias can be added to the model to improve the news representation.
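As an illustration of TF-IDF features, here is a minimal from-scratch sketch using the textbook definitions (term frequency normalised by document length, idf = ln(N / df)). The function name `tf_idf` and the tokenised headlines are hypothetical. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents: list) -> list:
    """Compute TF-IDF weights for a list of tokenised documents.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is the number of documents containing t
    """
    n_docs = len(documents)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (count / total) * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}
        )
    return weights


news = [
    ["transfer", "striker", "club"],
    ["election", "club", "vote"],
    ["transfer", "window", "club"],
]
for article in tf_idf(news):
    print({term: round(weight, 3) for term, weight in article.items()})
```

Because "club" appears in every document, its weight is 0 everywhere, while rarer words such as "striker" receive the highest weights and therefore dominate the article's feature vector.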

Deep learning-based methods, on the other hand, use neural network models to learn article representations automatically from raw input data such as news texts, which lets us largely skip manual feature preparation. They compete with the feature-based approach described above and can often capture the information and context of news articles more effectively by learning latent patterns from the raw data. For example, some methods use autoencoders, knowledge-aware convolutional neural networks (CNNs), multi-head self-attention networks or pre-trained language models (PLMs) to encode news text. Deep learning-based methods can also include news attributes, such as specific topics or concepts, in their analysis of news articles, with the aim of gaining a deeper understanding of the knowledge and common themes the articles contain.
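The core of a self-attention news encoder can be sketched in a few lines. This is a deliberately simplified, assumption-laden version: a single head, no learned query/key/value projections (Q = K = V = the word vectors), and mean pooling to get one vector per article. The word embeddings are hypothetical toy values; in a real system they would come from a pre-trained embedding such as word2vec or be learned end to end.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(word_vectors):
    """Single-head scaled dot-product self-attention with Q = K = V = the input.

    A real model would multiply the input by learned query/key/value matrices
    and use several heads; those are omitted here to keep the sketch short.
    """
    d = len(word_vectors[0])
    outputs = []
    for query in word_vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in word_vectors]
        weights = softmax(scores)
        # Each output is a convex combination of all word vectors in the title.
        outputs.append([sum(w * vec[i] for w, vec in zip(weights, word_vectors)) for i in range(d)])
    return outputs


def encode_news(word_vectors):
    """Contextualise the words of a title, then mean-pool them into one news vector."""
    contextual = self_attention(word_vectors)
    d = len(contextual[0])
    return [sum(vec[i] for vec in contextual) / len(contextual) for i in range(d)]


# Hypothetical 3-dimensional embeddings for the words of a short title.
title = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2], [0.1, 0.9, 0.3]]
print(encode_news(title))
```

Each word's output vector mixes in information from the other words in proportion to their similarity, which is how self-attention lets the same word carry different meaning in different headlines.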

User modeling

The next step in building a recommender system is user modeling. In this phase, the goal is to understand users' interests and preferences, which involves constructing user profiles from characteristics extracted from the news articles they have clicked. As with news modeling, methods can be broadly divided into feature-based and deep learning-based ones.

The first approach, feature-based user modeling, builds user profiles from a set of features derived from historical user behavior, including clicked news articles. These methods use various additional user characteristics, such as demographics (e.g. age, gender and occupation), location, access patterns, and user tags or keywords. In some cases, user behavior on other platforms, such as social media and e-commerce sites, can also be taken into account to obtain additional information about user interests. However, this type of approach usually requires considerable expertise in feature design and validation, as well as access to a wide range of data, preferably of good quality.
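A small example of hand-crafted user features: the sketch below turns a click history into a normalised interest distribution over news categories and merges it with profile and context fields. The exponential time decay (a click one half-life old counts half as much) is an assumption added because news interests shift quickly; the 24-hour half-life, the function names and the field names are all hypothetical.

```python
from collections import defaultdict


def category_interests(clicks, half_life_hours=24.0):
    """Turn a click history into a normalised interest distribution over categories.

    clicks: list of (category, hours_since_click) pairs.
    Each click is weighted by 0.5 ** (age / half_life), so a click that is one
    half-life old counts half as much as a click made just now.
    """
    weights = defaultdict(float)
    for category, age_hours in clicks:
        weights[category] += 0.5 ** (age_hours / half_life_hours)
    total = sum(weights.values())
    return {category: w / total for category, w in weights.items()} if total else {}


def build_user_features(clicks, age_group, location, device):
    """Combine click-based interests with simple profile and context features."""
    features = {f"interest_{c}": share for c, share in category_interests(clicks).items()}
    features.update({"age_group": age_group, "location": location, "device": device})
    return features


clicks = [("sport", 0.0), ("sport", 24.0), ("politics", 0.0)]
print(build_user_features(clicks, age_group="25-34", location="Warsaw", device="mobile"))
```

Here the older sport click counts half, so sport gets a share of 1.5 / 2.5 = 0.6 and politics 0.4. A downstream model would consume this dictionary after encoding the categorical fields.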

User modeling methods based on deep learning, on the other hand, aim to learn user representations from behavior without manual feature engineering. These methods infer user interests from click behavior, which is an implicit indicator of a user's interest in news articles. However, click data can be noisy and does not always reflect a user's actual interests. To address this, many methods incorporate other types of information into user modeling, such as user IDs, contextual features (e.g. devices and locations) and various kinds of user feedback on the news platform, adding engagement information to the model of user interests. These methods can automatically learn deep representations of user interests for personalized news recommendation, and such representations are often more accurate than manually designed interest features.
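One common pattern in neural user encoders is attention over the user's clicked news. The sketch below is a simplified, candidate-aware variant: each clicked-news vector is weighted by the softmax of its dot product with the candidate, so the user representation changes depending on which article is being scored. In a trained model the news vectors would come from a news encoder and the attention would have learned parameters; the fixed 2-d vectors and function names here are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_user_vector(clicked_news, candidate):
    """Build a user vector as an attention-weighted sum of clicked-news vectors.

    Attention weights are softmax(dot(clicked_i, candidate)), so clicked articles
    that resemble the candidate contribute more to the user representation.
    """
    scores = [dot(news, candidate) for news in clicked_news]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(candidate)
    return [sum(w * news[i] for w, news in zip(weights, clicked_news)) for i in range(dim)]


def click_score(clicked_news, candidate):
    """Predicted relevance: dot product of the attentive user vector and the candidate."""
    return dot(attentive_user_vector(clicked_news, candidate), candidate)


# Hypothetical 2-d news vectors: dimension 0 ~ sport, dimension 1 ~ politics.
history = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
print(click_score(history, [1.0, 0.0]), click_score(history, [0.0, 1.0]))
```

Because this user clicked mostly sport stories, the sport candidate receives the higher score, but the single politics click still gets extra weight when a politics candidate is evaluated.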

Creating a ranking

Once the characteristics of news stories and users have been modeled, the next step is to create a ranking of candidate news stories based on their relevance to the user's interests. This is a key step in personalized news recommendation, as it aims to present users with the most relevant and engaging articles. 

Relevance-based methods typically rank candidate articles by how well they match the user's interests. The main challenge for these methods is measuring the relevance between candidate news items and the user's interests accurately. Many techniques assess this relevance directly from the similarity of the final user and news representations. For example, some methods compute the cosine similarity between user and news feature vectors built with CF-IDF (Concept Frequency-Inverse Document Frequency), while others use the similarity between vectors of news topics and user interests. One of the challenges of personalized relevance-based ranking is the 'filter bubble' problem: recommending news similar to what users clicked on previously can limit diversity. To address this, recommenders can deliberately include news that differs somewhat from previously clicked items, introducing variety and randomness.
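Relevance ranking and the diversity fix can be combined in one re-ranking step. The sketch below uses Maximal Marginal Relevance (MMR), a well-known re-ranking technique that is not named in the text above: it scores each remaining article by its cosine relevance to the user minus its similarity to articles already selected. The trade-off parameter `lam`, the function names and the toy vectors are hypothetical; with `lam = 1.0` the procedure reduces to plain cosine-relevance ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def marginal_score(article_id, user_vec, candidates, selected, lam):
    """lam * relevance to the user - (1 - lam) * max similarity to already selected items."""
    relevance = cosine(user_vec, candidates[article_id])
    redundancy = max((cosine(candidates[article_id], candidates[s]) for s in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr_rank(user_vec, candidates, k, lam=0.7):
    """Greedily build a top-k list that balances relevance and diversity."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda a: marginal_score(a, user_vec, candidates, selected, lam))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 3-d vectors: two near-duplicate football stories and one tech story.
user = [1.0, 0.0, 0.3]
news = {
    "football_1": [1.0, 0.0, 0.0],
    "football_2": [0.99, 0.1, 0.0],
    "tech": [0.6, 0.0, 0.8],
}
print(mmr_rank(user, news, k=2, lam=1.0))  # pure relevance: ['football_1', 'football_2']
print(mmr_rank(user, news, k=2, lam=0.5))  # with diversity: ['football_1', 'tech']
```

With pure relevance the user gets two almost identical football stories; with `lam = 0.5` the near-duplicate is penalised and the tech story takes the second slot.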

Unlike relevance-based methods, ranking methods based on reinforcement learning aim to optimize the total long-term reward. They explore potential user interests in order to improve long-term user experience and engagement, and through this exploration they can increase the diversity of recommendation results and discover interests that the user's click history does not yet reveal.
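The exploration idea can be illustrated with the simplest member of this family, a multi-armed bandit. The sketch below is an epsilon-greedy recommender: with probability `epsilon` it shows a random article (exploration), otherwise the article with the best observed click-through rate (exploitation). This is an assumption-level simplification: a bandit maximises immediate reward and does not model how a recommendation changes the user's future state, which full reinforcement learning approaches do. The class name, the simulated click-through rates and the seeds are hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Epsilon-greedy multi-armed bandit over a fixed set of news articles."""

    def __init__(self, article_ids, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.impressions = {a: 0 for a in article_ids}
        self.clicks = {a: 0 for a in article_ids}

    def estimated_ctr(self, article_id):
        """Observed click-through rate; 0.0 for articles never shown."""
        shown = self.impressions[article_id]
        return self.clicks[article_id] / shown if shown else 0.0

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.impressions))  # explore
        return max(self.impressions, key=self.estimated_ctr)  # exploit

    def update(self, article_id, clicked):
        """Record one impression and whether it produced a click (the reward)."""
        self.impressions[article_id] += 1
        self.clicks[article_id] += int(clicked)


# Hypothetical click-through rates used only to simulate user feedback.
true_ctr = {"a": 0.02, "b": 0.10, "c": 0.05}
bandit = EpsilonGreedyRecommender(true_ctr, epsilon=0.1, seed=42)
simulated_user = random.Random(7)
for _ in range(5000):
    article = bandit.recommend()
    bandit.update(article, simulated_user.random() < true_ctr[article])
print({a: round(bandit.estimated_ctr(a), 3) for a in true_ctr})
```

Because a fraction of traffic is always spent on exploration, articles that were initially underestimated still get a chance to prove themselves, which is the mechanism behind the diversity benefit described above.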

News Recommendation Systems - Summary

In comparison to recommendation systems in other domains such as movie recommendations, news recommendation engines face unique challenges due to the dynamic and time-sensitive nature of news content. While both types of recommendation systems leverage various techniques like collaborative filtering and content-based filtering, news recommendation engines must also contend with the scarcity of user data and the need for real-time adaptation to evolving news trends. Despite these differences, the overarching goal of personalized recommendation systems remains consistent: to provide users with relevant and engaging content tailored to their preferences and interests. 

If you are looking for support in exploring news recommendation system solutions in more depth, take advantage of our experts' free consultation offer.

recommendation system
News modeling
personalised recommendation
news recommendation
29 February 2024
