
Recommendation system. Absolute usability

When recommendation systems were first being introduced, unobtrusively, on various sites, they seemed like a pleasant addition to independent search. When the choice of products or content is large enough, searching turns into an entertaining journey with often unpredictable results. For example, I was never interested in horror films and preferred a rather different kind of cinema; however, while digging randomly through a catalog, I one day came across the classic Hellraiser, and a casual viewing left a strong and indelible impression on me. I am sure every reader has at least once been enriched, culturally or aesthetically, precisely by such random searching. On the other hand, I have discovered a lot of interesting things through the recommendations that thematic sites give me. Many films, books, albums and products became known (and interesting) to me only because a recommendation system did its job well. Tellingly, I now almost always rely on recommendations and look for things on my own much less often, because there is simply no time left for that!

This state of affairs is aggravated by the fact that I can see how well recommendation algorithms have come to understand me. Where accurate hits used to be rare, today at least a good half of the recommended things interest me to one degree or another. And when, instead of apathetically accepting what I am offered, I still try to find something worthwhile on my own, I quickly give in under the pressure of incredible, unprecedented abundance. The further we go, the clearer the picture of a not-so-distant future becomes, one in which the surrounding reality continuously adapts to your personality, constantly transforming and learning. Never before in the history of mankind has comfort been so menacingly absolute. And never before have the loopholes for incredible chance finds been closed so quickly and categorically.

Accepting the coming future as it is, it is worth learning to evaluate it critically, identifying dubious or even dark sides with the same zeal with which we strive to use innovations in everyday life that make our lot easier. Let's try to understand the subject of our conversation today.

Filtering methods used in recommendation systems

Collaborative filtering

Collaborative filtering is widely used, not least because of the relative ease of implementation. The principle of its operation is indeed simple, although it can be divided into two different approaches.

The user-matching approach (commonly known as user-based) takes into account how similar a given user is to the other users in the system. For example, if Vasily rated Lady Gaga, Oasis and Led Zeppelin positively, then Anastasia, who loves Lady Gaga and Led Zeppelin, can be offered Oasis.
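The Vasily and Anastasia example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity measure (Jaccard overlap of liked artists), the `min_similarity` threshold and the third user are hypothetical choices, not something the article prescribes.

```python
# Toy data: artists each user rated positively (first two users come from the example above).
likes = {
    "Vasily": {"Lady Gaga", "Oasis", "Led Zeppelin"},
    "Anastasia": {"Lady Gaga", "Led Zeppelin"},
    "Pyotr": {"Radiohead", "Blur"},  # hypothetical third user, added for contrast
}


def jaccard(a, b):
    """Share of artists two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(target, likes, min_similarity=0.5):
    """Suggest artists liked by sufficiently similar users but unknown to the target."""
    scores = {}
    for other, other_likes in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], other_likes)
        if sim < min_similarity:
            continue  # this user's taste is too different to be trusted
        for artist in other_likes - likes[target]:
            scores[artist] = scores.get(artist, 0.0) + sim
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda a: (-scores[a], a))


print(recommend_user_based("Anastasia", likes))  # ['Oasis']
```

Anastasia shares two of three artists with Vasily, so his remaining artist is suggested to her, while Pyotr's unrelated taste is ignored.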

The concept of object comparison (item-based, respectively), on the contrary, analyzes the objects themselves and reveals their similarity to those that Vasily once liked. In practice, it looks like this: Vasily once liked Radiohead and Blur, why don’t we offer him Oasis as well?
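A minimal item-based sketch follows. It assumes that two bands are "similar" when largely the same people like them (cosine similarity over sets of fans); the listening data and user names other than Vasily are invented for illustration.

```python
from math import sqrt

# Hypothetical listening data: user -> set of liked bands.
likes = {
    "u1": {"Radiohead", "Blur", "Oasis"},
    "u2": {"Blur", "Oasis"},
    "u3": {"Radiohead", "Oasis"},
    "u4": {"Lady Gaga"},
    "Vasily": {"Radiohead", "Blur"},
}


def item_similarity(item_a, item_b, likes):
    """Cosine similarity between two items, each seen as the set of users who like it."""
    fans_a = {u for u, items in likes.items() if item_a in items}
    fans_b = {u for u, items in likes.items() if item_b in items}
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / sqrt(len(fans_a) * len(fans_b))


def recommend_item_based(user, likes):
    """Rank unseen items by their summed similarity to the items the user already likes."""
    catalog = set().union(*likes.values())
    scores = {}
    for candidate in catalog - likes[user]:
        score = sum(item_similarity(candidate, liked, likes) for liked in likes[user])
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda i: (-scores[i], i))


print(recommend_item_based("Vasily", likes))  # ['Oasis']
```

Oasis shares fans with both Radiohead and Blur, so it is offered to Vasily; Lady Gaga shares none and is left out.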

Collaborative filtering allows you to get highly accurate and relevant recommendations based on the analysis and comparison of differences among users with similar behavior.

Vasily and Anastasia: mutual automatic recommendations based on differences in preferences.

Content filtering

Content filtering builds internal links between the offered products or any other content. This simple principle shows up as recommending objects similar to those the user has previously selected. For example, if you buy a guitar manual in a bookstore, you will automatically be offered other popular tutorials or manuals by the same author. A big advantage of recommendation systems based on content filtering is the ability to interest a new user literally from his first steps as a consumer. There is no need to spend a long time collecting data about a person’s preferences; the visitor can be drawn into working with the resource immediately. Another important advantage of content filtering is the ability to recommend objects that other users have not rated and have passed over. This situation, in which objects remain unrated and overlooked, often arises with the collaborative method.

Content filtering completely ignores user opinions about objects. By building connections purely between the objects themselves, we can instantly, without collecting ratings or additional personal information, offer a person something similar to the item he is interested in. By excluding user experience from the recommendation system as a fundamental ingredient, we seemingly solve the so-called “cold start” problem, where the sparseness of user data prevents the system from producing personalized recommendations. However, the flip side of content filtering is completely inappropriate and sometimes simply ridiculous recommendations such as “Have you bought a Toyota RAV4? You might also be interested in the Toyota Highlander!”

Another difficulty with content filtering is the impressive amount of work involved in building connections between all objects in the system. But the main drawback of this method is its very low, and sometimes rather nominal, hit rate. Content filtering does not imply a high degree of personalization, so the accuracy of recommendations is relatively low.
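To make the idea of links between objects concrete, here is a small sketch in which every item is described by a set of attributes and similarity is the share of attributes two items have in common. The catalog, the attribute names and the `similar_items` helper are all hypothetical.

```python
# Hypothetical catalog: title -> descriptive attributes (author, topic, format).
catalog = {
    "Guitar for Beginners": {"author:smith", "topic:guitar", "format:tutorial"},
    "Advanced Guitar Solos": {"author:smith", "topic:guitar", "format:tutorial"},
    "Piano Basics": {"author:jones", "topic:piano", "format:tutorial"},
    "History of Rock": {"author:lee", "topic:music-history", "format:essay"},
}


def attribute_overlap(a, b):
    """Fraction of attributes shared by two items (shared / all distinct attributes)."""
    return len(a & b) / len(a | b)


def similar_items(title, catalog, top_n=2):
    """Return up to top_n other items that share at least one attribute with `title`."""
    base = catalog[title]
    scored = [
        (attribute_overlap(base, attrs), other)
        for other, attrs in catalog.items()
        if other != title
    ]
    scored = [(s, other) for s, other in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


print(similar_items("Guitar for Beginners", catalog))
# ['Advanced Guitar Solos', 'Piano Basics']
```

No user ratings are involved, which is why this works for a brand-new visitor; it is also why the second suggestion is only loosely related to what the buyer wanted.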

Knowledge-based filtering (knowledge-based systems)

Systems of this type are widely used in online stores. In essence, knowledge-based recommendations are similar to the previous method of content filtering, however, such algorithms use a deeper analysis of objects, building connections between them not according to banal criteria of similarity, but based on the interconnectedness of certain groups of goods.

In practice, it looks like this - when purchasing, for example, a smartphone, the site offers you accessories suitable for use with your new device. These could be cases, headphones, memory cards and everything like that. You can additionally stimulate the buyer by providing discounts on accessories, which can be very useful in connection with the purchase of a new device.
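The smartphone scenario can be sketched as a small rule table. Everything here is assumed for illustration: the rules, the catalog entries, the compatibility field and the 10% discount.

```python
# Hypothetical hand-written rules: product category -> accessory categories.
ACCESSORY_RULES = {
    "smartphone": ["case", "headphones", "memory card"],
    "camera": ["memory card", "tripod"],
}

# Hypothetical catalog: (name, category, compatible model or None if universal, price).
CATALOG = [
    ("Slim Case X1", "case", "Phone X1", 15.0),
    ("Slim Case Z9", "case", "Phone Z9", 15.0),
    ("Wireless Earbuds", "headphones", None, 40.0),
    ("64 GB Card", "memory card", None, 20.0),
    ("Travel Tripod", "tripod", None, 35.0),
]


def accessories_for(model, category, discount=0.10):
    """Return (name, discounted price) for accessories that fit the purchased model."""
    wanted = ACCESSORY_RULES.get(category, [])
    offers = []
    for name, accessory_category, fits, price in CATALOG:
        # Keep accessories of a wanted category that are universal or made for this model.
        if accessory_category in wanted and fits in (None, model):
            offers.append((name, round(price * (1 - discount), 2)))
    return offers


print(accessories_for("Phone X1", "smartphone"))
# [('Slim Case X1', 13.5), ('Wireless Earbuds', 36.0), ('64 GB Card', 18.0)]
```

The knowledge lives in the rules rather than in ratings: the case for another phone model is excluded even though it is the most "similar" item in the catalog.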

Knowledge-based recommendations demonstrate good results, increasing the turnover of large online trading platforms by tens of percent. In addition, unlike content filtering, this type of recommendation is highly accurate, suggesting to the user what he might actually need.

If you are interested in accurate recommendations, then you should definitely consider implementing a knowledge-based system on your website. Like content filtering, a knowledge-based recommendation system studies and analyzes the relationships between objects (products), but in addition, it takes into account a number of additional options related to the individual properties of a particular user.

a) User wishes. A situation familiar to everyone - the site asks the user to indicate the desired characteristics, after which it offers products that match the request.

Yandex.Market with its checkboxes is a successful and striking example of a recommendation system guided by user requirements.

b) Demographic features. In fact, demographic data is used by major social networks such as Facebook, LinkedIn, VKontakte and others to make recommendations.

Of course, to implement such a system you need to work hard - you will have to collect and process a huge amount of data.

Hybrid filtering

The most powerful tool, and the most difficult to implement. Apparently, the future lies in combining various recommendation mechanisms into a single powerful algorithm. The absolute comfort and personalized reality that we talked about at the beginning of the article will be realized precisely through a hybrid of the most effective recommendation methods.

Such an example is demonstrated by Netflix, whose complex hybrid recommendation system, which demonstrates unique accuracy, is constantly being improved and modernized. The development of such a powerful algorithm is largely due to the generous funding of research in this area by Netflix itself, which in 2006 offered $1,000,000 to improve its recommendation system by 10%.

The BellKor's Pragmatic Chaos development team, which managed to improve the Netflix algorithm by 10.09%.

A few words about practical steps as a conclusion

The choice of a specific type of filtering, or of a combination of several methods, depends directly on two factors: the complexity of your project and the size of its budget. For example, creating an algorithm for a system of thematic blogs that intersect with each other is a relatively simple and moderately expensive task. Larger and more heterogeneous projects, such as online stores, require greater costs, especially if the goal is to increase conversion by truly significant amounts. As a rule, such projects cannot limit themselves to a single type of recommendation algorithm and have to use hybrid filtering, as a result of which the cost and complexity of development increase by orders of magnitude.

To create, implement and debug a hybrid algorithm, you will need a whole team of experienced developers who are well aware of what linear and relational algebra are, and also have a range of skills that make the creators of recommendation algorithms virtually a separate profession.

One way or another, when developing a project that lets the user select specific objects from a general set, it is necessary to take into account the rapid progress of usability in absolutely all areas of human life, from sleep optimization using devices that analyze all the processes occurring during sleep and issue recommendations for improving it, to the automatic selection of everyday goods based on the user's current needs. As you know, an indispensable condition for the success of any undertaking is that it matches the spirit of the times exactly.

On April 28, 2016, we officially announced the launch of the first adaptive course on Stepic.org, which selects Python problems depending on the student’s level. Before this, we also implemented recommended lessons on the platform, so that students would not forget what they had already completed, and would discover new topics that might interest them.

Under the cut there are two main topics:

about online education, pros/cons/pitfalls;
classification of recommender systems, their applicability in education, examples.

About online education, its pros, cons and pitfalls

This part is mostly introductory and describes online education; the exciting details about recommendation systems come after the next picture :)

In the modern world, online education is gradually becoming more popular. The opportunity to learn from professors at leading educational institutions, to explore new fields and to gain the knowledge needed for work without leaving home attracts a large number of people.

One of the most common forms of online learning is the massive open online course (MOOC). Most often such courses include videos, slides and text content prepared by the teacher, as well as tasks to test knowledge, which are usually checked automatically, although students can also check each other’s work. A wide variety of task types can be offered: from simply choosing the correct answer to writing an essay and even, as we do on Stepik, programming tasks with automatic checking.

Online education has its own characteristics that distinguish it from conventional, offline education. Among its advantages is, first, the already mentioned accessibility to everyone with an Internet connection. Second, it scales almost without limit: thanks to automated checking of tasks, thousands of people can take a course at the same time, which is incomparable with conventional classroom courses. Third, each student can choose a convenient time and pace for working through the material. Fourth, educators have access to a wealth of data about how users go through their courses, which they can use to analyze and improve their materials.

At the same time, online learning also has disadvantages. Unlike traditional education, where the student is always motivated by the evaluation of his academic performance, online courses carry no penalty for failing. Because of this, the share of those who complete a course among those who signed up for it rarely exceeds 10% (on our Stepik, Anatoly Karpov’s course “Fundamentals of Statistics”, named the best at the EdCrunch Awards 2015, was completed by a record 17% of those who signed up for its first run, but this is rather an exception). In addition, because of the large number of students, the teacher cannot give each student individual attention in accordance with his level and abilities.

We set ourselves the task of creating a recommendation system that could suggest content that interests the student while taking into account his level of preparation and gaps in knowledge. In addition, the system must be able to assess the complexity of content. This is necessary, in particular, for adaptive recommendations that help the user study the material by flexibly adapting to him and offering exactly the content he needs for learning right now. Such a system will benefit users with personalized lesson recommendations that can help them learn a specific topic or offer something new.

In general, learning should become even more interesting!

One of the first examples of a modern recommendation system is movielens.org, which suggests movies to users based on their preferences. This service is interesting because it provides everyone with an extensive set of data about films and ratings given to them by users. This dataset has been used in a lot of research in the field of recommender systems over the past two decades.

Systems based on content filtering. Such systems offer users content similar to what they have previously studied. Similarity is calculated using the characteristics of the objects being compared. For example, genre similarity or cast may be used to recommend movies. This approach is used by the Internet Movie Database, a service for rating, searching for and recommending films.
Systems using collaborative filtering. In this case, the user is offered content that is of interest to similar users. The recommendations of the MovieLens service are based precisely on this approach.
Hybrid systems combining the two previous approaches. This type of system is used in Netflix, a service for online viewing of films and TV series.

We created a hybrid system with more active use of content filtering and less active use of collaborative filtering.

There is a lot of research on recommender systems for Technology Enhanced Learning. The specificity of the task in this case adds new directions for the development of the recommender system.

What are the features of the recommendation system of an educational project?

Firstly, it is possible to build an adaptive recommendation system that will adapt to the needs of the user at a particular moment and offer him the best ways to study the material. In this format, various simulators can be implemented, for example, in mathematics or some programming language, containing many tasks of varying complexity, of which different ones will be suitable for different students at any given time.

Second, it is possible to extract dependencies between training materials from data about how users complete them.

This data can help extract individual topics in materials, connections between these topics, and their relationships in complexity.

Coursera, EdX, Udacity (online learning platforms) use their recommendation systems to recommend courses to users that may be of interest to them. The disadvantage of these recommendations is that they can only offer the entire course, but not some part of it, even if the user is only interested in that part. Also, a system built in this way cannot help the user in studying the course that he has chosen.

The recommendation system of the MathsGarden resource, on the contrary, works with the smallest pieces of content: individual tasks. It is an elementary arithmetic trainer for primary school students that offers each student the tasks whose complexity suits him best at the given moment.
To do this, the system calculates and dynamically updates a relative measure of the student’s knowledge, as well as a measure of the complexity of the tasks, but more on this later.
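The text does not give the formulas, so the following is only a generic sketch of how a student rating and a task rating can be updated together, in the style of the Elo rating system. The logistic model, the step size `k`, the 75% target success rate and the task names are all assumptions made for illustration.

```python
import math


def expected_success(ability, difficulty):
    """Logistic estimate of the probability that the student solves the task."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def update(ability, difficulty, solved, k=0.4):
    """Move both ratings toward the observed outcome (solved is True or False)."""
    surprise = (1.0 if solved else 0.0) - expected_success(ability, difficulty)
    # An unexpected success raises the student's rating and lowers the task's rating.
    return ability + k * surprise, difficulty - k * surprise


def pick_task(ability, tasks, target=0.75):
    """Choose the task whose predicted success rate is closest to the target."""
    return min(tasks, key=lambda name: abs(expected_success(ability, tasks[name]) - target))


tasks = {"easy": -1.1, "medium": 0.0, "hard": 1.5}  # hypothetical difficulty ratings
print(pick_task(0.0, tasks))
print(update(0.0, 0.0, solved=True))
```

Each answer nudges both numbers, so the ratings of students and tasks keep adjusting to each other as more attempts are recorded.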

In the following articles, we will talk in more detail about how Stepic.org is built and how the recommender system is implemented, define what an adaptive recommender system is, and analyze the results obtained in detail. It will be fun :)

Let's start by defining what recommender systems are. These are programs and services that try to determine what users want to see and provide it to them (or recommend it, hence the name). Each of us has probably encountered similar techniques on various sites. Today we will describe the types and operating principles of such programs, and also give examples of these algorithms in action. Read to the end, it will be interesting!

Above we described what recommender systems are, now we will tell you in more detail about their importance. These programs have improved the way the site and the visitor interact because instead of providing static information, the user receives an interactive experience.

Recommendations are generated separately for each person, based on his previous actions on a specific web resource or based on past activity. In addition, the behavior of previous participants in the process also matters.

For online stores this is an important function in general, and for large catalogs like Amazon it is one of the few ways to work efficiently. Here the recommendation mechanism is not just an optional extra; it makes it easy for the user to navigate the web resource. If a digital catalogue contains more than 20,000 products, finding one's way around already seems prohibitively difficult, so what can we say when there are millions of products?

How tiring is it for a potential buyer to interact with such a site? The answer is obvious. And a widget for searching for products that are visually similar to the one you are looking for, or belonging to the same group of products, or complementary products (when you are offered to choose a handbag for a pair of shoes, for example) comes to the rescue. This solution not only increases the number of views, it has a positive effect on conversion.

As practice shows, not only online stores use this technique. Social media are also not lagging behind. Below is an example from VKontakte.

Also, similar techniques can be easily seen on various social platforms, portals dedicated to literature, travel, news resources, online stores, in a word - almost everywhere. This technique is really very popular. The Kinopoisk web resource is another accessible example.

Techniques

So, the first type is explicit data collection. As you might guess from the name, the user himself provides the material needed for the system to work. For example, the recommendation systems of Yandex or other search engines may ask a person to rate different elements, make a list of favorites in a certain area, or answer several questions. If a person refuses to give information on his own, the following technique becomes relevant.

The second type is implicit data collection. Relatively speaking, this is a spy mission, according to which the actions of a participant in the process are recorded by a program for further processing and application. What do you need for this? The program recognizes purchases, ratings on sites, collects information on views and comments. Of course, the choice of such a technique entails some ethical problems, because the protection of personal data is one of the main requirements that the user places on search engines. But for now, the fact remains that some kind of surveillance is possible, and ordinary website visitors cannot check whether such events are really taking place.
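As a rough illustration of implicit collection, recorded actions can be converted into interest scores by giving each kind of action a weight. The weights, the event format and the sample log below are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each recorded action signals interest.
EVENT_WEIGHTS = {"view": 1.0, "comment": 2.0, "rating": 3.0, "purchase": 5.0}


def implicit_scores(events):
    """Turn (user, item, action) records into per-user interest scores."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, action in events:
        # Unknown actions contribute nothing instead of raising an error.
        scores[user][item] += EVENT_WEIGHTS.get(action, 0.0)
    return {user: dict(items) for user, items in scores.items()}


events = [
    ("anna", "tent", "view"),
    ("anna", "tent", "view"),
    ("anna", "tent", "purchase"),
    ("anna", "stove", "view"),
    ("boris", "stove", "comment"),
]
print(implicit_scores(events))
# {'anna': {'tent': 7.0, 'stove': 1.0}, 'boris': {'stove': 2.0}}
```

The resulting scores can play the same role as explicit ratings in the filtering techniques described next.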

The first basic technique is called collaborative filtering. Recommendations using this technique are made based on the behavioral characteristics of one person or of a group of people; the latter is even more effective. Groups bring together people who are similar in behavior and characteristics.

Let's give an example to make the information easier to understand. A website is being created where musical works will be recommended to the audience. How will recommendation services based on collaborative methodology work in this case? According to this principle: one community will be taken as a basis, where participants add tracks of the same genre to the playlist. Next, the most popular of all pieces of music are determined and recommended to one user from the group who has not yet listened to this melody.
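The music example above can be sketched as follows, assuming the community has already been formed and each member's playlist is known. The member names and tracks are invented.

```python
from collections import Counter


def recommend_popular_in_group(user, playlists, top_n=3):
    """Rank tracks by how many other members have them, skipping what the user already has."""
    counts = Counter()
    for member, tracks in playlists.items():
        if member != user:
            counts.update(set(tracks))  # count each track once per member
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked if track not in playlists[user]][:top_n]


# Hypothetical community whose members collect tracks of the same genre.
playlists = {
    "dasha": ["Track A", "Track B"],
    "egor": ["Track A", "Track B", "Track C"],
    "fedor": ["Track B", "Track C", "Track D"],
    "galya": ["Track A"],
}
print(recommend_popular_in_group("galya", playlists, top_n=2))
# ['Track B', 'Track C']
```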

The second approach is called content-based filtering. Here the recommendation is formed based on human behavior. This approach can also take the browsing history of a specific participant as a basis.

This time we will give an example with thematic online magazines. If a person has previously read content about mountain biking and regularly commented on blog articles on that topic, the content filtering method will use this past information to identify similar resources and recommend them to that user.

There are also mixed approaches, according to which the development of a recommendation system is carried out.

A blended approach is a combination of collaborative and content filtering. As we know, more is better, so mixing these two techniques increases the efficiency of recommendation systems, namely, significantly increasing the accuracy of predictions for specific people.
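One simple way to mix the two techniques is a weighted sum of their scores. The sketch below assumes both components already produce scores between 0 and 1; the weight and the sample scores are hypothetical, and real hybrid systems may combine their parts quite differently.

```python
def hybrid_ranking(collaborative, content, weight_collab=0.5):
    """Blend two score dictionaries (item -> score in [0, 1]) into a single ranking."""
    items = set(collaborative) | set(content)
    blended = {
        # An item missing from one component simply gets 0 from that component.
        item: weight_collab * collaborative.get(item, 0.0)
        + (1.0 - weight_collab) * content.get(item, 0.0)
        for item in items
    }
    return sorted(blended, key=lambda i: (-blended[i], i))


collaborative = {"A": 0.9, "B": 0.2}   # hypothetical scores from similar users
content = {"B": 0.8, "C": 0.6}         # hypothetical scores from item similarity
print(hybrid_ranking(collaborative, content))  # ['B', 'A', 'C']
```

Item C has no collaborative score at all (for example, because nobody has rated it yet) but can still be ranked thanks to the content component.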

Algorithms

Pearson correlation

This algorithm lets you find common characteristics between several users. How? Using simple mathematics, namely by determining the linear relationship between two elements. An important point: this technique is not suitable for a community of people.
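For reference, here is a minimal sketch of the Pearson correlation between two users, computed over the items both of them have rated. The rating dictionaries are invented; returning 0.0 when there is too little data is a convention chosen for this example.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users' ratings over the items both have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough shared items to measure a linear relationship
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave identical ratings everywhere
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings on a 1-5 scale.
alice = {"film1": 5, "film2": 3, "film3": 1}
bob = {"film1": 4, "film2": 3, "film3": 2, "film4": 5}
carol = {"film1": 1, "film2": 3, "film3": 5}
print(pearson(alice, bob), pearson(alice, carol))
```

Values near 1 mean the two users rate things the same way, values near -1 mean opposite tastes, and values near 0 mean no linear relationship.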

Clustering

This principle of operation of recommender systems is based on identifying similarities between elements (users) by calculating their proximity to each other in the so-called feature space. Features are those elements on which the interests of certain participants converge (for music resources these are tracks, for movie portals, films). Users with similar characteristics are combined into so-called clusters.
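The article does not name a particular clustering algorithm, so as one possible illustration here is a plain k-means sketch. Each user is a point whose coordinates are play counts for four tracks; the data and the choice of starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for name, vec in points.items():
            # Squared Euclidean distance from this user to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(name)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if not members:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
                continue
            dims = zip(*(points[m] for m in members))
            new_centroids.append(tuple(sum(d) / len(members) for d in dims))
        centroids = new_centroids
    return clusters


# Hypothetical feature space: play counts for four tracks per user.
points = {
    "u1": (5, 4, 0, 0),
    "u2": (4, 5, 0, 1),
    "u3": (0, 0, 5, 4),
    "u4": (1, 0, 4, 5),
}
print(kmeans(points, centroids=[points["u1"], points["u3"]]))
# [['u1', 'u2'], ['u3', 'u4']]
```

Once users are grouped, whatever is popular inside a user's cluster becomes a natural candidate for recommendation.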

Collaborative Filtering Algorithm

Hard clustering can be replaced by another algorithm, which works according to a rather complex formula and, like all the previous ones, is based on the behavior of users from the same group. However, this technique has several rather significant disadvantages. First, it is difficult to find recommendations for new or atypical users (those who do not fall into any group). Second, there is the so-called “cold start”, when new objects do not make it into the recommendations.

Content filtering algorithm

The algorithm is symmetrical to the previous one: in the first case we assumed that the user will like an object because his “classmates” like it, whereas here we recommend objects similar to those he has already marked for himself. Here, too, several problems can traditionally be identified: the same “cold start” and the fact that the recommendations are often mundane.

Instead of a conclusion

So, we have provided all the information that a beginner or a simple layman should know about recommendation systems. Let's be honest, algorithms are somewhat difficult for an untrained person, so this article does not contain mathematical formulas, although the algorithms are based on them.

Recommendation programs are useful services for both ordinary Internet users and researchers and online businessmen. Those who want to increase conversions and the number of views should pay attention to this technique and be sure to implement it to increase the efficiency of a web resource, especially an online store.