Introduction

A fantasy sport is a game where participants rack their brains to create virtual teams composed of real players of a real-life sport. These teams compete based on the statistical performance of those players in actual matches and get rewarded based on various contest types.

Every time users want to play a match on the Dream11 app, they have to create a team for that match. Team creation is at the core of these games, and at Dream11 we are obsessed with providing a great team creation experience, one in which users are equipped with enough knowledge, and face enough of a challenge, to use their skills to the best of their abilities.

To make a team, users need to understand the game, the fantasy scoring system and the particular match. Of these three, the game and the fantasy scoring system are common knowledge, but information about a particular match is hard to find. In a cricket match, for example, you need to know about all 22 players: their recent form, the condition of the pitch and their historical performances, to name a few. Tracking all of this requires visiting multiple websites such as Cricinfo and Cricbuzz and collating the relevant numbers. While this research is an essential part of the skill in a fantasy game, the rigour and effort it demands take away from the fun. A good benchmark of the quality of every player involved would save users time and reduce their cognitive load. Ideally, this benchmark should be a good proxy for each player's importance from a fantasy point of view. On the Dream11 platform, a player's credit is that benchmark: it condenses the basic information about the player and helps users in their fantasy sports experience. How useful users find it became clear from their tweets when we tried removing the credit system from the platform.

User complaints about the absence of credits

With Dream11 hosting 10,000+ matches every year on its platform, have you ever wondered what goes on behind hosting these matches? The process starts with deciding which matches to host, then generating player credits (keeping the user's perspective in mind) and finally taking the match live; the whole picture is much bigger. Deciding player credits for a single match takes a team of two around 30 to 40 minutes. The process can be quite heuristic, since not all information about new players is readily available on the internet. Besides being difficult to scale, human judgement has proven to be inconsistent and, more often than not, sentiment driven, and that sentiment may or may not align with broader user sentiment.

The major challenge for Dream11 in deciding player credits is that it differs from conventional pricing, where the price (the credit, in our case) is reached as an equilibrium of supply and demand. Here, player credits have a more indirect impact on the business. They affect the total number of unique teams (all the player combinations that can form a team), the win/loss distribution of users and the experience of playing the game, among other things. For example, pricing a popular player at the lower end will not be well received, but pricing them at the higher end makes it prohibitive to include them in anyone's team. To tackle all of these at once, the pricing (or credit) system was designed to mimic a player's performance on the ground.

Objective of creating a multi-variate ML pricing model

Develop a sport-agnostic model that generates credits for each player for each round with minimal manual intervention.

Building a single model to predict multiple outputs

Our end goal was to generate credits for each player so that the system can host matches faster and without any manual intervention.

Model development lifecycle to value to impact

Automating credit generation for players was quite challenging. Developing an end-to-end credit system can be broken down into the following sub-problems:

Data blending
Feature engineering
Playing 11 and playing 15 prediction
Developing measure for quantifying form of a player
Performance to credit mapping & theoretical unique team optimisation
Impact of pricing

1. Data blending

Our first challenge was to gather a large amount of data on player performance, fixtures, schedules and squads. There are multiple information sources for popular competitions, but for lesser-known leagues, information about player performance is limited. Team details are not updated regularly, and in sports like football, team changes can be made right up until the match begins.

The data collection process starts with identifying all the important tours happening throughout the year and collating data about them from various data providers. We combined this with Dream11's own data on player performance and player popularity, and the combined data flow feeds the player credit system.
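At its core, blending is a join on a shared player key. The sketch below is a minimal illustration, assuming hypothetical record shapes and a hypothetical popularity signal (the post does not describe the real schemas): it left-joins provider rows with internal popularity, so a player missing from the internal data is kept with an empty value rather than dropped.

```python
# Hypothetical records: the post does not describe the real schemas.
provider_stats = [
    {"player_id": "p1", "match_id": "m1", "runs": 45, "wickets": 0},
    {"player_id": "p2", "match_id": "m1", "runs": 3, "wickets": 2},
    {"player_id": "p3", "match_id": "m1", "runs": 12, "wickets": 1},
]
# Hypothetical internal popularity signal, e.g. share of user teams picking the player.
dream11_popularity = {"p1": 0.82, "p2": 0.35}


def blend(stats, popularity):
    """Left-join provider rows with internal popularity on player_id.

    Players with no internal record keep popularity=None, so the missing-value
    policy in feature engineering decides how to fill it.
    """
    blended = []
    for row in stats:
        merged = dict(row)  # copy so the source rows are not mutated
        merged["popularity"] = popularity.get(row["player_id"])
        blended.append(merged)
    return blended


blended_rows = blend(provider_stats, dream11_popularity)
```

With pandas DataFrames the equivalent would be `stats_df.merge(popularity_df, on="player_id", how="left")`.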

2. Feature engineering

After gathering the data, the next question is how to turn it into information, since feature engineering is the heart of any machine learning model: a model's accuracy depends on how well its features are engineered. In our data, the real challenge was missing and inaccurate values, and data was missing for most players. We handled missing values differently depending on what the missingness means. If a missing value has a negative connotation in terms of performance, it is replaced with a large negative value. If a missing value just means the data was not captured, it is replaced with a measure of central tendency. If a missing value can be interpreted positively for performance, it is replaced with a very high value. Imputing with 0 doesn't work for us in most cases because 0 is a realistic value for most features, and we want the final approach to be able to distinguish an imputed NULL from an actual value present in the data.
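The three imputation rules can be expressed as a per-feature policy. This is a minimal sketch, assuming hypothetical feature names, hypothetical sentinel values of -999 and 999, and the median as the measure of central tendency; the post does not name its features or the exact values it uses.

```python
import statistics

# Hypothetical sentinels, chosen to sit far outside any realistic feature range.
NEGATIVE_SENTINEL = -999.0
POSITIVE_SENTINEL = 999.0

# Hypothetical features, each tagged with how a missing value should be read.
IMPUTATION_POLICY = {
    "strike_rate": "central",         # simply not captured -> median of observed values
    "recent_avg_points": "negative",  # no recent games -> bad sign for performance
    "days_since_injury": "positive",  # never injured -> good sign for performance
}


def impute(rows, policy):
    """Return copies of rows with None filled according to each feature's policy."""
    medians = {}
    for feature, rule in policy.items():
        if rule == "central":
            observed = [r[feature] for r in rows if r[feature] is not None]
            medians[feature] = statistics.median(observed)  # assumes >= 1 observed value
    filled_rows = []
    for r in rows:
        filled = dict(r)
        for feature, rule in policy.items():
            if filled[feature] is None:
                if rule == "negative":
                    filled[feature] = NEGATIVE_SENTINEL
                elif rule == "positive":
                    filled[feature] = POSITIVE_SENTINEL
                else:
                    filled[feature] = medians[feature]
        filled_rows.append(filled)
    return filled_rows


rows = [
    {"strike_rate": 120.0, "recent_avg_points": 35.0, "days_since_injury": 40.0},
    {"strike_rate": None, "recent_avg_points": None, "days_since_injury": None},
    {"strike_rate": 140.0, "recent_avg_points": 20.0, "days_since_injury": None},
]
filled = impute(rows, IMPUTATION_POLICY)
```

Because the sentinels lie outside every realistic range, a model can tell an imputed value apart from a genuine 0.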

We also applied other treatments to the data, such as variable transformation, categorical encoding, and date and time engineering (used to compute a player's inactivity). The modelling approach is robust to outliers, so the data was not treated for them. Not removing or clipping any data points also makes sense because every data point comes from a real-world sporting scenario.
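Inactivity is one example of date engineering. A minimal sketch, assuming inactivity is measured as the number of days since the player's last match before a reference date (the post does not give its exact definition):

```python
from datetime import date


def inactivity_days(match_dates, as_of):
    """Days since the player's most recent match before `as_of`.

    Returns None when there is no earlier match, leaving the choice of
    imputation to the missing-value policy.
    """
    past = [d for d in match_dates if d < as_of]
    if not past:
        return None
    return (as_of - max(past)).days


played = [date(2023, 1, 1), date(2023, 3, 10), date(2023, 4, 20)]
gap = inactivity_days(played, as_of=date(2023, 4, 1))  # 22 days since 10 March
```

Matches on or after the reference date are ignored, so a feature computed for a future match never peeks at later games.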

We divided our features into four categories:

Player representation (player type, player position, starter/sub, etc.)
Player performance representation (recent performance, performance at home, performance away, performance against the same opponent, performance at the same venue, performance in this tour, performance in past tours, etc.)
Measure of player popularity (based on crowd intelligence, popularity of a player, etc.)
Match representation (opponent details, venue details, home/away details, etc.)
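Several of the performance representation features are averages over different slices of a player's match history. The sketch below assumes a hypothetical history format and uses fantasy points as the performance measure; the real feature definitions and window sizes are not given in the post.

```python
from statistics import mean

# Hypothetical history rows: (iso_date, opponent, venue, is_home, fantasy_points)
history = [
    ("2023-01-05", "B", "Mumbai", True, 42.0),
    ("2023-01-12", "C", "Delhi", False, 18.0),
    ("2023-01-20", "B", "Chennai", False, 30.0),
    ("2023-02-02", "D", "Mumbai", True, 55.0),
]


def performance_features(history, opponent, venue, recent_n=3):
    """Average past fantasy points over different slices of a player's history.

    A slice with no matches yields None, to be filled by the imputation policy.
    """
    rows = sorted(history)  # ISO date strings sort chronologically

    def avg(points):
        return mean(points) if points else None

    return {
        "recent_avg": avg([r[4] for r in rows[-recent_n:]]),
        "home_avg": avg([r[4] for r in rows if r[3]]),
        "away_avg": avg([r[4] for r in rows if not r[3]]),
        "vs_opponent_avg": avg([r[4] for r in rows if r[1] == opponent]),
        "at_venue_avg": avg([r[4] for r in rows if r[2] == venue]),
    }


features = performance_features(history, opponent="B", venue="Kolkata")
```

The player has never played in Kolkata, so `at_venue_avg` comes back as None and is handed to the missing-value rules.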

3. Predicting playing 11

In any game, whether a player is part of the starting lineup plays a vital role in that player's final performance. Starting lineups are usually announced 20 to 60 minutes before the game begins, but on our platform we open every game at least 24 hours before the start, so player credits have to be decided 24 hours in advance. To solve this problem, we predict the starting lineup of the game.

The features used to predict the starting lineup are:

Inactivity level measure
Performance level measure
Player level measure

We tried an XGBoost classifier, a logistic regression model and a heuristic-based model.

In the heuristic-based model, we combine three broad types of measures (inactive days, recent performance and recent status) into a score for each player, weighting them by coefficients α, β and γ. The values of α, β and γ are estimated by running a regression on actual matches over the training period. A player with a higher score is more likely to be in the starting lineup.
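The post does not show the scoring formula. A minimal sketch, assuming the score is the linear combination S = α × inactive days + β × recent performance + γ × recent status, and that α, β and γ come from an ordinary least-squares fit without intercept against whether each player actually started (a 0/1 target). The function names are hypothetical, and the solver is a plain Gauss-Jordan elimination on the normal equations so that only the standard library is needed.

```python
def fit_coefficients(X, y):
    """Least-squares fit without intercept: solve (X^T X) w = X^T y.

    X rows are [inactive_days, recent_performance, recent_status];
    y is 1 if the player started that match, else 0.
    Returns [alpha, beta, gamma].
    """
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def lineup_score(features, coeffs):
    """S = alpha * inactive_days + beta * recent_performance + gamma * recent_status."""
    return sum(c * f for c, f in zip(coeffs, features))


def rank_players(candidates, coeffs):
    """Order (player, features) pairs from most to least likely to start."""
    ranked = sorted(candidates, key=lambda pf: lineup_score(pf[1], coeffs), reverse=True)
    return [player for player, _ in ranked]
```

With a 0/1 target this is a linear probability model; logistic regression, which the post also tried, is the more usual choice for a binary outcome.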

To improve the accuracy of the starting lineup prediction, we trained an XGBoost classifier with additional features; this classifier is what the pipeline currently uses to predict starting lineups.
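A classifier produces a start probability per player, not a lineup. One plausible post-processing step, which is our assumption rather than something the post describes, is to keep the players with the highest predicted probability for each team. The probabilities could come, for example, from the positive-class column of `XGBClassifier.predict_proba` in the xgboost Python package; `lineup_size` also covers formats with a playing 15.

```python
from collections import defaultdict


def predict_lineups(start_prob, team_of, lineup_size=11):
    """Select the `lineup_size` players with the highest start probability per team.

    start_prob maps player_id -> predicted P(starts); team_of maps player_id -> team.
    """
    by_team = defaultdict(list)
    for player, p in start_prob.items():
        by_team[team_of[player]].append((p, player))
    return {
        team: {player for _, player in sorted(cands, reverse=True)[:lineup_size]}
        for team, cands in by_team.items()
    }
```

A fixed top-k cut guarantees exactly `lineup_size` predicted starters per team, which a plain 0.5 probability threshold would not.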

4. Developing measure for quantifying form of a player

We train a machine learning model to quantify a player's form. The model is trained on past matches and tested on a separate, out-of-time sample of matches. Because performance is modelled as a real number, regression metrics (MAE, RMSE, etc.) were used to assess the model's goodness of fit.
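An out-of-time split means every evaluation match happens after every training match. The sketch below shows such a split and the two named metrics, assuming hypothetical rows of realised fantasy points and a naive training-mean baseline standing in for the real form model.

```python
import math
from datetime import date
from statistics import mean


def out_of_time_split(rows, cutoff):
    """Train on matches before `cutoff`; evaluate on matches on or after it."""
    train = [r for r in rows if r["date"] < cutoff]
    held_out = [r for r in rows if r["date"] >= cutoff]
    return train, held_out


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical per-player-match rows with realised fantasy points.
rows = [
    {"date": date(2023, 1, 5), "points": 40.0},
    {"date": date(2023, 2, 9), "points": 20.0},
    {"date": date(2023, 3, 14), "points": 30.0},
    {"date": date(2023, 4, 2), "points": 50.0},
]
train_rows, eval_rows = out_of_time_split(rows, cutoff=date(2023, 3, 1))
# Naive baseline that predicts the training mean; a real form model should beat it.
baseline = mean([r["points"] for r in train_rows])
y_true = [r["points"] for r in eval_rows]
y_pred = [baseline] * len(eval_rows)
```

Here the baseline predicts 30 for both held-out matches, giving an MAE of 10 and an RMSE of about 14.14.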

After trying multiple models, we settled on XGBoost because it handles the non-linearity of the underlying distributions well. The choice was empirical: various models were tried on the dataset, and XGBoost worked best for this problem.

Formulation:

5. Form to credit mapping

We combine a player's form and popularity to generate a final score for each player.

We have computed final scores for many players in past matches, and each of those scores has the credit that was assigned to that player. We cut these scores into 10 buckets and pivoted them against credit. Based on this pivot, we created a function that maps a score to a credit.
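A minimal sketch of the bucketing step, assuming equal-frequency (decile) buckets and a step-function mapping that returns the mean historical credit of the bucket a new score falls into. The `final_score` blend of form and popularity and its weight `w` are hypothetical, since the post does not say how the two are combined; fitting a smooth curve through the bucket means would be an equally valid reading of the mapping function.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def final_score(form, popularity, w=0.7):
    """Hypothetical linear blend of form and popularity (weight w is an assumption)."""
    return w * form + (1 - w) * popularity


def build_credit_map(hist_scores, hist_credits, n_buckets=10):
    """Bucket historical scores by quantile and average the credit in each bucket.

    Returns a function mapping a new score to its bucket's mean credit
    (None if that bucket received no historical scores, e.g. due to ties).
    """
    cuts = quantiles(hist_scores, n=n_buckets)  # n_buckets - 1 cut points
    buckets = [[] for _ in range(n_buckets)]
    for score, credit in zip(hist_scores, hist_credits):
        buckets[bisect_right(cuts, score)].append(credit)
    bucket_credit = [mean(cs) if cs else None for cs in buckets]

    def score_to_credit(score):
        return bucket_credit[bisect_right(cuts, score)]

    return score_to_credit


# Hypothetical history: 20 past scores, credits rising in 0.5 steps per bucket.
hist_scores = [float(s) for s in range(1, 21)]
hist_credits = [6.0 + 0.5 * (i // 2) for i in range(20)]
to_credit = build_credit_map(hist_scores, hist_credits)
```

Scores beyond the historical range fall into the first or last bucket, so the mapping never produces a credit outside the range seen in the past.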

Team rankings also matter in any sport. Sports governing bodies publish rankings and keep updating them. For most sports fans, including me, rankings matter a lot: if Team A is ranked 1st and Team B is ranked 10th and they are playing each other next, Team A will generally perform better. We therefore also take team rankings into account when deciding player credits.
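The post does not say how rankings enter the calculation. One simple, hypothetical option is to nudge a player's final score by the ranking gap between the two teams before mapping it to a credit. In the sketch, `strength` and `max_shift` are made-up tuning knobs, and a lower rank number means a stronger team.

```python
def ranking_adjusted_score(score, own_rank, opponent_rank, strength=0.01, max_shift=0.1):
    """Scale a score up when the player's team outranks the opponent, down otherwise.

    The relative shift is strength * rank gap, clamped to [-max_shift, max_shift].
    """
    gap = opponent_rank - own_rank  # positive when the player's team is ranked higher
    shift = max(-max_shift, min(max_shift, strength * gap))
    return score * (1 + shift)


# Team A (ranked 1st) against Team B (ranked 10th), both players with a base score of 50.
team_a_player = ranking_adjusted_score(50.0, own_rank=1, opponent_rank=10)  # about 54.5
team_b_player = ranking_adjusted_score(50.0, own_rank=10, opponent_rank=1)  # about 45.5
```

The clamp keeps a very large ranking gap from overwhelming the performance-based score.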

Unique team combinations

Team creation on Dream11 has constraints on the number of players one can pick of each type (wicketkeeper, batsman, bowler, etc.) and on the number of players one can pick from each team. Through this model, we optimised player credits so that the number of unique player combinations that make up a valid team is maximised.
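To make the objective concrete, the sketch below counts by brute force the distinct teams that satisfy a team size, a credit budget, per-role limits and a per-side cap. The pool, limits and budget are hypothetical and far smaller than a real match. The sketch only shows how pricing one player higher can shrink the number of valid teams, which is the quantity the credit optimisation tries to keep large. Enumerating every combination gets expensive for full squads, so this is an illustration, not a practical optimiser.

```python
from collections import Counter
from itertools import combinations


def count_valid_teams(players, team_size, budget, role_limits, max_per_side):
    """Count distinct teams (unordered player sets) that satisfy all selection rules.

    players: list of (name, role, real_team, credit) tuples.
    role_limits: role -> (min, max) number of picks of that role.
    """
    count = 0
    for team in combinations(players, team_size):
        if sum(p[3] for p in team) > budget:
            continue
        roles = Counter(p[1] for p in team)
        if any(not lo <= roles[role] <= hi for role, (lo, hi) in role_limits.items()):
            continue
        if max(Counter(p[2] for p in team).values()) > max_per_side:
            continue
        count += 1
    return count


LIMITS = {"BAT": (1, 2), "BOWL": (1, 2)}
flat = [("A1", "BAT", "A", 8), ("A2", "BOWL", "A", 8), ("A3", "BAT", "A", 8),
        ("B1", "BAT", "B", 8), ("B2", "BOWL", "B", 8), ("B3", "BOWL", "B", 8)]
# Same pool, but A1 is priced 2 credits higher.
skewed = [("A1", "BAT", "A", 10)] + flat[1:]

flat_count = count_valid_teams(flat, team_size=3, budget=25, role_limits=LIMITS, max_per_side=2)
skewed_count = count_valid_teams(skewed, team_size=3, budget=25, role_limits=LIMITS, max_per_side=2)
```

With flat pricing, 16 of the 20 possible three-player teams are valid. Raising A1's price pushes every team containing A1 over the budget, which halves the count to 8.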