As of August 2023, UPI payments have crossed 10 billion in India, showcasing growing acceptance for digital payments. For any organisation, solving at scale can be an extremely daunting task. With over 11 million concurrent users on the app, ensuring seamless and efficient payment experiences for them comes with its own set of challenges.
At Dream11, experimentation runs deep in our DNA. We believe in building a culture that gives Dreamsters opportunities to experiment, fail and learn. In this blog, we deep dive into our journey to optimise payment success rates. We’ll touch upon the challenges we faced, the solutions deployed and the mathematical underpinnings behind our approach. Join us as we delve into the world of non-stationary bandits and how they've helped us maximise the success rate of payment routing.
Before we deep dive into our experiments, let’s try to understand more about payment routing
Our users make payment using various payment options across the country’s key payment service providers. In such a scenario, efficiently routing transactions is a complex challenge. Given this process is complex and challenging, downtime or hiccups can be as unpredictable as game-changing play. However, at Dream11, this is where the magic happens.
As shown in the figure above below, these transactions are facilitated by multiple payment gateways each with its unique characteristics and in-depth transaction flow - explained in this article. These gateways may experience sudden downtime with limited visibility due to underlying issues. To ensure payment transaction success, it's crucial to make informed decisions about the ideal payment gateway to use for each transaction, considering many factors that impact overall gateway performance.
Challenges faced while routing payments
How did we solve it?
The saviour here is Routing Service! This solution addresses the above challenges through a combination of reinforcement learning and meticulous engineering in the design of a distributed real-time machine learning inference system. Let us dive into it.
- State Space: This represents the current state of our system, which includes information about the payment gateway, success rates, and transaction details
- Action Space: We can take actions, such as choosing a payment gateway for a transaction
- Reward: The reward measures how successful our payment routing decision was. It takes into account the success or failure of the payment, costs, and other factors
Now, let's get into the mathematical underpinnings of our algorithms. To address the first challenge of assigning the relevant payment gateway for each transaction, we've employed a range of algorithms, each with its unique characteristics and advantages.
Epsilon greedy strategy: At every trial, it randomly chooses an action with probability ε and greedily chooses the highest value action with probability 1 - ε. By default, ε-greedy is unguided and chooses actions uniformly at random.
Upper-Confidence-Bound: UCB ****considers ****the uncertainty of an arm and selects arms that have the highest potential. Uncertainty is modelled via confidence bounds, while the potential is represented by the upper confidence bound. Since bounds guide sampling, it has lower regret.
\ Thompsom sampling: Thompson sampling ****models uncertainty by building a probability distribution (beta) from historical rewards and then samples from the distribution when choosing actions. If the rewards are delayed, UCB selects arms deterministically. It chooses the same action until feedback is incorporated. In contrast, because TS chooses actions stochastically by sampling from the posterior distribution, It has lower regret.
\ Boltzmann Gumbel: Boltzmann Gumbel assigns scores to each action, which are transformed into probabilities using a temperature parameter and the softmax function to balance exploration and exploitation.
Given the non-stationarity of the success rate time-series we also tried out discounting & windowing alternatives of these algorithms, the intuition behind it was to give more weight to recent rewards.
Experiments and Results
Dataset The dataset comprises the following attributes for each transaction
- ID: A unique identifier for each transaction
- Amount: The transaction amount denominated in Indian Rupees (INR)
- Source: Payment instrument from which the transaction was initiated
- Terminal: Identification number of the payment gateway for the transaction
- Success: A binary indicator indicating whether the transaction was successful/failed
The routing dataset used in this analysis comes from over 100M transactions conducted on the Dream11 app for a week in Dec, 2022. We used the first day’s transactions (tuning dataset) for model validation to decide the optimal parameter for each competing bandit algorithm. Later, we ran these algorithms with tuned parameters on a one-week transactions data (evaluation dataset).
To address the challenge of which algorithm to take online in production, we built a simulator for backtesting the learned policies. We assume the underlying payment gateways to follow a binomial distribution which models the no. of successes in a fixed number of independent trials, each with the same probability of success. In the case of a continuous-time process, we can think of trials as the time intervals over which we observe the process, and the success as the times at which the transactions are successful. Also, the probability of success in each time interval is assumed to be constant.
With the simulator handy, the analysis of discounting and window-based strategies on top of the set of multi-arm bandit algorithms described earlier was a cakewalk. Below are the results of the evaluation on the tuning dataset, we observe in the charts below how discounting and windowing old beliefs reduces overall regret.
While benchmarking the algorithms on the evaluation dataset we observe that UCB (sliding window length of 200 transactions & with a discount factor of 0.6) has the lowest cumulative regret. The discounted Boltzmann Gumbel algorithm also achieves comparable performance as the discounted UCB algorithm. During the testing, three algorithms – UCB algorithm with w = 200, α = 0.6; Thompson sampling w = 200, α = 0.6 and 𝜖 greedy algorithm with 𝜖 = 0.2, w = 100 yielded the lowest regret.
The Engineering Lens Behind The Experiment: System Architecture
Bandit System Requirements: We design our production system to incur low technical debt using the decision service architecture from the research paper by Microsoft [Figure 1]
- Explore: Inference of the bandit algorithm to assign scores to payment gateways based on business agreements with merchants and banks
- Log: This component generates the reward data (success/failure)
- Learn: This module performs online policy training based on the rewards accrued and returns the updated policy
- Deploy: The updated policy is deployed based on the experiment configuration
System Architecture Innovation at Dream11
With the challenges described earlier in the blog, the proposed system is expected to deliver a highly scalable, reliable, secure, maintainable, and cost-effective payment processing solution by adhering to these critical design requirements. We evaluated three approaches a classic HTTP service, a clipper-based approach
Tradeoffs in system design
Now lets benchmark the approaches across the system design requirements derived from the challenges we face while designing such payment routing systems.
We deployed the agent-based system, which eliminated the need for a separate database, thereby saving costs significantly. It further reduced the system’s latencies by avoiding the overhead linked to the external database connections. The use of an asynchronous agent within this system has allowed us to forgo a job scheduler, as this agent can continuously update the rewards asynchronously within the same service. The most compelling advantage offered by this method is its ability to scale to a remarkably high level of concurrent requests, a crucial requirement for our system. Hence, despite the inherent tradeoffs, the advantages offered by the third method have far outweighed its limitations, leading us to choose it for our current system implementation.
Conclusion: Online Experiments
The study aims to assess the performance of optimal algorithms by conducting online experiments involving around 240 million transactions over 60 days. These transactions were randomly assigned to one of four routing algorithms: the existing rule-based algorithm and three bandit-based algorithms chosen based on a specified method. The standard algorithm handled about 10% of the transactions, while each bandit algorithm processed roughly 30% daily.As shown in Figure 6 below, the results highlight the differences in daily success rates between the standard algorithm and the three bandit-based algorithms from May 1, 2023, to May 25, 2023. Notably, the sliding-window UCB bandit algorithm outperformed the others, achieving a statistically significant cumulative uplift of 0.92% over one month.
We proposed a Routing Service architecture using a novel Ray-based implementation for optimally scaling bandit-based payment routing to over 10,000 transactions per second. To dig deeper read through our research paper published in the AI-MLSYSTEMS 2023 conference. Our work has far-reaching implications, especially in the fast-paced world of fintech payments. With the massive scale of National Payments Corporation Of India (NPCI) UPI (Unified Payments Interface) transactions in India, our system can revolutionize payment gateways by optimizing success rates in real time. Imagine the impact on the entire payments ecosystem, from enhancing user experiences to boosting financial efficiency.
Our future endeavours will focus on predicting time series change points, which promises to minimize regret and elevate our system's performance and enhance overall success rates in the face of fluctuations.
While this experiment has been a great success, it's worth noting that we've left some stones unturned. The cost of transaction fees and the nuanced context of user transactions are territories yet to be explored. We're excited to embrace this challenge in our ongoing pursuit of delivering world-class user experiences to our 200 million+ users.