A Peek into the Backend Test Automation @ Dream11

The Dream11 user experience is built on top of a microservice architecture that serves a unique and personalised experience to all 100M+ users. Our team extensively uses a Behaviour Driven Development (BDD) approach and a BDD-based test framework for automating backend microservices.

The backend is a piece of software that runs on remote machines called servers. It is accessed over the internet or a VPC via APIs. The backend is not meant to be used by humans directly, but rather by applications (frontend apps); its purpose is to perform remote tasks that cannot be performed by the frontend apps themselves. Backend test automation is a testing method that checks the backend of a software or web application. Since repeating manual backend tests is a cumbersome process, our team has written additional code to reduce manual intervention and the chances of human error.

All backend actions go through different microservices. For instance, when a user logs in to the app they interact with the user microservice, and similarly a deposit microservice takes care of users depositing cash. These microservices are tested end-to-end before every release, and each microservice has its own independent releases. Owing to this, incorporating an integration test suite became vital; based on its results, a release is then shipped to production.

Based on our previous experience of introducing test automation at Dream11 and on existing test frameworks, we came across the following problem statements:

  1. The framework should be able to run tests across multiple environments: At Dream11, multiple teams provision multiple environments for feature, integration and load testing, where heterogeneous workloads ranging from containers to VMs (EC2 instances) get provisioned. This gives our product teams the flexibility to iteratively build features with all dependent services onboarded and to move forward with release cycles while ensuring every environment stays functional. Because these environments are provisioned through automation, validating each one and identifying functional gaps around them was difficult. It was a bare necessity that the automation tests run against any environment based on user input, which means endpoints for microservices or edge layers must be resolved at run-time rather than hard-coded in the system.
  2. Minimal or no data dependencies: Having the tests rely on certain data already present in the environment was not preferable. A data dependency would mean manual intervention to create data before running the automation, which would defeat the whole purpose of automation.
  3. Ease of maintenance and reusability: It should be convenient for any new member to write tests and contribute without difficulty. The automation should be easy to maintain and debug, and the functions, methods and classes should be reusable.
  4. Easy onboarding: Members not familiar with BDD should find it easy to adopt, and new members on the team should be able to understand the framework, get acquainted with it and start contributing almost immediately.
  5. Ease of adding new test cases: A modification or new addition should not require a lot of rework. Adding test cases to the framework should be straightforward, even for manual QA engineers.

The tests are primarily based on three criteria: a context (a prerequisite or given, such as certain data-based values), an event (an API call) and the possible outcomes.
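In Cucumber terms these map to Given (context), When (event) and Then (outcome). As an illustration only (not an actual Dream11 scenario; the step wording is assumed), a login check could be written as:

```gherkin
Feature: User login

  Scenario: A registered user can log in and fetch profile details
    Given a registered user exists on the target environment
    When the user calls the login API with valid credentials
    Then the response status code is 200
    And the response body matches the login JSON schema
```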

Solution:

After a few incremental experiments, we finally built a framework that ticked all the boxes. The implementation uses Java, TestNG and Cucumber, along with a few additional custom utilities.
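As a sketch of how these pieces fit together, step definitions for the scenario above might look like the following. The class name, endpoint, payload and the use of REST Assured as the HTTP client are assumptions for illustration, not Dream11's actual code.

```java
// Hypothetical step definitions - class, endpoint and payload are illustrative;
// REST Assured is assumed here purely as an example HTTP client.
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import io.restassured.RestAssured;
import io.restassured.response.Response;
import org.testng.Assert;

public class LoginSteps {

    private Response response;

    @Given("a registered user exists on the target environment")
    public void aRegisteredUserExists() {
        // Test data is created on the fly (e.g. via a sign-up API call) so the
        // tests do not depend on pre-seeded data in any environment.
    }

    @When("the user calls the login API with valid credentials")
    public void theUserCallsTheLoginApi() {
        // The base URI is resolved at run-time (from a system property here),
        // so the same test can run against any environment.
        response = RestAssured.given()
                .baseUri(System.getProperty("baseUri", "http://localhost:8080"))
                .contentType("application/json")
                .body("{\"email\": \"user@example.com\", \"password\": \"secret\"}")
                .post("/login");
    }

    @Then("the response status code is {int}")
    public void theResponseStatusCodeIs(int expected) {
        Assert.assertEquals(response.getStatusCode(), expected);
    }
}
```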

An Android or iOS app interacts with a backend service over HTTP with a JSON payload. For instance, on logging in, one might receive three values: Name, Age and Balance.

Consider the following JSON response:
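A representative example with illustrative values:

```json
{
  "Name": "Rahul",
  "Age": 28,
  "Balance": 144.25
}
```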

When we test backend services, we test the responses they return against schemas and certain guidelines (API contracts). The schema for this JSON would look like:
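A representative version, assuming JSON Schema draft-07:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["Name", "Age", "Balance"],
  "additionalProperties": false,
  "properties": {
    "Name": { "type": "string" },
    "Age": { "type": "integer" },
    "Balance": { "type": "number" }
  }
}
```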

Understanding the JSON schema:

  • The ‘required’ attribute lists the properties/objects/arrays that must be present in the response.
  • The ‘additionalProperties’ attribute tells us whether additional keys/objects are acceptable. If this attribute is set to ‘false’, the schema check will not allow any additional keys in the response.
  • The ‘type’ attribute indicates the data type of the key in JSON. In the example above, ‘Name’ is declared as “string”, ‘Age’ as “integer” and ‘Balance’ as “number”. This means that if the response is:
`{ "Name": 123, "Age": 28.1, "Balance": "144.25" }`

all schema checks will fail, since the name in the response is not a string, the age is not an integer and the balance is a string rather than a number.
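Contract checks like these can be wired into the tests with a JSON Schema matcher. The sketch below uses REST Assured's json-schema-validator module and an assumed schema path; the post does not name the exact library used.

```java
// Hypothetical contract check - the schema path and the use of REST Assured's
// json-schema-validator module are assumptions for illustration.
import io.restassured.response.Response;

import static io.restassured.module.jsv.JsonSchemaValidator.matchesJsonSchemaInClasspath;

public class ContractChecks {

    /** Fails the test if the response body does not satisfy the bundled schema. */
    public static void assertMatchesLoginSchema(Response response) {
        // Called from a step definition after the API call has been made.
        response.then()
                .assertThat()
                .body(matchesJsonSchemaInClasspath("schemas/login-response.json"));
    }
}
```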

Apart from schema/contract checks, the test suites also include a lot of functional and database-level validations.

To understand how this is orchestrated in our system, from wire-framing a new feature or enhancement to making its tests part of the regression suite, we follow the process below:

If the API signature is not yet known, a pending exception is triggered. Whenever such a test is run, it produces a pending result because there is not enough information to fulfil the criteria, but the test writer can still prepare the basic skeleton of the test.
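In Cucumber's Java bindings this is typically done by throwing a PendingException from the not-yet-implemented step; the step text below is only an example:

```java
// Hypothetical pending step - the step text is illustrative. Throwing
// PendingException marks the scenario as pending instead of failed.
import io.cucumber.java.PendingException;
import io.cucumber.java.en.When;

public class WithdrawalSteps {

    @When("the user requests a withdrawal")
    public void theUserRequestsAWithdrawal() {
        // The withdrawal API signature is not finalised yet, so the step stays pending.
        throw new PendingException();
    }
}
```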

Since there is a large volume of interaction with the backend, the TestNG runner executes the tests in parallel for maximum throughput. While the tests are running, logs are continuously shipped; the developer can refer to them if a test fails and use them to deep-dive into the issue.
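With cucumber-testng, parallel execution is usually enabled by overriding the scenarios data provider on the runner, as in the sketch below (the feature and glue paths are assumptions); the thread count is then controlled through TestNG's dataproviderthreadcount setting.

```java
// Hypothetical runner - feature/glue paths are illustrative. Marking the
// scenarios DataProvider as parallel lets TestNG execute scenarios concurrently.
import io.cucumber.testng.AbstractTestNGCucumberTests;
import io.cucumber.testng.CucumberOptions;
import org.testng.annotations.DataProvider;

@CucumberOptions(features = "src/test/resources/features", glue = "steps")
public class ParallelTestRunner extends AbstractTestNGCucumberTests {

    @Override
    @DataProvider(parallel = true)
    public Object[][] scenarios() {
        return super.scenarios();
    }
}
```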

While the tests are running, all data is recorded. Once the test run is complete, there are three reports: one in Jenkins, one on Slack and one on a dashboarding tool called Grafana. Jenkins is where the tests run, so the report is added there by default. Slack ensures that the relevant stakeholders do not have to go back to Jenkins for details; they simply get an alert on Slack with a link to the test results. Grafana is where the trends are shown. These three sources can be used to analyse the results.

If all test cases pass, the feature is ready to be shipped and is taken live. If any aspect of the test is failing, the developer has to identify a fix, deploy it and run the Jenkins build again.

The Jenkins pipeline job used for test automation is a Groovy script with multiple stages. Each stage in the pipeline script performs tasks required before, during or after test execution.
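A minimal declarative-pipeline sketch along those lines is shown below; the stage names, parameters and Maven command are assumptions, not the actual Dream11 pipeline.

```groovy
// Hypothetical Jenkinsfile - stages, parameters and commands are illustrative.
pipeline {
    agent any
    parameters {
        string(name: 'TARGET_ENV', defaultValue: 'integration',
               description: 'Environment the tests should run against')
    }
    stages {
        stage('Checkout') {
            steps { checkout scm }
        }
        stage('Run tests') {
            steps {
                // The environment is passed in at run-time so endpoints are never hard-coded.
                sh "mvn clean test -Denv=${params.TARGET_ENV}"
            }
        }
        stage('Archive reports') {
            steps {
                archiveArtifacts artifacts: 'target/cucumber-reports/**', allowEmptyArchive: true
            }
        }
    }
}
```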

The steps involved in triggering a Jenkins build eventually lead to test-run management. The machine on which the tests run has log-archival logic: it archives logs older than the last 20 runs (n − 20). For example, if the 1000th run is in progress, everything before run 980 is archived.

How does a developer debug a failing test?

A failing test can be debugged using one of two approaches:

  1. Viewing Cucumber reports in Jenkins: Reports are embedded within each Jenkins job with details of the steps that failed, along with API details (endpoints, params, body, headers, etc.).
  2. Viewing real-time logs in Grafana using Loki: Logs can be accessed within Grafana using Loki and provide details such as:
     • The curl request for failing tests
     • The response received for failing steps
     • Any additional info-, debug- and error-level logs from the test scripts

Test Results:

Test results for each complete run of all behaviours are attached to the Jenkins job using the default Cucumber reports.

The output section within each behaviour can be expanded to view details for requests and responses.

Trends for each test run are logged in InfluxDB and visualised using dashboards in Grafana. Furthermore, various rule-based alerts configured in Grafana publish test results and trends to Slack.

Wrapping it up:

It takes a great deal of effort and machinery to put everything in place so that users can seamlessly navigate the Dream11 application while the backend automation keeps running smoothly behind the scenes. Every test failure and success reinforces our faith in a solution that caters to customers and developers alike.

The framework has helped the team overcome all the challenges pertaining to environments and test-data dependency. It has also enabled teams to test the system efficiently and provide faster release sign-offs with more confidence.

If you are interested in solving complex problems, we are looking to hire talented SDET & Performance Engineers! Apply Now.

Authored by: SDET team @ Dream11
