Using Awaitility with Cucumber for Eventual Consistency checks

The last part of the guide focuses on building end-to-end tests with Cucumber that support eventual consistency. We use the second feature, the Leaderboard, to show how to integrate Awaitility in Cucumber tests with a practical example.

> The Cucumber Java Guide
Learn how to build end-to-end tests with Java and Cucumber, using this step-by-step tutorial with practice code examples. This guide is part of the book's extra chapters.
Part 4. Eventual Consistency with Cucumber and Awaitility (this article)

Eventual Consistency and Cucumber Tests

Strong Consistency in Tests

The Cucumber step definitions that we wrote in Part 3 assumed strong consistency in the backend’s data. Right after the users send their attempts to solve the multiplication challenges, we retrieve the stats and check that they include the last attempt. See Listing 1 for the Gherkin snippet showing the relevant part of this test.

When he sends the correct challenge solution
Then his stats include 1 correct attempt

Listing 1. A fragment of the Solving Challenges feature

Listing 2 shows the Java implementation of the step definition for verifying the statistics.

@Then("her/his stats include {int} {correct} attempt(s)")
public void statsIncludeAttempts(int attemptNumber, boolean correct) throws Exception {
    var stats = this.challengeActor.retrieveStats();
    assertThat(stats)
            .filteredOn("correct", correct)
            .hasSize(attemptNumber);
}

Listing 2. Verifying that the statistics include the expected attempts

Remember: All the code in this post is available on GitHub: Book - Cucumber Tests. If you find it useful, please give it a star!

This is intuitive and straightforward because systems using strong consistency are easy to test. However, the practical use case that we built in the book is not strongly consistent (and we dedicated a complete chapter to analyzing why it isn’t).

Eventual Consistency Challenges with Cucumber

Our backend architecture uses microservices and eventual consistency. See Figure 1. If you read the book, this figure should be familiar to you.

Figure 1. Eventual Consistency in our system

When a user sends an attempt to the backend (1), that attempt is checked and stored in the database before returning a response (3). Therefore, when we later ask for the statistics, the last attempt is included there. These two operations are under the scope of the Multiplication microservice.

In the book, we explain how we achieve loose coupling by using an event-driven approach: instead of making the challenge domain aware of the gamification domain, the first domain triggers an event (2) via a message broker (RabbitMQ) when an attempt is processed. The gamification logic uses data from the event to calculate the score and badges of the users, but this operation (4) may happen after the response to the challenge has been sent (3).

If we don’t embrace eventual consistency in our tests, we might end up building unstable tests that sometimes pass and sometimes fail. Let’s look again at a part of our second feature definition in Gherkin. See Listing 3.

Given the following solved challenges
  | user  | solved_challenges |
  | Karen | 5                 |
  | Laura | 7                 |
Then Karen has 50 points
* Karen has the "First time" badge

Listing 3. A fragment of the Leaderboard feature

To retrieve the leaderboard, we’ll use the REST API exposed by the Gamification microservice, although this is abstracted from us thanks to the Gateway pattern. If you don’t remember the complete backend architecture, have a look again at Figure 2 in Part 2 of this guide. From the leaderboard data, we’ll extract the score and the badges.

What could happen is that the Then part of our test script in Listing 3 calls the Gamification API before this microservice has received and processed all the RabbitMQ messages that carry those attempts (as shown in Figure 1). Karen might have 0 points, or 20, or maybe 50 if we’re lucky. Nobody knows because it depends on the environments where the tests and the system are running.
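
To make the race concrete, here is a minimal, self-contained sketch in plain Java. It is not the book's code: the EventualScoreDemo class, its method names, the 50 ms processing delay, and the rule of 10 points per correct attempt (inferred from the 5 challenges and 50 points in Listing 3) are all assumptions for illustration. A single worker thread stands in for the Gamification consumer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the two microservices; not the book's code.
class EventualScoreDemo {

    // Assumption: each correct attempt is worth 10 points (5 attempts -> 50).
    static final int POINTS_PER_ATTEMPT = 10;

    private final AtomicInteger score = new AtomicInteger();
    // A single worker thread plays the role of the message consumer.
    private final ExecutorService consumer = Executors.newSingleThreadExecutor();

    // Returns immediately, like the challenge API; the score is updated later.
    void sendAttempt() {
        consumer.submit(() -> {
            try {
                Thread.sleep(50); // simulated processing delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            score.addAndGet(POINTS_PER_ATTEMPT);
        });
    }

    int currentScore() {
        return score.get();
    }

    // Waits (up to 5 seconds) until every queued event has been processed.
    boolean drain() {
        consumer.shutdown();
        try {
            return consumer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        EventualScoreDemo karen = new EventualScoreDemo();
        for (int i = 0; i < 5; i++) {
            karen.sendAttempt();
        }
        // Read straight away: the value depends on how far the consumer got.
        System.out.println("Immediately: " + karen.currentScore());
        karen.drain();
        System.out.println("After processing: " + karen.currentScore());
    }
}
```

The first read can return any multiple of 10 between 0 and 50, depending on timing. Only the read after the queue has drained is guaranteed to show 50, and a test that reads straight after sending the attempts is in the position of that first read.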

For a better understanding of the problem, we’ll demonstrate it first. Then, we’ll go through the alternatives and the best practices you can use to deal with eventual consistency in Cucumber tests.

Learn Microservices with Spring Boot - Second Edition

The GameStepDefinitions class

We already included in our project the Leaderboard feature script when we described Gherkin features, but let’s have a second look at it. See Listing 4.

Feature: The Leaderboard shows a ranking with all the users who solved
  challenges correctly. It displays them ordered by the highest score first.

  Scenario: Users get points and badges when solving challenges, and they
  are positioned accordingly in the Leaderboard.
    Given the following solved challenges
      | user  | solved_challenges |
      | Karen | 5                 |
      | Laura | 7                 |
    Then Karen has 50 points
    * Karen has the "First time" badge
    And Laura has 70 points
    * Laura has the "First time" badge
    * Laura has the "Bronze" badge
    And Laura is above Karen in the ranking

Listing 4. The leaderboard.feature Gherkin file

Now that we have experience with Step definition files and we prepared the Leaderboard actor class, we can create a first version of the GameStepDefinitions class. See Listing 5.

package microservices.book.cucumber.steps;

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;

import microservices.book.cucumber.actors.Challenge;
import microservices.book.cucumber.actors.Leaderboard;
import microservices.book.cucumber.api.dtos.leaderboard.LeaderboardRowDTO;

import static org.assertj.core.api.Assertions.*;

public class GameStepDefinitions {

    private Map<String, Challenge> userActors;
    private final Leaderboard leaderboardActor;

    public GameStepDefinitions() {
        this.leaderboardActor = new Leaderboard();
    }

    @Given("the following solved challenges")
    public void theFollowingSolvedChallenges(DataTable dataTable) throws Exception {
        processSolvedChallenges(dataTable);
    }

    private void processSolvedChallenges(DataTable userToSolvedChallenges) throws Exception {
        userActors = new HashMap<>();
        for (var userToSolved : userToSolvedChallenges.asMaps()) {
            var user = new Challenge(userToSolved.get("user"));
            user.askForChallenge();
            int solved = Integer.parseInt(userToSolved.get("solved_challenges"));
            for (int i = 0; i < solved; i++) {
                user.solveChallenge(true);
            }
            userActors.put(user.getOriginalName(), user);
        }
    }

    @Then("{word} has {int} points")
    public void userHasPoints(String user, long score) throws Exception {
        Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
                .update()
                .getByUserId(userActors.get(user).getUserId());
        assertThat(optionalRow).isPresent()
                .map(LeaderboardRowDTO::getTotalScore).hasValue(score);
    }

    @Then("{word} has the {string} badge")
    public void userHasBadge(String user, String badge) throws Exception {
        Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
                .update()
                .getByUserId(userActors.get(user).getUserId());
        assertThat(optionalRow).isPresent();
        assertThat(optionalRow.get().getBadges()).contains(badge);
    }

    @Then("{word} is above {word} in the ranking")
    public void userIsAboveUser(String userAbove, String userBelow) throws Exception {
        var updatedLeaderboard = this.leaderboardActor.update();
        int positionAbove = updatedLeaderboard.whatPosition(
                userActors.get(userAbove).getUserId()
        );
        int positionBelow = updatedLeaderboard.whatPosition(
                userActors.get(userBelow).getUserId()
        );
        assertThat(positionAbove).isLessThan(positionBelow);
    }

}

Listing 5. A first version of the GameStepDefinitions class

This code follows a similar approach to our previous step implementations. We use the leaderboardActor instance to keep the state between steps (as introduced in Part 3). We also need to simulate several users sending attempts to the system, and for that we reuse the Challenge actor class that we already created. This is where we really see the advantages of the actor abstraction layer: we don’t need to replicate all the state variables and interactions in this step definition class.

Except for the first step definition, which uses a Cucumber Datatable, all the other ones are simple:

  • The userHasPoints method verifies that a given user in the leaderboard has the expected score. It updates the leaderboard and tries to find the user. If the user is there, it compares the actual score with the expected value. The step passes only if they match.
  • The userHasBadge method does the same for expected badges. Note that this is the first time we use Cucumber’s {string} parameter type, because badges may consist of several words. For the same reason, we enclosed the badge name in quotes in Gherkin (see Listing 4, e.g., “First time”).
  • userIsAboveUser takes two user names as parameters and verifies that one is above the other in the ranking. Keep in mind that we can’t verify absolute positions because we create random users in other test scenarios, so we never know what other users are already in the ranking. Therefore, we use a relative comparison.
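
The relative comparison in the last bullet can be sketched in plain Java. This is only an illustration: the RelativeRankingDemo class and its two static helpers are hypothetical, and the real Leaderboard actor may compute positions differently. Here the ranking is assumed to be a list of user names ordered from highest to lowest score.

```java
import java.util.List;

// Hypothetical illustration of a relative ranking check; not the actor's code.
class RelativeRankingDemo {

    // 1-based position of a user in an ordered ranking, or -1 when absent.
    static int whatPosition(List<String> ranking, String user) {
        int index = ranking.indexOf(user);
        return index < 0 ? -1 : index + 1;
    }

    static boolean isAbove(List<String> ranking, String above, String below) {
        int positionAbove = whatPosition(ranking, above);
        int positionBelow = whatPosition(ranking, below);
        // Both users must be present; a smaller position means a higher rank.
        return positionAbove > 0 && positionBelow > 0 && positionAbove < positionBelow;
    }

    public static void main(String[] args) {
        // Users left over from other scenarios surround the two we care about.
        List<String> ranking = List.of("other-1", "Laura", "other-2", "Karen");
        System.out.println(isAbove(ranking, "Laura", "Karen")); // prints true
        System.out.println(isAbove(ranking, "Karen", "Laura")); // prints false
    }
}
```

Because only the order of the two positions matters, the check keeps working no matter how many unrelated users other scenarios have added to the ranking.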

Let’s cover Datatables in a separate section.

Cucumber’s Datatables in practice

In Cucumber, we can pass data structures to our tests. This is very convenient because we avoid step repetition in Gherkin, which improves readability. For our test cases, we can quickly define preconditions for multiple users in a more visual way. See Listing 6, an extract of the Leaderboard feature.

    Given the following solved challenges
      | user  | solved_challenges |
      | Karen | 5                 |
      | Laura | 7                 |

Listing 6. A Datatable in Gherkin

As you can imagine, we could include more rows in other test cases. Since we’re testing the ranking, this syntax helps us visualize that Laura should actually be in a higher ranking position than Karen.

In the code, we read the Datatable just by declaring it as a method argument. We don’t need to specify anything extra.

The DataTable class offers several methods to read the structure in different ways:

  • As a list of lists: a list of rows with each row having multiple values.
  • As a map: if the table has two columns, the first will be the key, and the second the value.
  • As a list of maps: each item in the list is a map representing one row, whose keys are the table headings and whose values are the entries in that row. In my opinion, this is the most intuitive way of reading the Datatable because it uses table headings.
  • etc. Check the official docs if you’re curious.
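
To visualize the list-of-maps shape without depending on Cucumber, the sketch below rebuilds it by hand from raw rows. The TableShapesDemo class and its toMaps helper are hypothetical and exist only for illustration; in the real step definitions, the DataTable's asMaps() method returns this kind of structure for us.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "list of maps" shape; not Cucumber's code.
class TableShapesDemo {

    // Turns raw rows (header row first) into one map per data row,
    // keyed by the header names.
    static List<Map<String, String>> toMaps(List<List<String>> rows) {
        List<String> headers = rows.get(0);
        List<Map<String, String>> result = new ArrayList<>();
        for (List<String> row : rows.subList(1, rows.size())) {
            Map<String, String> entry = new LinkedHashMap<>();
            for (int i = 0; i < headers.size(); i++) {
                entry.put(headers.get(i), row.get(i));
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        // The same table as in Listing 6, as a list of lists.
        List<List<String>> rows = List.of(
                List.of("user", "solved_challenges"),
                List.of("Karen", "5"),
                List.of("Laura", "7"));
        for (Map<String, String> row : toMaps(rows)) {
            System.out.println(row.get("user") + " -> " + row.get("solved_challenges"));
        }
        // prints "Karen -> 5" and then "Laura -> 7"
    }
}
```

Each map has one entry per column, so a row can be read by heading name instead of by column index.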

Listing 7 (below) is an extract of the complete code of the GameStepDefinitions class (included in Listing 5). In this fragment, we loop through all the rows returned by the .asMaps() method and use the table headings to get the username and the number of solved challenges (using Map.get() for each userToSolved row). Then, we send as many attempts as specified for the simulated user via the actor class (user.solveChallenge()).

private void processSolvedChallenges(DataTable userToSolvedChallenges) throws Exception {
    userActors = new HashMap<>();
    for (var userToSolved : userToSolvedChallenges.asMaps()) {
        var user = new Challenge(userToSolved.get("user"));
        user.askForChallenge();
        int solved = Integer.parseInt(userToSolved.get("solved_challenges"));
        for (int i = 0; i < solved; i++) {
            user.solveChallenge(true);
        }
        userActors.put(user.getOriginalName(), user);
    }
}

Listing 7. Processing data in a Cucumber’s Datatable

Running the tests

After we implemented the missing step definitions, we can now run the tests again and check the results. We didn’t prepare anything to support eventual consistency yet, so we expect these tests to be unstable.

Before running the tests, you must start the backend system. As we introduced in Part 3, the easiest way to do that is to download the docker-compose-public.yml file from the book repositories, and then execute it using Docker Compose.

$ docker-compose -f docker-compose-public.yml up

After all the services start, you can run the test suite with:

$ ./mvnw clean test

If you’re lucky, all the tests will pass. The Leaderboard feature scenario will be marked as green, and Maven gives you a result of zero failures. You may try multiple times and get the same result: everything passes.

What happens here? Did I make up the eventual consistency challenge? Not really. It means the backend system is fast enough to produce, send, consume, and process all the RabbitMQ messages before the test calls the leaderboard API. If you look back at Figure 1, it means that (2 - send message) and (4 - process event) are completed before the API call.

The results of the tests depend a lot on the environment where you’re running them, so even when all of them pass, you shouldn’t relax and assume that they’re stable enough. Do not follow the “It works on my machine” motto. Tomorrow, a colleague could experience errors when running them on a different computer, or maybe your CI/CD system has fewer resources. Then, you’ll see the errors.

Forcing flaky tests due to Eventual Consistency

Let’s force the error situation to better learn how to solve it. What we’ll do is to limit the resources of the Gamification microservice. If we make its container work with a limited CPU, we expect it to be slower when processing the messages.

Edit the docker-compose-public.yml file and add the deploy YAML block shown in Listing 8. We’ll limit the CPU of gamification to 0.2, which means 20% of one of the CPU cores in your machine.

  gamification:
    image: learnmicro/gamification:0.0.1
    environment:
      - SPRING_PROFILES_ACTIVE=docker
      - SPRING_CLOUD_CONSUL_HOST=consul
    deploy:
      resources:
        limits:
          cpus: '0.20'
        reservations:
          cpus: '0.10'
    depends_on:
      - rabbitmq-dev
      - consul-importer
    networks:
      - microservices

Listing 8. Limiting resources of the Gamification microservice

To make this work with Docker Compose without enabling swarm mode, this time we have to run the system with the --compatibility flag. Remember to bring the system down first if you’re still running it from the previous execution.

$ docker-compose -f docker-compose-public.yml --compatibility up

This time, it may take a while until the Gamification microservice is fully ready. Check the logs in the output of the docker-compose command to wait until you see that the service is ready. In my case, it took around three minutes:

gamification_1     | 2020-10-09 05:30:48.084  INFO [,,,] 1 --- [           main] m.b.g.GamificationApplication            : Started GamificationApplication in 183.403 seconds (JVM running for 190.749)

Now, run the tests again to see if we can reproduce the errors due to eventual consistency. In my case, limiting the CPU of gamification to 0.2 works, and I sometimes get test failures because Karen is not present in the ranking or has a lower score than expected. See an example of the results in Figure 2, where Karen has 30 points instead of the expected 50.

Figure 2. Cucumber tests failing due to eventual consistency

Why do we get different results each time? Because the API call is sometimes processed by Gamification before it has consumed the messages, sometimes in between processing them, and sometimes after.

We have reproduced the problem. How do we fix this? Let’s see the alternatives.

Note: if you still get passing tests, try to limit the CPU resources even more.

The Thread.sleep() approach in Cucumber

A simple conclusion we may extract from our experiment is that the tests we made are too fast. We send the challenges and, very quickly after that, we retrieve the leaderboard.

Sometimes, people get convinced that this will never happen in a real use case - a real person interacting with the system. In our practical case study, getting the wrong leaderboard data is irrelevant because it gets updated periodically. The user will eventually see the right score and badges.

Therefore, on many occasions, developers introduce delays in the tests to give some extra time to the eventually-consistent system to become consistent. See Listing 9.

@Then("{word} has {int} points")
public void userHasPoints(String user, long score) throws Exception {
    Thread.sleep(5000);
    Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
            .update()
            .getByUserId(userActors.get(user).getUserId());
    assertThat(optionalRow).isPresent()
            .map(LeaderboardRowDTO::getTotalScore).hasValue(score);
}

Listing 9. Adding time guards to tests to deal with eventual consistency

When we wait those five extra seconds before retrieving the leaderboard, the test scenario passes again. However, this is a bad practice for several reasons:

  • It’s not efficient. You are wasting time in your tests because the delay has to be as long as the slowest machine in your organization (or cloud system) needs to run the tests in a stable manner, and every other machine waits that long too.
  • It does not solve the problem. Tomorrow a new, slower machine in your organization may require more than five seconds to see stable results. Even worse, all other systems have to wait for the new longer time guard.
  • It makes your tests more difficult to maintain. In our example, we wait every time we check the score. However, we don’t need to wait the second time. So, we might end up with really tricky flows here if we try to optimize it, which may even depend on how we wrote the Gherkin scenarios.

Luckily, there is a better way to introduce these time guards and keep the tests readable and efficient: using a polling library like Awaitility.

A practical example of Awaitility and Cucumber

Awaitility is a simple Java library to test asynchronous systems. In a nutshell, it retries calls to a “condition function” until the condition is fulfilled or a timeout expires. It does so in a readable way, so our step definitions are still easy to understand.
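
The idea of retrying a condition until it holds or a timeout expires can be sketched by hand in plain Java. This is not how Awaitility is implemented: the PollingDemo class and its pollUntil helper are hypothetical names used only to show the mechanics, and the 50 ms poll interval is an arbitrary choice.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical hand-rolled polling loop; not Awaitility's implementation.
class PollingDemo {

    // Re-checks the condition every pollInterval until it holds
    // or the timeout expires. Returns whether the condition was met.
    static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // stop as soon as the system is consistent
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up: the caller should fail the step
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // The condition becomes true roughly 300 ms after the start.
        boolean met = pollUntil(
                () -> System.currentTimeMillis() - start >= 300,
                Duration.ofSeconds(5),
                Duration.ofMillis(50));
        System.out.println("Condition met: " + met); // prints "Condition met: true"
    }
}
```

Unlike a fixed Thread.sleep(5000), the loop returns as soon as the condition holds, so a fast environment waits only a few polls while a slow one can still use the whole timeout.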

In our case, we can use Awaitility to poll the backend system until the assertion for the expected result passes. We’ll also define a maximum polling period of 5 seconds, but we could easily modify it when needed. See Listing 10 for the new implementation of the userHasPoints method using Awaitility.

@Then("{word} has {int} points")
public void userHasPoints(String user, long score) {
    await().atMost(5, TimeUnit.SECONDS).untilAsserted(
            () -> {
                Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
                        .update()
                        .getByUserId(userActors.get(user).getUserId());
                assertThat(optionalRow).isPresent()
                        .map(LeaderboardRowDTO::getTotalScore).hasValue(score);
            }
    );
}

Listing 10. Polling with Awaitility until the score assertion passes

To use the options provided by Awaitility, we call its static method await(). There are multiple options and functions that you can use, as described in its Usage documentation. In our example, we configure it to poll for a maximum of 5 seconds with atMost(). We pass a lambda to untilAsserted(), which instructs Awaitility to keep retrying until the assertion inside the lambda passes. This is a nice way to combine Awaitility with AssertJ. If you don’t want to use AssertJ, you could also use the generic until() method, which expects a Callable<Boolean> function or lambda that will be called until it returns true. It also accepts Hamcrest matchers. Check the docs for more details.

We can modify all the steps in GameStepDefinitions that depend on eventual consistency to include Awaitility to check until the expected conditions are asserted. See Listing 11 for the final version of the step definitions class.

package microservices.book.cucumber.steps;

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;

import microservices.book.cucumber.actors.Leaderboard;
import microservices.book.cucumber.actors.Challenge;
import microservices.book.cucumber.api.dtos.leaderboard.LeaderboardRowDTO;

import static org.assertj.core.api.Assertions.*;
import static org.awaitility.Awaitility.*;

public class GameStepDefinitions {

    private Map<String, Challenge> userActors;
    private final Leaderboard leaderboardActor;

    public GameStepDefinitions() {
        this.leaderboardActor = new Leaderboard();
    }

    @Given("the following solved challenges")
    public void theFollowingSolvedChallenges(DataTable dataTable) throws Exception {
        processSolvedChallenges(dataTable);
    }

    private void processSolvedChallenges(DataTable userToSolvedChallenges) throws Exception {
        userActors = new HashMap<>();
        for (var userToSolved : userToSolvedChallenges.asMaps()) {
            var user = new Challenge(userToSolved.get("user"));
            user.askForChallenge();
            int solved = Integer.parseInt(userToSolved.get("solved_challenges"));
            for (int i = 0; i < solved; i++) {
                user.solveChallenge(true);
            }
            userActors.put(user.getOriginalName(), user);
        }
    }

    @Then("{word} has {int} points")
    public void userHasPoints(String user, long score) {
        await().atMost(5, TimeUnit.SECONDS).untilAsserted(
                () -> {
                    Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
                            .update()
                            .getByUserId(userActors.get(user).getUserId());
                    assertThat(optionalRow).isPresent()
                            .map(LeaderboardRowDTO::getTotalScore).hasValue(score);
                }
        );
    }

    @Then("{word} has the {string} badge")
    public void userHasBadge(String user, String badge) {
        await().atMost(5, TimeUnit.SECONDS).untilAsserted(
                () -> {
                    Optional<LeaderboardRowDTO> optionalRow = this.leaderboardActor
                            .update()
                            .getByUserId(userActors.get(user).getUserId());
                    assertThat(optionalRow).isPresent();
                    assertThat(optionalRow.get().getBadges()).contains(badge);
                }
        );
    }

    @Then("{word} is above {word} in the ranking")
    public void userIsAboveUser(String userAbove, String userBelow) {
        await().atMost(5, TimeUnit.SECONDS).untilAsserted(
                () -> {
                    var updatedLeaderboard = this.leaderboardActor.update();
                    int positionAbove = updatedLeaderboard.whatPosition(
                            userActors.get(userAbove).getUserId()
                    );
                    int positionBelow = updatedLeaderboard.whatPosition(
                            userActors.get(userBelow).getUserId()
                    );
                    assertThat(positionAbove).isLessThan(positionBelow);
                }
        );
    }

}

Listing 11. Adding Awaitility to Cucumber steps to deal with eventual consistency

Finally, by adding just a few lines of code, we made our tests stable across environments, as long as the system becomes consistent within the five-second timeout.

Polling with a library like Awaitility is a better way of checking results in an eventually consistent system. Always try to favor this technique over ignoring eventual consistency or waiting for fixed periods.

Conclusions and Achievements

We reached the end of the guide! We went through the specification, design, and implementation of a suite of Cucumber tests for an eventually-consistent system.

  • You learned the principles of BDD and how it enables better communication between people, so you can build bridges between business people and development teams (Part 1).
  • You know the basics of the Gherkin syntax and its Given-When-Then structure (Part 1).
  • You saw how Cucumber expressions help map Gherkin scenario steps to Java code, by using Cucumber’s built-in parameters and custom ones (Part 1).
  • You understood the main components of a Cucumber Java project: the feature files, the step definition classes, and the JUnit entry point (Part 1).
  • You created a Cucumber project from scratch using a real-life practical example of the system under test (Part 2).
  • You saw some best practices for defining Gherkin steps that can be reused while keeping good test readability (Part 2).
  • You learned how to structure the Cucumber project in Java, and the right level of abstraction for the best reusability and maintainability (Part 2).
  • You implemented an API client in plain Java to interact with the backend’s REST API (Part 2).
  • You created Actor classes to keep the state between steps, which you can reuse across features (Part 3).
  • You created some real examples of Cucumber expression mapping to Java methods (Part 3).
  • You know how to run Cucumber steps and publish reports online (Part 3).
  • You saw how to use Cucumber’s Datatables (Part 4 - this post).
  • You understood the challenges of testing eventually-consistent systems with a practical use case (Part 4 - this post).
  • You learned how to reproduce flakiness in tests by reducing the resources of one of the backend’s microservices (Part 4 - this post).
  • You used Awaitility to introduce polling from the tests to support eventual consistency (Part 4 - this post).

Subscribe to the newsletter if you want to be notified when new guides are published. If you don’t have the book yet, you can purchase a copy from Apress and many other online stores.

About Moisés Macero

Software Developer, Architect, and Author.

Amsterdam, The Netherlands https://thepracticaldeveloper.com