Flaky Testing: Find, Fix and Prevent
I recently attended a presentation about flaky tests. It covered what they are, how to find them, how to fix them and how to prevent them. In this blog post I share the most important lessons from that presentation.
What are Flaky Tests?
Flaky tests are tests that give inconsistent results: they pass or fail without any change to the test code or the application. This costs a lot of productivity, because people investigate root causes and rerun failed builds. It also reduces confidence in the tests, which can result in the delivery of worse software.
Causes of Flaky Testing
There are several causes of flaky testing, including:
- Timing issues: Problems with the timing of the application, server or network.
- Test data issues: The test data differs from what the test expects, for example due to the use of random data.
- Database changes: Adjustments to the database may cause unexpected results.
- Dependencies between test cases: A test case depends on another test case that has changed the data or has not prepared it properly.
- Infrastructure problems: Instability, timeouts, heavy server load or competing processes can cause problems.
- System resources: Files or other resources used by multiple processes simultaneously.
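To make the test data cause concrete, here is a minimal sketch of how random data can make a check flip between runs. The helper `make_username` and the rule `is_valid_username` are hypothetical names invented for this illustration; they do not come from the presentation.

```python
import random
import string


def make_username(rng: random.Random) -> str:
    """Hypothetical test-data helper: a username of random length 1..12."""
    length = rng.randint(1, 12)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def is_valid_username(name: str) -> bool:
    """Hypothetical rule under test: a username needs at least 3 characters."""
    return len(name) >= 3


def check_username(rng: random.Random) -> bool:
    # The outcome depends entirely on which data the generator produces.
    return is_valid_username(make_username(rng))


# Unseeded generator: the same check sometimes passes and sometimes fails.
unseeded_outcomes = {check_username(random.Random()) for _ in range(2000)}

# Seeded generator: the data is identical on every run, so the check is stable.
first = make_username(random.Random(42))
second = make_username(random.Random(42))
```

With an unseeded generator both outcomes show up, which is exactly the flaky behavior described above. A fixed seed, or better still explicit test data, makes the result reproducible.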
Preventing Flaky Testing
Prevention is better than cure. Here are some strategies to avoid flaky testing:
- Run new tests repeatedly: Run a new test multiple times (e.g. 50 times) before adding it to the test suite.
- Manage test data: Prepare and clean up test data per test case. Prepare data through APIs rather than through the UI.
- Use test containers: Set up the situation a test needs, such as a database in a known state, in a test container.
- Independent test cases: Make sure test cases are independent of each other.
- Do not reuse system resources: Avoid reusing system resources such as system properties, files, and ports.
- Apply the test pyramid: Move tests down the test pyramid by making them smaller, simpler and lower-level.
- Simplify UI tests: Use deep links instead of click-through navigation, and log users in via cookies instead of through the login screen.
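Several of these strategies can be sketched with Python's standard `unittest` module. The example below is an illustration under assumptions, not a prescribed setup: the test class `OrderExportTest`, its CSV data and the helper `run_repeatedly` are all hypothetical. Each test prepares its own data in a fresh temporary directory and cleans it up afterwards, and the helper runs the test case 50 times before it would be added to the suite.

```python
import tempfile
import unittest
from pathlib import Path


class OrderExportTest(unittest.TestCase):
    """Hypothetical test case that owns its test data."""

    def setUp(self):
        # Fresh directory and data file per test: nothing is shared or reused.
        self._tmp = tempfile.TemporaryDirectory()
        self.data_file = Path(self._tmp.name) / "orders.csv"
        self.data_file.write_text("id,amount\n1,10\n2,5\n")

    def tearDown(self):
        # Clean up so the next test starts from a blank slate.
        self._tmp.cleanup()

    def test_total_amount(self):
        rows = self.data_file.read_text().splitlines()[1:]
        self.assertEqual(sum(int(row.split(",")[1]) for row in rows), 15)

    def test_row_count(self):
        rows = self.data_file.read_text().splitlines()[1:]
        self.assertEqual(len(rows), 2)


def run_repeatedly(test_case, times=50):
    """Run a test case `times` times and return how many runs failed."""
    failed_runs = 0
    for _ in range(times):
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
        result = unittest.TestResult()
        suite.run(result)
        if not result.wasSuccessful():
            failed_runs += 1
    return failed_runs


failed = run_repeatedly(OrderExportTest, times=50)
```

A non-zero `failed` for an unchanged test is a strong hint that the test is flaky and should not join the suite yet.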
Detecting Flaky Testing
To troubleshoot flaky tests, you first need to be able to detect them:
- Track test results in CI: Record how each test performs across Continuous Integration builds, so tests that flip between pass and fail stand out.
- Run tests in parallel: Run tests in parallel to identify inconsistent behavior faster.
- Avoid retries: Don’t build retries into your tests. Retries hide intermittent failures, and without them you detect flaky tests faster.
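Tracking results in CI can be as simple as comparing outcomes per commit. The sketch below assumes a hypothetical history format of `(test_name, commit_id, passed)` tuples; a real CI system will expose its results differently. A test counts as flaky here when it both passed and failed on the same commit, meaning the result changed without a code change.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit_id, passed) tuples,
    a made-up format for this illustration.
    """
    outcomes = defaultdict(set)
    for test_name, commit_id, passed in runs:
        outcomes[(test_name, commit_id)].add(passed)
    flaky = {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


history = [
    ("test_login", "a1b2c3", True),
    ("test_login", "a1b2c3", False),     # same commit, different result
    ("test_checkout", "a1b2c3", True),
    ("test_checkout", "d4e5f6", False),  # different commit: may be a real bug
    ("test_search", "d4e5f6", True),
]

flaky_tests = find_flaky_tests(history)  # ["test_login"]
```

Comparing per commit keeps genuine regressions, such as `test_checkout` here, out of the flaky list.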
Solving Flaky Tests
Once you have identified a flaky test, it is important to address it immediately:
- Quarantine: Quarantine a test as soon as it proves to be flaky. Take it out of the regular test suite until it is fixed.
- Documentation and recovery: Provide clear documentation for the test and work to restore it as quickly as possible.
- Run single tests in the IDE: Make sure you can run a single test in isolation in your IDE, so you can analyze the problem faster.
- Involve the team: Keep the entire team informed about the status of the tests. For example, every developer could fix one flaky test per sprint.
- Delete the test: If a test is beyond repair, consider whether it is still needed and delete it if it is not.
- Rewrite the test: Rewrite the test at a different level of the test pyramid or in a different place.
- Improve team culture: Build a culture where flaky tests are actively tracked and resolved, and make a business case for resolving these tests.
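Quarantining and documenting can go hand in hand. The sketch below uses Python's `unittest`; the decorator `quarantined`, the environment variable `RUN_QUARANTINED` and the test class are hypothetical names chosen for this example. A quarantined test is skipped in the regular suite with a documented reason, but it can still be run on purpose while someone works on the fix.

```python
import functools
import os
import unittest


def quarantined(reason):
    """Hypothetical decorator: skip a known-flaky test unless RUN_QUARANTINED=1."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if os.environ.get("RUN_QUARANTINED") != "1":
                # The reason doubles as documentation in the test report.
                raise unittest.SkipTest(f"quarantined: {reason}")
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


class CheckoutTest(unittest.TestCase):
    @quarantined("fails intermittently in CI; owner and ticket go here")
    def test_payment_callback(self):
        self.fail("stand-in for the body of a flaky test")

    def test_cart_total(self):
        self.assertEqual(2 + 3, 5)


def run_suite():
    """Run CheckoutTest once and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the skip reason appears in every test report, the quarantined test stays visible to the team instead of being silently forgotten.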
Conclusion
Flaky tests are a major problem for the productivity and reliability of software development. By combining preventive measures with effective detection and resolution, you can significantly reduce their impact and improve the quality of your software.