BugSwarm aims to provide durably reproducible software defect artifacts. Each artifact consists of a Docker container that contains a buggy version and a fixed version of a project. An artifact is reproducible if the original software defect can be replicated whenever its build script and tests are run. We determine the reproducibility of a software defect with the following steps:
- We build the buggy version of the project, run the tests and gather the resulting reproduced log.
- We build the fixed version of the project and repeat the process of testing and collecting reproduced logs.
- We compare the reproduced logs with the historical original logs for both the buggy and the fixed version. If the number and names of failing tests and the CI build outcome (passing, failing, or error) match the original log for both versions, that run counts as a successful reproduction; otherwise, it counts as a failed one.
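
The comparison in the last step can be sketched as follows. This is a minimal illustration, not BugSwarm's actual implementation: the `LogSummary` type, its field names, and both helper functions are hypothetical, and the sketch assumes each log has already been parsed into a build outcome and a set of failing test names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogSummary:
    """Hypothetical parsed summary of one CI log (names are illustrative)."""
    outcome: str                  # "passing", "failing", or "error"
    failed_tests: frozenset[str]  # names of the failing tests


def logs_match(original: LogSummary, reproduced: LogSummary) -> bool:
    # Comparing sets of unique test names checks both the number and the
    # names of failing tests; the build outcome is compared separately.
    return (
        original.outcome == reproduced.outcome
        and original.failed_tests == reproduced.failed_tests
    )


def run_succeeded(
    orig_buggy: LogSummary,
    repro_buggy: LogSummary,
    orig_fixed: LogSummary,
    repro_fixed: LogSummary,
) -> bool:
    # A single run succeeds only if both the buggy and the fixed
    # version reproduce their original logs.
    return logs_match(orig_buggy, repro_buggy) and logs_match(orig_fixed, repro_fixed)
```

Using a set assumes test names are unique within a log; if a log can report the same test name more than once, a multiset such as `collections.Counter` would be the safer choice.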
We repeat these steps at least three times for each artifact. If an artifact is successfully reproduced every time, we deem it reproducible. If it is successfully reproduced some but not all of the time, we categorize it as flaky. If it is never successfully reproduced, we deem it unreproducible. We report the reproducibility of BugSwarm as the fraction of all artifacts that are either reproducible or flaky.
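
The classification rule and the reported ratio can be expressed as a short sketch. The function names and status strings below are assumptions chosen for illustration; each element of `attempts` is taken to be the pass/fail result of one complete reproduction run.

```python
from __future__ import annotations

MIN_ATTEMPTS = 3  # the text requires at least three runs per artifact


def classify(attempts: list[bool]) -> str:
    """Map per-run results to 'reproducible', 'flaky', or 'unreproducible'."""
    if len(attempts) < MIN_ATTEMPTS:
        raise ValueError(f"need at least {MIN_ATTEMPTS} attempts")
    if all(attempts):
        return "reproducible"    # reproduced every time
    if any(attempts):
        return "flaky"           # reproduced some, but not all, of the time
    return "unreproducible"      # never reproduced


def reproducibility_ratio(statuses: list[str]) -> float:
    """Fraction of artifacts that are reproducible or flaky."""
    if not statuses:
        raise ValueError("no artifacts to summarize")
    counted = sum(1 for s in statuses if s in ("reproducible", "flaky"))
    return counted / len(statuses)
```

For example, four artifacts classified as reproducible, flaky, unreproducible, and unreproducible give a ratio of 2/4 = 0.5.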