-
ActionsRemaker: Reproducing GitHub Actions
Hao-Nan Zhu,
Kevin Z. Guan,
Robert M. Furth,
Cindy Rubio-González
Abstract: Mining Continuous Integration and Continuous Delivery (CI/CD) has enabled new
research opportunities for the software engineering (SE) research community. However, it remains
a challenge to reproduce CI/CD build processes, which is crucial for several areas of research
within SE such as fault localization and repair. In this paper, we present ActionsRemaker, a
reproducer for GitHub Actions builds. We describe the challenges of reproducing GitHub Actions
builds and the design of ActionsRemaker. Evaluation of ActionsRemaker demonstrates its ability
to reproduce fail-pass pairs: of 180 pairs from 67 repositories, 130 (72.2%) from 43
repositories are reproducible. We also discuss reasons for unreproducibility. ActionsRemaker is
publicly available at
https://github.com/bugswarm/actions-remaker,
and a demo of the tool can be found at
https://youtu.be/flblSqoxeA.
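The abstract above does not describe ActionsRemaker's internals, so the following Python sketch is an illustration only of the general idea of replaying a GitHub Actions job inside a pinned container; the workflow path, job name, and base image are hypothetical placeholders, and this is not how ActionsRemaker itself is implemented.

```python
# Minimal sketch (not ActionsRemaker): replay the `run:` steps of one GitHub
# Actions job inside a pinned Docker image. Assumes PyYAML and a local Docker
# daemon; the workflow path, job id, and image are placeholder assumptions.
import subprocess
import yaml  # pip install pyyaml

WORKFLOW_FILE = ".github/workflows/ci.yml"  # hypothetical workflow path
JOB_NAME = "build"                          # hypothetical job id
IMAGE = "ubuntu:22.04"                      # pinned base image for reproducibility


def extract_run_commands(workflow_file: str, job: str) -> list[str]:
    """Collect the shell commands of a job's `run:` steps."""
    with open(workflow_file) as f:
        workflow = yaml.safe_load(f)
    steps = workflow["jobs"][job]["steps"]
    return [step["run"] for step in steps if "run" in step]


def replay_in_container(commands: list[str], image: str, repo_dir: str) -> int:
    """Run the collected commands sequentially inside a fresh container."""
    script = " && ".join(commands)
    result = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{repo_dir}:/workspace", "-w", "/workspace",
         image, "bash", "-lc", script],
        check=False,
    )
    return result.returncode  # non-zero suggests the build did not reproduce


if __name__ == "__main__":
    cmds = extract_run_commands(WORKFLOW_FILE, JOB_NAME)
    exit_code = replay_in_container(cmds, IMAGE, repo_dir=".")
    print(f"replayed {len(cmds)} steps, exit code {exit_code}")
```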
-
On the Reproducibility of Software Defect Datasets
Hao-Nan Zhu,
Cindy Rubio-González
Abstract: Software defect datasets are crucial to facilitating the evaluation and
comparison of techniques in fields such as fault localization, test generation, and
automated program repair. However, software defect artifacts are not immune to breakage. In
this paper, we conduct a study on the reproducibility of software
defect artifacts. First, we study five state-of-the-art Java defect datasets. Despite the
multiple strategies applied by dataset maintainers to ensure reproducibility, all datasets
are prone to breakages. Second, we conduct a case study in which we systematically test the
reproducibility of 1,795 software artifacts during a 13-month period. We find that 62.6% of
the artifacts break at least once, and 15.3% of the artifacts break multiple times. We manually
investigate the root causes of breakages and handcraft 10 patches, which are automatically
applied to 1,055 distinct artifacts in 2,948 fixes. Based on the nature of the root causes,
we propose automated dependency caching and artifact isolation to prevent further breakage.
In particular, we show that isolating artifacts to eliminate external dependencies increases
reproducibility to 95% or higher, which is on par with the level of reproducibility
exhibited by the most reliable manually curated dataset.
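As a rough illustration of the dependency-caching idea mentioned above (not the paper's actual tooling), the following Python sketch downloads an artifact's pinned dependencies into a local cache once and then installs strictly from that cache on later rebuilds, so registry-side changes cannot break the build; the requirements file and cache directory are assumed placeholders.

```python
# Minimal sketch of dependency caching for build reproducibility: fetch all
# wheels/sdists once while the registry still serves them, then install
# offline on every rebuild. Illustrative only; paths are placeholder
# assumptions, not the paper's infrastructure.
import subprocess
from pathlib import Path

REQUIREMENTS = "requirements.txt"  # hypothetical pinned requirements of an artifact
CACHE_DIR = Path("dep-cache")      # local directory that travels with the artifact


def populate_cache() -> None:
    """One-time step: download every dependency into the local cache."""
    CACHE_DIR.mkdir(exist_ok=True)
    subprocess.run(
        ["pip", "download", "-r", REQUIREMENTS, "-d", str(CACHE_DIR)],
        check=True,
    )


def install_from_cache() -> None:
    """Every rebuild: install strictly from the cache, never from the network."""
    subprocess.run(
        ["pip", "install", "--no-index",
         "--find-links", str(CACHE_DIR), "-r", REQUIREMENTS],
        check=True,
    )


if __name__ == "__main__":
    populate_cache()
    install_from_cache()
```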
-
On the Real-World Effectiveness of Static Bug Detectors at Finding Null Pointer Exceptions
David Tomassi,
Cindy Rubio-González
In the 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Abstract: Static bug detectors aim to help developers automatically find and
prevent bugs. In this experience paper, we study the effectiveness of static bug detectors at
identifying Null Pointer Dereferences or Null Pointer Exceptions (NPEs). NPEs pervade all
programming domains from systems to web development. Specifically, our study measures the
effectiveness of five Java static bug detectors: CheckerFramework, ERADICATE, INFER, NULLAWAY,
and SPOTBUGS. We conduct our study on 102 real-world and reproducible NPEs from 42 open-source
projects found in the BUGSWARM and DEFECTS4J datasets. We apply two known methods to determine
whether a bug is found by a given tool, and introduce two new methods that leverage stack trace
and code coverage information. Additionally, we provide a categorization of the tools’
capabilities and the bug characteristics to better understand the strengths and weaknesses of
the tools. Overall, the tools under study only find 30 out of 102 bugs (29.4%), with the
majority found by ERADICATE. Based on our observations, we identify and discuss opportunities to
make the tools more effective and useful.
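As an illustration of how stack trace information can be used to decide whether a warning corresponds to a bug (the paper's exact matching criteria may differ), the following Python sketch marks a bug as potentially found when a warning's file and line fall near a frame of the NPE's stack trace; the warning and trace data shown are hypothetical.

```python
# Minimal sketch of stack-trace-based matching: a bug counts as "potentially
# found" if some analyzer warning points at (or near) a file/line that appears
# among the exception's stack frames. Illustration only, with hypothetical data.
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    file: str
    line: int


def stack_trace_match(warnings: list[Location],
                      trace_frames: list[Location],
                      tolerance: int = 0) -> bool:
    """True if some warning falls within `tolerance` lines of a stack frame."""
    for w in warnings:
        for f in trace_frames:
            if w.file == f.file and abs(w.line - f.line) <= tolerance:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical analyzer warnings and an NPE stack trace.
    warnings = [Location("src/main/java/app/Parser.java", 42)]
    frames = [Location("src/main/java/app/Parser.java", 43),
              Location("src/main/java/app/Main.java", 10)]
    print(stack_trace_match(warnings, frames, tolerance=2))  # True
```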
-
Fixing Dependency Errors for Python Build Reproducibility
Suchita Mukherjee,
Abigail Almanza,
Cindy Rubio-González
In ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software
Testing and Analysis
Abstract: Software reproducibility is important for reusability and the cumulative
progress of research. An important manifestation of unreproducible software is the changed
outcome of software builds over time. While enhancing code reuse, the use of open-source
dependency packages hosted on centralized repositories such as PyPI can have adverse effects
on build reproducibility. Frequent updates to these packages often cause their latest
versions to have breaking changes for applications using them. Large Python applications
risk their historical builds becoming unreproducible due to the widespread usage of Python
dependencies, and the lack of uniform practices for dependency version specification.
Manually fixing dependency errors requires expensive developer time and effort, while
automated approaches face challenges of parsing unstructured build logs, finding transitive
dependencies, and exploring an exponential search space of dependency versions. In this
paper, we investigate how open-source Python projects specify dependency versions, and how
their reproducibility is impacted by dependency packages. We propose PyDFix, a tool to detect
and fix unreproducibility in Python builds caused by dependency errors. PyDFix is evaluated
on two bug datasets BugSwarm and BugsInPy, both of which are built from real-world
open-source projects. PyDFix analyzes a total of 2,702 builds, identifying 1,921 (71.1%) of
them to be unreproducible due to dependency errors. From these, PyDFix provides a complete
fix for 859 (44.7%) builds, and partial fixes for an additional 632 (32.9%) builds.
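As a rough, simplified illustration of the kind of repair PyDFix automates (not its actual log parsing or version search, which the paper describes as considerably more involved), the following Python sketch extracts an offending package name from a hypothetical build-log line and pins it to an exact version in a requirements list; the log format, regex, and chosen version are all assumptions.

```python
# Minimal sketch (not PyDFix): find the package blamed in a build-log error and
# pin it to an older version in the requirements. Log line, regex, and version
# are hypothetical placeholders for illustration.
import re
from typing import Optional

LOG_LINE = "ERROR: package 'somedep' requires Python >=3.8, build used 3.6"  # hypothetical
REQUIREMENTS_IN = ["somedep", "requests==2.25.1"]                             # hypothetical


def find_offending_package(log_line: str) -> Optional[str]:
    """Pull a package name out of a (hypothetical) error line."""
    match = re.search(r"package '([A-Za-z0-9_.-]+)'", log_line)
    return match.group(1) if match else None


def pin_package(requirements: list[str], package: str, version: str) -> list[str]:
    """Replace an unpinned entry for `package` with an exact version pin."""
    fixed = []
    for entry in requirements:
        name = re.split(r"[=<>!~]", entry, maxsplit=1)[0]
        fixed.append(f"{package}=={version}" if name == package else entry)
    return fixed


if __name__ == "__main__":
    pkg = find_offending_package(LOG_LINE)
    if pkg:
        print(pin_package(REQUIREMENTS_IN, pkg, version="1.2.0"))
```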
-
A Note About: Critical Review of BugSwarm for Fault Localization and Program Repair
David A. Tomassi,
Cindy Rubio-González
Abstract: Datasets play an important role in the advancement of software tools and
facilitate their evaluation. BugSwarm is an infrastructure to automatically create a large
dataset of real-world reproducible failures and fixes. In this paper, we respond to Durieux
and Abreu's critical review of the BugSwarm dataset, referred to in this paper as
CriticalReview. We replicate CriticalReview's study and find several incorrect claims and
assumptions about the BugSwarm dataset. We discuss these incorrect claims and other
contributions listed by CriticalReview. Finally, we discuss general misconceptions about
BugSwarm, and our vision for the use of the infrastructure and dataset.
-
Bugs in the Wild: Examining the Effectiveness of Static Analyzers at Finding Real-World Bugs
David A. Tomassi
In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference
and Symposium on the Foundations of Software Engineering
Abstract: Static analysis is a powerful technique to find software bugs. In past
years, a few static analysis tools have become available for developers to find certain
kinds of bugs in their programs. However, there is no evidence of how effective the tools
are at finding bugs in real-world software. In this paper, we present a preliminary study on
the popular static analyzers ErrorProne and SpotBugs. Specifically, we consider 320 real
Java bugs from the BugSwarm dataset, and determine which of these bugs can potentially be
found by the analyzers, and how many are indeed detected. We find that 30.3% and 40.3% of
the bugs are candidates for detection by ErrorProne and SpotBugs, respectively. Our
evaluation shows that the analyzers are relatively easy to incorporate into the tool chain
of diverse projects that use the Maven build system. However, the analyzers are not as
effective at detecting the bugs under study, with only one bug successfully detected by
SpotBugs.
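As an illustration of how such analyzers can be incorporated into a Maven-based tool chain, as the study notes is straightforward, the following Python sketch compiles a checked-out project and invokes the SpotBugs Maven plugin's check goal, reporting whether the build was failed by reported bugs; the project path is a placeholder and plugin behavior may vary across versions.

```python
# Minimal sketch: run SpotBugs on a Maven project via its Maven plugin and
# report whether any bugs were flagged. The project path is a hypothetical
# checkout; plugin versions and goal behavior may differ in practice.
import subprocess

PROJECT_DIR = "path/to/maven-project"  # hypothetical checkout of a study artifact


def run_spotbugs(project_dir: str) -> bool:
    """Return True if the SpotBugs check goal failed the build (bugs reported)."""
    result = subprocess.run(
        ["mvn", "compile", "com.github.spotbugs:spotbugs-maven-plugin:check"],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    # The check goal exits non-zero when bugs are found (or on other build errors).
    return result.returncode != 0


if __name__ == "__main__":
    print("SpotBugs flagged issues:", run_spotbugs(PROJECT_DIR))
```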