These papers are under review and will continue to evolve; any feedback will be greatly appreciated.
An extreme transformation removes the body of a method that is reached by at least
one test case. If the test suite passes on the original program and still
passes after the extreme transformation, the transformation is said to be
undetected, and the test suite needs to be improved. In this work we propose a
technique to automatically determine which of the following three reasons
prevents the detection of the extreme transformation: the test inputs are
not sufficient to infect the state of the program; the infection does not
propagate to the test cases; the test cases have a weak oracle that does not
observe the infection. We have developed Reneri, a tool that observes the
program under test and the test suite in order to determine runtime differences
between test runs on the original and the transformed method. The observations
gathered during the analysis are processed by Reneri to suggest possible
improvements to the developers. We evaluate Reneri on 15 projects and a total
of 312 undetected extreme transformations. The tool is able to generate a
suggestion for each undetected transformation. In 63% of the cases, the
existing test cases can infect the program state, meaning that undetected
transformations are mostly due to observability and weak oracle issues.
Interviews with developers confirm the relevance of the suggested improvements
and experiments with state-of-the-art automatic test generation tools indicate
that no tool can improve the existing test suites to fix all undetected transformations.
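To make the weak-oracle case concrete, here is a minimal Java sketch. The classes `Logger` and `TransformedLogger` and the methods `weakTest` and `strongTest` are hypothetical illustrations, not code from Reneri or from the studied projects; `TransformedLogger` is written by hand to show what an extreme transformation of `log` produces. The test input does infect the state (no entry is added), and the infected object is reachable from the test, but the weak test never checks it.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtremeTransformationDemo {

    // Hypothetical class under test: log() appends a formatted entry.
    public static class Logger {
        private final List<String> entries = new ArrayList<>();

        public void log(String message) {
            entries.add("[INFO] " + message);
        }

        public int size() {
            return entries.size();
        }
    }

    // Hand-written extreme transformation of Logger.log: the body is removed.
    public static class TransformedLogger extends Logger {
        @Override
        public void log(String message) {
            // body removed by the extreme transformation
        }
    }

    // Weak oracle: the test reaches log() and could observe the infected
    // state through the logger object, but asserts nothing about it.
    public static boolean weakTest(Logger logger) {
        logger.log("started");
        return true;
    }

    // Stronger oracle: checks the state that the transformation infects.
    public static boolean strongTest(Logger logger) {
        logger.log("started");
        return logger.size() == 1;
    }

    public static void main(String[] args) {
        // Passes on both versions: the transformation goes undetected.
        System.out.println(weakTest(new Logger()) + " " + weakTest(new TransformedLogger()));
        // Passes on the original, fails on the transformed method.
        System.out.println(strongTest(new Logger()) + " " + strongTest(new TransformedLogger()));
    }
}
```

Running `main` prints `true true` followed by `true false`: adding a single assertion on `size()` is enough to detect this transformation.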
Neutral program variants are functionally similar to an original program, yet implement slightly different behaviors. Techniques such as approximate computing or genetic improvement share the intuition that these acceptable behavioral differences hold potential for enhancements (e.g., improved performance or reliability). Yet, the automatic synthesis of neutral program variants through speculative transformations remains a key challenge.
This work aims at characterizing plastic code regions in Java programs, i.e., the code areas that are prone to the synthesis of neutral program variants. Our empirical study relies on automatic variations of 6 real-world Java programs. First, we transform these programs with three state-of-the-art speculative transformations: add, replace and delete statements. We obtain a pool of 23445 neutral variants, from which we gather the following novel insights: developers naturally write code that supports fine-grained behavioral changes; statement deletion is a surprisingly effective speculative transformation; high-level design decisions, such as the choice of a data structure, are natural points that can evolve while keeping functionality.
Second, we design 3 novel speculative transformations, targeted at specific plastic regions. New experiments reveal that respectively 60%, 58% and 73% of the synthesized variants (175688 in total) are neutral and exhibit execution traces that are different from the original.
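As a hypothetical illustration of a neutral variant obtained by statement deletion, the Java sketch below removes a zero-skipping guard from a summation. Every result stays the same, yet the variant executes `sum += v` for zero entries, so its execution trace differs from the original. The class, method names and the three-check "test suite" are assumptions made for this example, not code from the study.

```java
import java.util.function.ToIntFunction;

public class NeutralVariantDemo {

    // Original program: sums the array, skipping zero entries.
    public static int sumOriginal(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v == 0) {
                continue; // skip zeros
            }
            sum += v;
        }
        return sum;
    }

    // Variant: the "if (v == 0) continue;" statement has been deleted.
    // Adding 0 never changes the sum, so the behavior is preserved,
    // but the loop body now also runs "sum += v" for zero entries.
    public static int sumVariant(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Minimal stand-in for a test suite: each check pairs an input with an expected sum.
    public static boolean passesSuite(ToIntFunction<int[]> sum) {
        return sum.applyAsInt(new int[] {}) == 0
            && sum.applyAsInt(new int[] {1, 0, 2}) == 3
            && sum.applyAsInt(new int[] {-4, 0, 4, 5}) == 5;
    }

    public static void main(String[] args) {
        System.out.println(passesSuite(NeutralVariantDemo::sumOriginal));
        System.out.println(passesSuite(NeutralVariantDemo::sumVariant));
    }
}
```

Both lines print `true`: the variant passes the same checks as the original, which is what makes it neutral with respect to this suite.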
The adoption of agile development approaches has put an increased emphasis on developer testing, resulting in software projects with strong test suites. These suites include a large number of test cases, in which developers embed knowledge about meaningful input data and expected properties in the form of oracles. This article surveys various works that aim at exploiting this knowledge in order to enhance these manually written tests with respect to an engineering goal (e.g., improve coverage of changes or increase the accuracy of fault localization). While these works rely on various techniques and address various goals, we believe they form an emerging and coherent field of research, which we call “test amplification”. We built an initial set of papers by searching DBLP for all papers containing “test” and “amplification” in their title. We reviewed the 70 papers in this set and selected the 4 papers that fit our definition of test amplification. We use these 4 papers as the seed for our snowballing study, and systematically follow the citation graph. This study is the first to draw a comprehensive picture of the different engineering goals proposed in the literature for test amplification. In particular, we note that the goal of test amplification goes far beyond maximizing coverage only. We believe that this survey will help researchers and practitioners entering this new field to understand more quickly and more deeply the intuitions, concepts and techniques used for test amplification.
Software systems contain resilience code to handle failures and unexpected events happening in production. It is essential for developers to understand and assess the resilience of their systems. Chaos engineering is a technique that aims at assessing resilience and uncovering weaknesses by actively injecting perturbations in production. In this paper, we propose a novel design and implementation of a chaos engineering system in Java called CHAOSMACHINE. It provides a unique and actionable analysis of exception-handling capabilities in production, at the level of try-catch blocks. To evaluate our approach, we have deployed CHAOSMACHINE on top of 3 large-scale and well-known Java applications totaling 630k lines of code. Our results show that CHAOSMACHINE reveals both strengths and weaknesses of the resilience code of a software system at the level of exception handling.
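One simple way to picture a perturbation at the level of a try-catch block is to throw an exception at the start of the try block and check whether the catch block keeps the visible behavior acceptable. The Java sketch below does this by hand; the `injectFault` flag and the `maybeInject` helper are hypothetical stand-ins, not CHAOSMACHINE's API, whose injection mechanism the abstract does not describe.

```java
import java.io.IOException;

public class TryCatchPerturbationDemo {

    // Hypothetical injection switch, standing in for a chaos controller.
    public static boolean injectFault = false;

    // Hypothetical perturbation point: throws only when injection is enabled.
    static void maybeInject() throws IOException {
        if (injectFault) {
            throw new IOException("injected by perturbation");
        }
    }

    // Code under study: a try-catch block whose catch provides a fallback.
    public static String loadConfig() {
        try {
            maybeInject(); // perturbation at the start of the try block
            return "config-from-disk";
        } catch (IOException e) {
            return "default-config"; // resilience code: graceful fallback
        }
    }

    public static void main(String[] args) {
        injectFault = false;
        System.out.println(loadConfig()); // normal execution
        injectFault = true;
        System.out.println(loadConfig()); // perturbed execution
    }
}
```

Here the catch block absorbs the injected exception and the caller still receives a usable value, which is the kind of evidence a try-catch level analysis would count as a strength; a catch block that silently returned `null` instead would surface as a weakness.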
A few works address the challenge of automating software diversification, and they all share one core idea: using automated test suites to drive diversification. However, there is a lack of solid understanding of how test suites, programs and transformations interact with one another in this process. We explore this intricate interplay in the context of a specific diversification technique called “sosiefication”. Sosiefication generates sosie programs, i.e., variants of a program in which some statements are deleted, added or replaced but which still pass the test suite of the original program. Our investigation of the influence of test suites on sosiefication exploits the following observation: test suites cover the different regions of programs in very unequal ways. Hence, we hypothesize that sosie synthesis performs differently on a statement that is covered by one hundred test cases and on a statement that is covered by a single test case. We synthesize 24583 sosies on 6 popular open-source Java programs. Our results show that there are two dimensions for diversification. The first one lies in the specification: the more test cases cover a statement, the more difficult it is to synthesize sosies. Yet, to our surprise, we are also able to synthesize sosies on highly tested statements (up to 600 test cases), which indicates an intrinsic property of the programs we study. The second dimension is in the code: we manually explore dozens of sosies and characterize new types of forgiving code regions that are prone to diversification.
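The following hypothetical Java sketch illustrates why the number and variety of tests covering a statement matter for sosie synthesis. Deleting the `if (v <= 0) continue;` guard yields a variant that passes a suite where one test covers that statement with positive inputs only, but fails a richer suite that also exercises the `v <= 0` branch. All names and test inputs are illustrative assumptions, not taken from the studied programs.

```java
import java.util.function.ToIntFunction;

public class SosieCoverageDemo {

    // Original: sums only the strictly positive entries.
    public static int sumPositive(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v <= 0) {
                continue;
            }
            sum += v;
        }
        return sum;
    }

    // Candidate sosie: the "if (v <= 0) continue;" statement is deleted.
    public static int sumPositiveVariant(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Weak suite: a single test covers the guard, with positive inputs only.
    public static boolean weakSuite(ToIntFunction<int[]> f) {
        return f.applyAsInt(new int[] {2, 3}) == 5;
    }

    // Richer suite: more tests cover the guard, including a negative entry.
    public static boolean richSuite(ToIntFunction<int[]> f) {
        return f.applyAsInt(new int[] {2, 3}) == 5
            && f.applyAsInt(new int[] {0, 7}) == 7
            && f.applyAsInt(new int[] {-1, 4}) == 4;
    }

    public static void main(String[] args) {
        // The variant is a sosie with respect to the weak suite only.
        System.out.println(weakSuite(SosieCoverageDemo::sumPositiveVariant));
        System.out.println(richSuite(SosieCoverageDemo::sumPositiveVariant));
    }
}
```

This prints `true` then `false`. Note that the `{0, 7}` test alone would not reject the variant, since adding 0 is harmless; only the negative input does, which echoes the observation that some heavily tested statements still admit sosies.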