TOSEM

TOSEM 2025

Identifying the Failure-Revealing Test Cases in Metamorphic Testing: A Statistical Approach

Paper

Metamorphic testing, thanks to its high failure-detection effectiveness especially in the absence of test oracle, has been widely applied in both the traditional context of software testing and other relevant fields such as fault localization and program repair. Its core element is a set of metamorphic relations, which are the necessary properties of the target algorithm in the form of the relationships among multiple inputs and corresponding expected outputs. When a relation is violated by the outputs of a group of test cases, namely metamorphic group of test cases, that are constructed based on the relation, a failure is said to be revealed. Traditionally, the primary task of software testing is to reveal failures. Therefore, from the perspective of software testing, it may not need to know which test case(s) in the metamorphic group cause the violation and thus the failure. However, such information is definitely helpful for other software engineering activities, such as software debugging. The current literature of metamorphic testing lacks a systematic mechanism of identifying the actual failure-revealing test cases, which hinders its applicability and effectiveness in other relevant fields. In this article, we propose a new technique for the FAILure-revealing Test case Identification in Metamorphic testing, namely FAILTIM. The approach is based on a novel application of statistical methods. More specifically, we leverage and adapt the basic ideas of spectrum-based techniques, which are originally used in fault localization, and propose the utilization of a set of risk formulas to estimate the suspiciousness of each individual test case in metamorphic groups. Failure-revealing test cases are then suggested according to their suspiciousness. A series of experiments have been conducted to evaluate the effectiveness and efficiency of FAILTIM using 9 subject programs and 30 risk formulas. The experimental results showed that the new approach can achieve a high accuracy in identifying the actual failure-revealing test cases in metamorphic testing. Consequently, our study will help boost the applicability and performance of metamorphic testing beyond testing to other software engineering areas. The present work also unfolds a number of research directions for further advancing the theory of metamorphic testing and more broadly, software testing.

TOSEM 2024

Word Closure-Based Metamorphic Testing for Machine Translation

Paper
Code

With the wide application of machine translation, the testing of Machine Translation Systems (MTSs) has attracted much attention. Recent works apply Metamorphic Testing (MT) to address the oracle problem in MTS testing. Existing MT methods for MTS generally follow the workflow of input transformation and output relation comparison, which generates a follow-up input sentence by mutating the source input and compares the source and follow-up output translations to detect translation errors, respectively. These methods use various input transformations to generate the test case pairs and have successfully triggered numerous translation errors. However, they have limitations in performing fine-grained and rigorous output relation comparison and thus may report many false alarms and miss many true errors. In this article, we propose a word closure-based output comparison method to address the limitations of the existing MTS MT methods. We first propose word closure as a new comparison unit, where each closure includes a group of correlated input and output words in the test case pair. Word closures suggest the linkages between the appropriate fragment in the source output translation and its counterpart in the follow-up output for comparison. Next, we compare the semantics on the level of word closure to identify the translation errors. In this way, we perform a fine-grained and rigorous semantic comparison for the outputs and thus realize more effective violation identification. We evaluate our method with the test cases generated by five existing input transformations and the translation outputs from three popular MTSs. Results show that our method significantly outperforms the existing works in violation identification by improving the precision and recall and achieving an average increase of 29.9% in F1 score. It also helps to increase the F1 score of translation error localization by 35.9%.

MET-MAPF: A Metamorphic Testing Approach for Multi-Agent Path Finding Algorithms

Paper

The Multi-Agent Path Finding (MAPF) problem, i.e., the scheduling of multiple agents to reach their destinations, has been widely investigated. Testing MAPF systems is challenging, due to the complexity and variety of scenarios and the agents’ distribution and interaction. Moreover, MAPF testing suffers from the oracle problem, i.e., it is not always clear whether a test shows a failure or not. Indeed, only considering whether the agents reach their destinations without collision is not sufficient. Other properties related to the ‘quality’ of the generated paths should be assessed, e.g., an agent should not follow an unnecessarily long path. To tackle this issue, this article proposes MET-MAPF, a Metamorphic Testing approach for MAPF systems. We identified 10 Metamorphic Relations (MRs) that a MAPF system should guarantee, designed over the environment in which agents operate, the behaviour of the single agents and the interactions among agents. Starting from the different MRs, MET-MAPF automatically generates test cases addressing them, so possibly exposing different types of failures. Experimental results show that MET-MAPF can indeed find MR violations not exposed by approaches that only consider the completion of the mission as test oracle. Moreover, experiments show that different MRs expose different types of violations.

MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases

Paper

Abstract: Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs) that govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities.

In this article, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), and then synthesizes the encoded MRs into parameterized methods (called codified MRs), and filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.

TOSEM 2023

Feedback-directed Metamorphic Testing

Paper

Abstract: Over the past decade, metamorphic testing has gained rapidly increasing attention from both academia and industry, particularly thanks to its high efficacy on revealing real-life software faults in a wide variety of application domains. On the basis of a set of metamorphic relations among multiple software inputs and their expected outputs, metamorphic testing not only provides a test case generation strategy by constructing new (or follow-up) test cases from some original (or source) test cases, but also a test result verification mechanism through checking the relationship between the outputs of source and follow-up test cases. Many efforts have been made to further improve the cost-effectiveness of metamorphic testing from different perspectives. Some studies attempted to identify “good” metamorphic relations, while other studies were focused on applying effective test case generation strategies especially for source test cases. In this article, we propose improving the cost-effectiveness of metamorphic testing by leveraging the feedback information obtained in the test execution process. Consequently, we develop a new approach, namely feedback-directed metamorphic testing, which makes use of test execution information to dynamically adjust the selection of metamorphic relations and selection of source test cases. We conduct an empirical study to evaluate the proposed approach based on four laboratory programs, one GNU program, and one industry program. The empirical results show that feedback-directed metamorphic testing can use fewer test cases and take less time than the traditional metamorphic testing for detecting the same number of faults. It is clearly demonstrated that the use of feedback information about test execution does help enhance the cost-effectiveness of metamorphic testing. Our work provides a new perspective to improve the efficacy and applicability of metamorphic testing as well as many other software testing techniques.