Learning Realistic Mutations: Bug Creation for Neural Bug Detectors
Mutations are small, often token-level changes to program code, typically performed during mutation testing to evaluate the quality of test suites. Recently, code mutations have come into use for creating benchmarks of buggy code. Such bug benchmarks are valuable aids for evaluating testing, debugging, and bug repair tools. Moreover, they can serve as training data for learning-based (neural) bug detectors. Key to all these applications is the creation of realistic bugs that closely resemble faults made by software developers. In this paper, we present a learning-based approach to mutation. We propose a novel contextual mutation operator which incorporates knowledge about the mutation context to inject natural and more realistic faults into code. Our approach employs a masked language model to produce a context-dependent distribution over feasible token replacements; the strategy for producing realistic mutations is thus learned rather than hand-crafted. Our experimental evaluation on Java, JavaScript, and Python programs shows that sampling from a language model not only produces mutants which more accurately represent real bugs (with a reproduction score nearly 70% higher than for mutations employed in testing), but also leads to better-performing bug detectors when they are trained on the resulting bug benchmarks.
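To make the idea of a contextual mutation operator concrete, the following minimal sketch shows how a pre-trained masked language model over code can propose context-dependent token replacements. It is an illustration only, not the authors' implementation: the model name (microsoft/codebert-base-mlm), the helper contextual_mutants, and the top-k selection of replacements are assumptions made for this example.

```python
# Illustrative sketch: mask one token in a code snippet and let a masked
# language model suggest plausible, context-dependent replacements, which
# can then serve as candidate mutants. Model choice and selection strategy
# are assumptions for this example, not the paper's exact setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

def contextual_mutants(code, token, top_k=5):
    """Mask the first occurrence of `token` and collect LM-suggested replacements."""
    masked = code.replace(token, fill_mask.tokenizer.mask_token, 1)
    suggestions = fill_mask(masked, top_k=top_k)
    # Keep only genuine changes, i.e. predictions differing from the original token.
    return [s["sequence"] for s in suggestions if s["token_str"].strip() != token]

original = "if (index <= limit) { total += values[index]; }"
for mutant in contextual_mutants(original, "<="):
    print(mutant)
```

In this sketch, the model's probability distribution over the masked position plays the role of the learned, context-dependent replacement distribution described above; an actual mutation operator would sample from it rather than simply taking the top-k candidates.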