Learning Realistic Mutations: Bug Creation for Neural Bug Detectors
Mutations are small, often token-level changes to program code, typically performed during mutation testing to evaluate the quality of test suites. Recently, code mutations have come into use for creating benchmarks of buggy code. Such bug benchmarks are valuable aids for evaluating testing, debugging, and bug repair tools. Moreover, they can serve as training data for learning-based (neural) bug detectors. Key to all these applications is the creation of realistic bugs that closely resemble faults made by software developers. In this paper, we present a learning-based approach to mutation. We propose a novel contextual mutation operator which incorporates knowledge about the mutation context to inject natural and more realistic faults into code. Our approach employs a masked language model to produce a context-dependent distribution over feasible token replacements; the strategy for producing realistic mutations is thus learned rather than hand-crafted. Our experimental evaluation on Java, JavaScript, and Python programs shows that sampling from a language model not only produces mutants which more accurately represent real bugs (with a reproduction score nearly 70% higher than for mutations employed in testing), but also leads to better-performing bug detectors when they are trained on the resulting bug benchmarks.
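To make the idea of a contextual mutation operator concrete, the following minimal sketch shows how a pre-trained masked language model over code can propose context-dependent token replacements. It is an illustration only, not the authors' implementation: the model name (microsoft/codebert-base-mlm), the helper contextual_mutants, and the top-k selection of replacements are assumptions made for this example.

```python
# Illustrative sketch: mask one token in a code snippet and let a masked
# language model suggest plausible, context-dependent replacements, which
# can then serve as candidate mutants. Model choice and selection strategy
# are assumptions for this example, not the paper's exact setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

def contextual_mutants(code, token, top_k=5):
    """Mask the first occurrence of `token` and collect LM-suggested replacements."""
    masked = code.replace(token, fill_mask.tokenizer.mask_token, 1)
    suggestions = fill_mask(masked, top_k=top_k)
    # Keep only genuine changes, i.e. predictions differing from the original token.
    return [s["sequence"] for s in suggestions if s["token_str"].strip() != token]

original = "if (index <= limit) { total += values[index]; }"
for mutant in contextual_mutants(original, "<="):
    print(mutant)
```

In this sketch, the model's probability distribution over the masked position plays the role of the learned, context-dependent replacement distribution described above; an actual mutation operator would sample from it rather than simply taking the top-k candidates.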