Evaluating Features for Machine Learning Detection of Order- and Non-Order-Dependent Flaky Tests
Flaky tests are test cases that can pass or fail without code changes. They cause major problems in software development, such as wasting developers' time and obstructing continuous integration. The research community has presented automated techniques for detecting flaky tests, though many rely on repeated test executions and extensive instrumentation, which can make them both intrusive and expensive. While this motivates researchers to evaluate machine learning models for detecting flaky tests, research on the features used to encode a test case is limited. Without further study on this topic, machine learning models cannot perform to their full potential in this domain. Previous studies also exclude a specific, yet prevalent and problematic, category of flaky tests: order-dependent (OD) flaky tests. As a result, previous research addresses only a subset of the problem of flaky test detection. This paper presents a new feature set for encoding test cases. We compared our new feature set to a previously established feature set by evaluating the detection performance of 54 pipelines, each combining data preprocessing, data balancing, and a machine learning model, for detecting both non-order-dependent (NOD) and OD flaky tests. As our data set, we used the test suites of 26 Python projects, consisting of over 67,000 test cases. This paper's empirical study reveals a number of findings, including (1) a 13% increase in overall F1 score when detecting NOD flaky tests using our new feature set; (2) a 17% increase in overall F1 score when detecting OD flaky tests using our new feature set; and (3) the metrics of our new feature set that are most impactful for detecting both types of flaky tests.
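To make the evaluation setup concrete, the sketch below enumerates a grid of (preprocessing, balancing, model) pipelines and computes the F1 score for the positive "flaky" class from raw labels. It is a minimal illustration under stated assumptions, not the authors' implementation: the stage names are hypothetical placeholders, and the 3 × 3 × 6 split was chosen only so that the grid contains 54 entries. The abstract does not say how the 54 pipelines break down.

```python
from itertools import product

# HYPOTHETICAL stage options: placeholders only, not taken from the paper.
# The 3 x 3 x 6 split is an assumption chosen so the grid has 54 pipelines.
PREPROCESSING = ["none", "scaling", "pca"]
BALANCING = ["none", "undersample", "oversample"]
MODELS = ["model_a", "model_b", "model_c", "model_d", "model_e", "model_f"]


def pipeline_grid():
    """Enumerate every (preprocessing, balancing, model) combination."""
    return list(product(PREPROCESSING, BALANCING, MODELS))


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = flaky, 0 = not flaky)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(len(pipeline_grid()))  # 54
    # Toy labels: 2 true positives, 1 false positive, 1 false negative.
    print(f1_score([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))  # ~0.667
```

In the toy example, precision and recall are both 2/3, so F1 is also 2/3. Comparing two feature sets would mean training each pipeline on each encoding and comparing the resulting F1 scores. How the paper aggregates these into an "overall" F1 is not specified here.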
Tue 5 Apr (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
15:45 - 16:45 | ICST AI I | Research Papers / Industry at Margaret Hamilton | Chair(s): Raihana Ferdous (Fondazione Bruno Kessler)
15:45 | 15m Talk | IFRIT: Focused Testing through Deep Reinforcement Learning | Research Papers | Andrea Romdhana (DIBRIS - University of Genoa; FBK - Security & Trust unit), Mariano Ceccato (University of Verona), Alessio Merlo (DIBRIS - University of Genoa), Paolo Tonella (USI Lugano)
16:00 | 15m Talk | Robustness assessment and improvement of a neural network for blood oxygen pressure estimation | Industry | Paolo Arcaini (National Institute of Informatics), Andrea Bombarda (University of Bergamo), Silvia Bonfanti (University of Bergamo), Angelo Gargantini (University of Bergamo), Daniele Gamba (AISent S.r.l.), Rita Pedercini (AISent S.r.l.)
16:15 | 15m Talk | Evaluating Features for Machine Learning Detection of Order- and Non-Order-Dependent Flaky Tests | Research Papers | Owain Parry (The University of Sheffield), Gregory Kapfhammer (Allegheny College), Michael Hilton (Carnegie Mellon University, USA), Phil McMinn (University of Sheffield)
16:30 | 15m Live Q&A | Discussion and Q&A | Research Papers