The Prevalence of Errors in Machine Learning Experiments

LTH: Monday, May 16, 13:15–14:00, Location: LTH Matteannexet, MA4

BTH: Tuesday, May 17, 13:15–14:00, Location: J1360, building J

Prof. Martin Shepperd, Gothenburg University/Brunel University London

Biography

Martin Shepperd is the 2022 Swedish Tage Erlander research professor, funded by the Swedish Research Council, and the first professor of Computer Science to hold this professorship since its inception in 1982. This year the professorship is placed at Gothenburg University, and he also holds the chair of Software Modelling & Technology at Brunel University London. He has a BSc in Economics, and an MSc and PhD in Computer Science. He worked as a software developer for HSBC before returning to academia. He has published 3 books and more than 180 refereed research articles in the areas of software engineering and machine learning. He is a fellow of the British Computer Society.

Abstract

Computational experiments are the dominant paradigm for understanding and comparing machine learning algorithms. Typically, multiple learning algorithms (the treatments) are compared over multiple datasets that provide training and validation subsets, using various predictive performance metrics, i.e., the response variables. Such experimental designs are referred to as repeated-measure designs. In this way we build knowledge through sense-making of many results. But how sure can we be that our experimental results are reliable? I address this question by examining the domain of software defect prediction. A re-analysis of experiments found that ~40% contained inconsistent results and/or basic statistical errors. Elsewhere I show that inappropriate response metrics can change not only the magnitude of results but also the direction of effects in ~25% of cases.
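
As an illustration of how the choice of response metric can reverse the direction of an effect, the following is a minimal sketch only. The confusion-matrix counts are invented for this example, and the choice of F1 and the Matthews correlation coefficient (MCC) as the two metrics is an assumption, not something stated in the abstract.

```python
from math import sqrt


def f1_score(tp, fp, fn, tn):
    """F1 ignores true negatives entirely (tn is accepted but unused)."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; uses all four confusion-matrix cells."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical results for two classifiers on the same data set
# (900 positive and 100 negative instances). Counts are invented.
classifier_a = dict(tp=890, fp=95, fn=10, tn=5)    # labels almost everything positive
classifier_b = dict(tp=720, fp=20, fn=180, tn=80)  # more balanced behaviour

f1_a, f1_b = f1_score(**classifier_a), f1_score(**classifier_b)
mcc_a, mcc_b = mcc(**classifier_a), mcc(**classifier_b)

print(f"F1 : A={f1_a:.3f}  B={f1_b:.3f}  -> prefers {'A' if f1_a > f1_b else 'B'}")
print(f"MCC: A={mcc_a:.3f}  B={mcc_b:.3f}  -> prefers {'A' if mcc_a > mcc_b else 'B'}")
```

With these invented counts, F1 ranks classifier A above B while MCC ranks B above A, so the apparent winner depends on the response variable chosen.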

We all make errors, and there can be considerable complexity in our computational experiments, so I recommend that we (i) use open science to expose studies to scrutiny, (ii) try to avoid dichotomous inference methods, and (iii) use meta-analysis with caution!

*** *** ***