Data valuation techniques compute the contribution of individual training points to the final performance of machine learning models. They belong to so-called data-centric ML, with immediate applications in data engineering, such as data pruning or improving collection processes, and in model debugging and development. In this talk we demonstrate how the open source library pyDVL can be used to detect mislabeled and out-of-distribution samples with little effort. We cover the core ideas behind the most successful algorithms and illustrate how they can be used to inspect your data and get the most out of it.
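To make "the contribution of a training point" concrete, here is a minimal sketch of the simplest valuation baseline, leave-one-out: a point's value is the model's validation accuracy with the full training set minus its accuracy when that point is removed. This is plain Python, not pyDVL's API; the 1-nearest-neighbour utility, the toy 1D data and the function names are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (feature, label); toy 1D data, assumed for illustration


def knn_accuracy(train: List[Point], val: List[Point]) -> float:
    """Utility: validation accuracy of a 1-nearest-neighbour classifier."""
    if not train:
        return 0.0
    correct = 0
    for x, y in val:
        # Predict the label of the closest training point (ties go to the first one)
        _, pred = min(train, key=lambda p: abs(p[0] - x))
        correct += int(pred == y)
    return correct / len(val)


def leave_one_out_values(train: List[Point], val: List[Point]) -> List[float]:
    """Value of point i = utility(full set) - utility(set without point i)."""
    full = knn_accuracy(train, val)
    return [full - knn_accuracy(train[:i] + train[i + 1:], val) for i in range(len(train))]


# (1.1, 1) sits among class-0 points: a deliberately mislabeled sample
train = [(0.0, 0), (1.0, 0), (1.1, 1), (3.0, 1), (4.0, 1)]
val = [(0.5, 0), (1.2, 0), (3.5, 1)]
values = leave_one_out_values(train, val)
print(values)  # the mislabeled point is the only one with a negative value
```

Removing the mislabeled point raises validation accuracy, so its value comes out negative, which is the signal used to flag it. Methods based on Shapley values refine this idea by averaging a point's marginal contribution over many subsets of the data rather than only the full set.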
Miguel de Benito Delgado
Affiliation: TransferLab, appliedAI Institute gGmbH
After several years working as a software developer, Miguel pursued studies in pure mathematics in Madrid and Munich. After finishing his PhD in mathematics and a short research stay in machine learning, he transitioned into the field and ended up working as an applied researcher at the appliedAI Initiative, where he went on to found and head the TransferLab.