Data valuation for machine learning
Miguel de Benito Delgado
PyConDE & PyDataBerlin 2024 conference

Monday 12:15 in B07-B08

Type: Talk
Track: pydata-data-handling-engineering

Data valuation techniques compute the contribution of individual training points to the final performance of machine learning models. They belong to so-called data-centric ML, with immediate applications in data engineering, such as data pruning or improving collection processes, and in model debugging and development. In this talk we demonstrate how the open-source library pyDVL can be used to detect mislabeled and out-of-distribution samples with little effort. We cover the core ideas behind the most successful algorithms and show how to use them to inspect your data and get the most out of it.
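To make "the contribution of a training point" concrete, here is a minimal sketch of leave-one-out (LOO) valuation, the simplest data valuation scheme. A point's value is the drop in a utility (here, validation accuracy) when that point is removed from the training set. This is plain Python written for illustration only. It does not use pyDVL, and the function names (`nn_predict`, `utility`, `leave_one_out_values`), the 1-nearest-neighbour model, and the toy data are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple

# A labeled sample: (feature, label). One-dimensional features keep the sketch small.
Sample = Tuple[float, int]


def nn_predict(train: List[Sample], x: float) -> int:
    """Predict the label of x with a 1-nearest-neighbour rule (hypothetical toy model)."""
    nearest = min(train, key=lambda s: abs(s[0] - x))
    return nearest[1]


def utility(train: List[Sample], valid: List[Sample]) -> float:
    """Validation accuracy of the 1-NN model fitted on `train` (0.0 if `train` is empty)."""
    if not train:
        return 0.0
    hits = sum(nn_predict(train, x) == y for x, y in valid)
    return hits / len(valid)


def leave_one_out_values(train: List[Sample], valid: List[Sample]) -> List[float]:
    """Value of point i = U(full training set) - U(training set without point i)."""
    full = utility(train, valid)
    return [full - utility(train[:i] + train[i + 1:], valid) for i in range(len(train))]


# Toy data: two well-separated classes, plus one point with a deliberately wrong label.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1),
         (1.6, 1)]  # sits among the class-0 points but is labeled 1
valid = [(0.3, 0), (1.7, 0), (1.4, 0), (8.7, 1), (9.8, 1)]

values = leave_one_out_values(train, valid)
for (x, y), v in zip(train, values):
    print(f"x={x:4.1f} label={y} value={v:+.2f}")
```

On this toy data, the mislabeled point at `x=1.6` is the only one with a negative value: removing it raises validation accuracy from 0.6 to 1.0. All other points have value zero, because a nearby neighbour with the same label takes their place. That redundancy is a known weakness of LOO. Shapley-value-based methods address it by averaging a point's marginal contribution over many subsets of the training data rather than only the full set.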

Domain Expertise: Intermediate
Python Skill Level: Intermediate

Miguel de Benito Delgado

Affiliation: TransferLab, appliedAI Institute gGmbH

After several years working as a software developer, Miguel studied pure mathematics in Madrid and Munich. After finishing his PhD in mathematics and a short research stay in machine learning, he transitioned into the field and became an applied researcher at the appliedAI Initiative, where he went on to found and head the TransferLab.
