In machine learning, the complexity of data pipelines often hinders rapid experimentation and iteration. This talk introduces DDataflow, an open-source tool designed to facilitate end-to-end testing of ML pipelines through decentralized data sampling. Attendees will learn about the challenges of unit testing large-scale data pipelines, the design philosophy behind DDataflow, and practical implementation strategies for making their ML pipelines more reliable and efficient.

Theodore Meynard

Affiliation: GetYourGuide

Theodore Meynard is a data science manager at GetYourGuide. He leads the evolution of their ranking algorithm, helping customers find the best activities to book and locations to explore. Beyond work, he is one of the co-organizers of the PyData Berlin meetup and conference. When he is not programming, he loves riding his bike in search of the best bakery-patisserie in town.

Visit the speaker at: GitHub

Jean Machado

Affiliation: GetYourGuide

Jean Carlo Machado is a Brazilian data science manager at GetYourGuide for the Growth Data Products team and the Machine Learning Platform team. In this role, he collaborates with amazing people to turn business opportunities into data science products, from inception to large-scale production deployments of multiple data products. Jean values community building and bringing communities together; he is currently one of the organizers of MLOps.community Berlin. Jean spends a significant part of his ever-shrinking free time building open-source tools; his current focus is building tech for social good.

Visit the speaker at: GitHub