Data valuation for machine learning
Miguel de Benito Delgado

pyDVL is the library for data valuation in machine learning. Use it to clean, prune and select your data to improve model performance.

Exploring Zarr: From Fundamentals to Version 3.0 and Beyond
Sanket Verma

Hi all! I’ll discuss Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. We’ll explore the Zarr ecosystem from fundamentals to V3.0 and beyond. If you’re interested in storing massive datasets, please attend my talk. Thanks!

Going beyond Parquet's default settings – be surprised what you can get
Uwe L. Korn

Only ever used pandas' DataFrame.to_parquet? Would you like to know what it does and how you could make it even more efficient? Find out about Parquet's newest features in this talk.

Lose your fear of equations!
Darina Goldin

Learn to read equations like an engineer and lose your fear of math.

Next Stop: Insights! How Streamlit and Snowflake Power Up Data Stories
Marie-Kristin Wirsching

Data stories are the bridge between complex data insights and business impact! Transforming data into clear, actionable narratives is no easy task. That's where Streamlit and Snowflake come in - a duo for creating visually engaging, interactive data applications.

Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars
Florian Jetter, Patrick Hoefler

Dask DataFrame is fast now: the re-implementation of DataFrames in Dask is fast, reliable and fun.

Polars and Time Series: what it can do, and how to overcome any limitation
Marco Gorelli

Learn how to use Polars for time series: what it does and doesn't do (and what to do about that!)

The pragmatic Pythonic data engineer
Robson Junior

Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture.

The Struggles We Skipped: Data Engineering for the TikTok Generation
Anuun, Hiba Jamal

A new wave in data engineering! From tangled tasks to sleek, plug-and-play magic in data pipelines. 🚀