Miguel de Benito Delgado
pyDVL is the library for data valuation in machine learning. Use it to clean, prune and select your data to improve model performance.
Hi all! I’ll discuss Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. We’ll explore the Zarr ecosystem from fundamentals to V3.0 and beyond. If you’re interested in storing massive datasets, please attend my talk. Thanks!
Uwe L. Korn
Only ever used pandas.to_parquet? Would you like to know what it does and how you could make it even more efficient? Find out about Parquet's newest features in this talk.
Learn to read equations like an engineer and lose your fear of math
Data stories are the bridge between complex data insights and business impact! Transforming data into clear, actionable narratives is no easy task. That's where Streamlit and Snowflake come in - a duo for creating visually engaging, interactive data applications.
Florian Jetter, Patrick Hoefler
Dask DataFrame is fast now: the re-implementation of DataFrames in Dask is fast, reliable and fun.
Learn how to use Polars for time series: what it does and what it doesn't do (and what to do about that!)
Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture.
Anuun, Hiba Jamal
A new wave in data engineering! From tangled tasks to sleek, plug-and-play magic in data pipelines. 🚀