PyData: Data Handling & Engineering | PyConDE & PyData Berlin 2024

Talk pydata-data-handling-engineering

Data valuation for machine learning

Miguel de Benito Delgado

pyDVL is the library for data valuation in machine learning. Use it to clean, prune and select your data to improve model performance.

Talk pydata-data-handling-engineering

Exploring Zarr: From Fundamentals to Version 3.0 and Beyond

Sanket Verma

Hi all! I’ll discuss Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. We’ll explore the Zarr ecosystem from fundamentals to V3.0 and beyond. If you’re interested in storing massive datasets, please attend my talk. Thanks!

Talk pydata-data-handling-engineering

Going beyond Parquet's default settings – be surprised what you can get

Uwe L. Korn

Only ever used pandas' DataFrame.to_parquet? Would you like to know what it does and how you could make it even more efficient? Find out about Parquet's newest features in this talk.

Tutorial pydata-data-handling-engineering

Lose your fear of equations!

Darina Goldin

Learn to read equations like an engineer and lose your fear of math.

Talk pydata-data-handling-engineering

Next Stop: Insights! How Streamlit and Snowflake Power Up Data Stories

Marie-Kristin Wirsching

Data stories are the bridge between complex data insights and business impact! Transforming data into clear, actionable narratives is no easy task. That's where Streamlit and Snowflake come in: a duo for creating visually engaging, interactive data applications.

Talk pydata-data-handling-engineering

Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars

Florian Jetter, Patrick Hoefler

Dask DataFrame is fast now. The re-implementation of DataFrames in Dask is fast, reliable and fun.

Talk pydata-data-handling-engineering

Polars and Time Series: what it can do, and how to overcome any limitation

Marco Gorelli

Learn how to use Polars for time series: what it does, what it doesn't do, and what to do about that!

Talk pydata-data-handling-engineering

The pragmatic Pythonic data engineer

Robson Junior

Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture.

Talk pydata-data-handling-engineering

The Struggles We Skipped: Data Engineering for the TikTok Generation

Anuun, Hiba Jamal

A new wave in data engineering! From tangled tasks to sleek, plug-and-play magic in data pipelines. 🚀