PyData Session List

Talk pydata-machine-learning-deep-learning-stats

A conceptual and practical introduction to Hilbert Space Gaussian Process (HSGP) approximation methods

Dr. Juan Orduz

In this talk, we explore a new method to approximate Gaussian processes using spectral analysis methods, known as the Hilbert Space Gaussian process (HSGP) approximation.

Tutorial pydata-pydata-scientific-libraries-stack

A deep dive into the Arrow Columnar format with pyarrow and nanoarrow

Joris Van den Bossche, Raúl Cumplido, Alenka Frim

Apache Arrow has become a de facto standard for efficient in-memory columnar data representation. But what is this format exactly? This tutorial will dive deep into the details of the Arrow columnar format and interactively explore the different types and buffer layouts.

Talk pydata-generative-ai

A Retrieval Augmented Generation system to query the scikit-learn documentation

Guillaume Lemaitre

Tutorial pydata-pydata-scientific-libraries-stack

Boost your Data Science skills with the new Python in Excel

Valerio Maggio

Did you know that you can now run Python directly in Excel? Come to my tutorial to learn more and boost your data analytics skills!

Talk pydata-machine-learning-deep-learning-stats

Breaking AI Boundaries: Fairness Metrics in Unstructured Data Domains

Daniel Klitzke

Exploring the need for fairness in machine learning in areas with indirect human impact, and proposing solutions for the challenges of unstructured data.

Tutorial pydata-natural-language-processing-computer-vision

Build an AI Document Inquiry Chat with Offline LLMs

Pavithra Eswaramoorthy, Philip Meier

In this hands-on tutorial, we'll build an LLM-powered document inquiry chat application that uses Retrieval-Augmented Generation (RAG) for more accurate results. We'll test different LLMs, run an offline LLM on GPUs, and demonstrate a fully functional web app.

Tutorial pydata-machine-learning-deep-learning-stats

Build TikTok's Personalized Real-Time Recommendation System in Python with Hopsworks

Jim Dowling

TikTok's real-time recommendation engine, Monolith, is so good it has been described as "digital crack". In 1 hr, we will build Monolith in Python as 3 ML pipelines that run on Hopsworks.

Talk pydata-natural-language-processing-computer-vision

Building Professional Voice AI with Vocode

Lev Konstantinovskiy

Meet Vocode, an open-source framework for AI voice agents. We'll cover its integration of speech APIs, LLMs, and conversation etiquette in real-world applications. #OpenSource #AI #VoiceAgents

Talk pydata-natural-language-processing-computer-vision

Can ChatGPT convince you to get a COVID19 vaccine? Comparing ChatGPT to an expert system - which one is more convincing?

Dr. Lisa Andreevna Chalaguine

A comparison between ChatGPT and a domain-specific expert system: which is more convincing in getting people to vaccinate against COVID-19?

Cloud? No Thanks! I’m Gonna Run GenAI on My AI PC

Adrian Boguszewski, Dmitriy Pastushenkov

Join this talk to learn that cloud is no longer needed for GenAI. All you need is an AI PC.

Talk pydata-machine-learning-deep-learning-stats

Content Recommendation with Graphs: From Basic Walks to Neural Networks

Dr. Mirza Klimenta

Talk pydata-data-handling-engineering

Data valuation for machine learning

Miguel de Benito Delgado

pyDVL is the library for data valuation in machine learning. Use it to clean, prune and select your data to improve model performance.

Talk pydata-machine-learning-deep-learning-stats

Everything you need to know about change-point detection

Charles Truong

How do you detect an activity change from smartwatch data, abrupt climate transitions, or server failures? If you work with long time series, you will inevitably have to detect changes. This talk describes how to do that using ruptures (https://github.com/deepcharles/ruptures).

Talk pydata-data-handling-engineering

Exploring Zarr: From Fundamentals to Version 3.0 and Beyond

Sanket Verma

Hi all! I’ll discuss Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. We’ll explore the Zarr ecosystem from fundamentals to V3.0 and beyond. If you’re interested in storing massive datasets, please attend my talk. Thanks!

Talk pydata-natural-language-processing-computer-vision

Flix CitySnap: How we use GenAI and not only to collect captivating images for cities and confirm their locations

Andrei Chernov

Unlocking City Charisma: Leveraging Generative AI for Automated Image Collection and Elevated Customer Experience 🌟 Dive into Flix's innovative approach

Talk pydata-machine-learning-deep-learning-stats

From idea to production in a day: Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly

Florian Roscheck

How can you leverage Azure ML, automated machine learning, and Streamlit to build and test machine learning apps quickly? Find out about our favorite Hackathon stack and walk away with some code to build and user-test your own machine learning ideas fast.

Talk pydata-generative-ai

From LLM as oracle to LLM as translator - our journey from theory to everyday’s practice in a corporate setting with dmGPT (and python)

Emma Haley, Niklas Lederer

Learn how dm-drogeriemarkt put LLMs in production and implemented a day-to-day assistant for everyone.

Talk pydata-data-handling-engineering

Going beyond Parquet's default settings – be surprised what you can get

Uwe L. Korn

Only ever used pandas.to_parquet? Would you like to know what it does and how you could make it even more efficient? Find out about Parquet's newest features in this talk.

Talk pydata-visualisation-jupyter

High Performance Data Visualization for the Web

Tim Paine

Building a high performance streaming data website with Perspective

Talk pydata-natural-language-processing-computer-vision

How to Do Monolingual, Multilingual, and Cross-lingual Text Classification in April, 2024

Daryna Dementieva

If I want a text classifier in 2024, what should I choose -- an LLM or a pre-LLM-era classifier? Is the answer the same for English and other languages? We will provide a recipe for finding your classifier depending on the target language and data availability.

Talk pydata-generative-ai

Improve LLM-based Applications with Fallback Mechanisms

Bilge Yücel

RAG handles common issues in LLM applications, but a dependable system requires one more step: a fallback mechanism. Explore the implementation of LLM applications with diverse fallback techniques using Haystack in Bilge's insightful talk.

Talk pydata-natural-language-processing-computer-vision

Is GenAI All You Need to Classify Text? Some Learnings from the Trenches

Marc Palyart, Kateryna Budzyak

GenAI is sometimes touted as the panacea for all natural language processing (NLP) tasks. This presentation explores a practical text classification scenario at Malt, highlighting the practical hurdles encountered when employing GenAI and how we overcame these obstacles.

Talk pydata-visualisation-jupyter

Jupyter Notebooks for Print Media

Tim Paine

Jupyter Notebooks as a platform to create books, magazine and newspaper articles, and other print media

Talk pydata-machine-learning-deep-learning-stats

Lessons learned from deploying Machine Learning in an old-fashioned heavy industry

Robert Meyer

Cement is responsible for about 8% of worldwide carbon emissions. Let me tell you about lessons learned decarbonizing the industry with Machine Learning.

Tutorial pydata-data-handling-engineering

Lose your fear of equations!

Darina Goldin

Learn to read equations like an engineer and lose your fear of math

Talk pydata-machine-learning-deep-learning-stats

Machine Learning on microcontrollers using MicroPython and emlearn

Jon Nordby

Deploy ML models to microcontrollers - using just the Python you already know! A practical presentation on how to use the emlearn Machine Learning package and MicroPython to build smart sensor systems.

Talk pydata-machine-learning-deep-learning-stats

Missing Data, Bayesian Imputation and People Analytics with PyMC

Nathaniel Forde

Hierarchical structures are everywhere in business! Ever wondered how trickle-down management missteps drive non-response bias in Employee Engagement? Model the hierarchy, model the missing-ness with PyMC!

Talk pydata-pydata-scientific-libraries-stack

Mostly Harmless Fixed Effects Regression in Python with PyFixest

Alexander Fischer

Discover PyFixest, a Python library inspired by R's 'fixest'! 🐍📊 It speeds up regression model estimation with high-dimensional fixed effects, offering tools for robust inference and efficient post-processing. Perfect for AB Tests and event studies! #Python #DataScience #PyDat

Talk pydata-machine-learning-deep-learning-stats

Moving from Offline to Online Machine Learning with River

Tun Shwe

Learn the differences between online and offline ML and get started on your online ML journey today with River, an open source Python ML library

Talk pydata-data-handling-engineering

Next Stop: Insights! How Streamlit and Snowflake Power Up Data Stories

Marie-Kristin Wirsching

Data stories are the bridge between complex data insights and business impact! Transforming data into clear, actionable narratives is no easy task. That's where Streamlit and Snowflake come in - a duo for creating visually engaging, interactive data applications.

Talk pydata-data-handling-engineering

Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars

Florian Jetter, Patrick Hoefler

Dask DataFrame is fast now - The re-implementation of DataFrames in Dask is fast, reliable and fun.

Talk pydata-machine-learning-deep-learning-stats

Personalizing Carousel Ranking on Wolt's Discovery Page: A Hierarchical Multi-Armed Bandit Approach

Marcel Kurovski, Steffen Klempau

Personalizing Carousel Ranking on Wolt's Discovery Page with a Hierarchical Multi-Armed Bandit Approach

Talk pydata-data-handling-engineering

Polars and Time Series: what it can do, and how to overcome any limitation

Marco Gorelli

Learn how to use Polars for time series: what it does, what it doesn't do (and what to do about that!)

Talk pydata-generative-ai

Put your RAG to the test: Component-per-component evaluation of our LLM-powered airplane manufacturing assistant

Nataliia Kees

This talk discusses component-wise evaluation of RAG-based applications, using the example of the airplane manufacturing assistant developed at Airbus with open source Python libraries paired with Google Vertex AI.

Talk pydata-generative-ai

RAG for a medical company: the technical and product challenges

Noé Achache

While developing a Proof-Of-Concept RAG is widely accessible, creating a performant version that truly adds value remains a challenge. We will share our learnings from building a RAG for a medical company, aiding doctors with drug documentation.

Talk pydata-machine-learning-deep-learning-stats

Reinforcement Learning: Bridging The Gap Between Research and Applications

Michael Panchenko

Reinforcement learning (RL) has untapped potential for industry. This talk presents Tianshou, an open-source library with interfaces facilitating both industrial RL applications and new algorithm research, with the dual goals of accelerating progress and adoption.

Talk pydata-generative-ai

Safeguarding Privacy and Mitigating Vulnerabilities: Navigating Security Challenges in Generative AI

John Robert

How to protect and secure your data while using LLMs and Generative AI. Your data privacy and security are important.

Talk pydata-machine-learning-deep-learning-stats

Select ML from Databases

Gregor Bauer

Select ML from Databases: A new workflow for building your machine learning models using the capabilities of modern databases

Talk pydata-machine-learning-deep-learning-stats

Tackling the Cold Start Challenge in Demand Forecasting

Alexander Meier, Daria Mokrytska

Exploring the Cold Start problem in Demand Forecasting. Overcoming difficulties faced by Time Series and ML models. Uncover practical techniques and a systematic evaluation framework for effective forecasting.

Talk pydata-machine-learning-deep-learning-stats

Tailored and Trending: Key learnings from 3 years of news recommendations

Dr. Christian Leschinski

Diving into the world of recommendations! Learn how we overcome the special challenges of recommending news at Axel Springer NMT by using simple statistics.

Talk pydata-machine-learning-deep-learning-stats

That’s it?! Dealing with unexpected data problems

Simon Pressler

That’s it?! How to deal with unexpected data quality and quantity issues

Talk pydata-natural-language-processing-computer-vision

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

Ines Montani

Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies? I don’t think so, and in this talk, I’ll show you why.

Talk pydata-machine-learning-deep-learning-stats

The evolution of Feature Stores

Olamilekan Wahab

Feature Stores have become an important component of the machine learning lifecycle. They have been particularly pivotal in bridging the gap between data engineering and machine learning workflows (experimentation, training and serving). This talk will explore Feature Stores with

Talk pydata-data-handling-engineering

The pragmatic Pythonic data engineer

Robson Junior

Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture

Talk pydata-data-handling-engineering

The Struggles We Skipped: Data Engineering for the TikTok Generation

Anuun, Hiba Jamal

A new wave in data engineering! From tangled tasks to sleek, plug-and-play magic in data pipelines. 🚀

Talk pydata-natural-language-processing-computer-vision

Using LLMs to Create Knowledge Graphs From a Large Corpus of Parliamentary Debates

Usman

This talk demonstrates how we can intuitively analyze political debates with knowledge graphs created using LLMs.

Tutorial pydata-machine-learning-deep-learning-stats

Using ML to find out the "Why"? A Tutorial in Causal Machine Learning

Oliver Schacht, Jan Teichert-Kluge

Tutorial on Causal Machine Learning by the developers of the DoubleML package for Python. Learn how to address "Why?" questions with ML! https://docs.doubleml.org/stable/index.html #Causality #CausalML #DoubleML #CausalInference

Talk pydata-generative-ai

Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations

John Sandall

🕵️ Calling all Spythonistas: Do you need a live speech transcription and summarization "secret agent" that works offline by running on your own hardware? Learn about the latest trends in open-source GenAI tools and how to build your own in this light-hearted talk.

Talk pydata-natural-language-processing-computer-vision

Would you rely on ChatGPT to dial 911? A talk on balancing determinism and probabilism in production machine learning systems

Nicolas Guenon des Mesnards

Combining deterministic and probabilistic models to boost ML system robustness. Learn their benefits and applications in AI, backed by NLP case studies. #AIInnovation #MLTech #RobustAI

Talk pydata-machine-learning-deep-learning-stats

Your Model _Probably_ Memorized the Training Data

Katharine Jarmul

So, just how much data did ChatGPT memorize? Let's find out!

Talk pydata-machine-learning-deep-learning-stats

🌳 The taller the tree, the harder the fall. Determining tree height from space using Deep Learning and very high resolution satellite imagery 🛰️

Ferdinand Schenck

🌳 The taller the tree, the harder the fall. Measuring tree height from space using Deep Learning 🛰️