A deep dive into the Arrow Columnar format with pyarrow and nanoarrow
Joris Van den Bossche, Raúl Cumplido, Alenka Frim
Apache Arrow has become a de facto standard for efficient in-memory columnar data representation. You might have heard about Arrow or already be using it, but do you understand the format and why it’s so useful? This tutorial dives deep into the details of the Arrow columnar format, covering the different types and buffer layouts, and explores those details interactively using the pyarrow and nanoarrow libraries.
Joris Van den Bossche
Affiliation: Voltron Data
I am a core contributor to pandas and Apache Arrow, and a maintainer of GeoPandas. I did a PhD in air quality research at Ghent University and VITO, and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of Python (pandas) at Ghent University.
Visit the speaker at: GitHub
Raúl Cumplido
Affiliation: Voltron Data
I started working with Python in 2008, with Python 2.5, and it has been my language of choice ever since. I have been involved in the Spanish Python community as one of the co-founders of the Python Spanish Association, and in the organisation of EuroPython in Bilbao, several PyCon ES (Spain) editions and the Barcelona meetup. A couple of years ago I started working on Apache Arrow, and since then I have become a committer and a PMC member. I want to share with the rest of the world what we have done and what we are doing.
Alenka Frim
Affiliation: Voltron Data
My software development journey started with open source and the Apache Arrow project. More specifically, I started by contributing to the Arrow R package in 2021. After that, I contributed to other open source projects connected to the Python dataframe API standard while at Quansight, and became an Apache Arrow committer in 2022 after being a regular contributor to Apache Arrow (Python) since 2021. I am currently working at Voltron Data as a Software Engineer.