Apache Parquet has become the de facto format for storing tabular (DataFrame) data on disk. It achieves this through general-purpose compression and by making efficient use of what it knows about the structure of the stored data. In this talk, we show the core structure of Parquet and the knobs that let you get even more out of the file format.
Uwe L. Korn
Affiliation: QuantCo, Inc.
Uwe Korn is CTO at the data science company QuantCo. His expertise is in building scalable architectures for machine learning services and the teams and culture around them. Nowadays, he focuses on the data engineering infrastructure that provides the building blocks for bringing machine learning models into production. As part of his work on efficient data interchange, he became a core committer to the Apache Parquet, Apache Arrow, and conda-forge projects.