In the realm of Large Language Models, achieving proficiency in mathematical reasoning presents a notable challenge. This task demands a blend of multi-step problem-solving, natural language understanding, and robust computational skills. Among the datasets that rigorously test these capabilities, OpenAI's GSM8K stands out as a critical benchmark.

In this technical exploration, we apply Adala – a new open-source Python framework for automated prompt refinement – to enhance LLM performance on the GSM8K dataset. Leveraging Adala's agent-based workflow, we achieved a 29.26-percentage-point absolute improvement over our LLM's baseline on this dataset, boosting accuracy from 44.88% to 74.14%. More impressively, this leap in performance required no manual prompt engineering, highlighting the tool's efficacy.

This talk will cover how agent-based workflows – using LLMs as the kernel of a computational runtime – can improve on baseline benchmarks through automated prompt engineering.
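To make the idea concrete, here is a minimal sketch of the kind of automated prompt-refinement loop the talk describes: score an instruction on labeled GSM8K-style problems, ask the model to rewrite it based on that feedback, and keep the best-scoring variant. This is an illustrative assumption of the general technique, not Adala's actual API; the names call_llm, extract_answer, evaluate, and refine are hypothetical, and call_llm is a stub you would wire to a real LLM provider.

```python
import re

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real completion
    request to your provider of choice."""
    raise NotImplementedError("wire this to your LLM provider")

def extract_answer(text: str) -> str | None:
    """Pull the final numeric answer from a response (GSM8K answers are
    conventionally written after '####')."""
    match = re.search(r"####\s*(-?[\d,]+)", text)
    return match.group(1).replace(",", "") if match else None

def evaluate(instruction: str, problems: list[dict]) -> float:
    """Score an instruction by exact-match accuracy on labeled problems."""
    correct = 0
    for p in problems:
        response = call_llm(f"{instruction}\n\nQuestion: {p['question']}")
        if extract_answer(response) == p["answer"]:
            correct += 1
    return correct / len(problems)

def refine(instruction: str, problems: list[dict], rounds: int = 3) -> str:
    """Agent loop: evaluate the current instruction, have the LLM rewrite
    it, and keep whichever variant scores best."""
    best, best_acc = instruction, evaluate(instruction, problems)
    for _ in range(rounds):
        candidate = call_llm(
            "Improve this math-reasoning instruction so the model shows "
            f"its work and ends with '#### <answer>':\n\n{best}"
        )
        acc = evaluate(candidate, problems)
        if acc > best_acc:
            best, best_acc = candidate, acc
    return best
```

The design point is that the LLM appears twice: once as the solver being evaluated and once as the rewriter improving its own instructions, which is what removes the need for manual prompt engineering.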

Chris Hoge

Affiliation: HumanSignal

Chris Hoge is the Head of Community for HumanSignal, helping to grow the Label Studio community. He has spent over a decade working in open-source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, where he studied numerical methods for simulating physical systems. He splits his time between the PNW and NYC, where he spends his free time trail running and playing piano.

Visit the speaker at: GitHub