Build an AI Document Inquiry Chat with Offline LLMs
Pavithra Eswaramoorthy, Philip Meier
As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document inquiry systems have emerged as a high-value practical use case. Retrieval-Augmented Generation (RAG) is a technique that supplies LLMs with relevant context and external information (retrieved from vector storage), making their answers more accurate and grounded in your documents.
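To make the retrieve-then-generate flow concrete, here is a self-contained toy sketch in Python. It stands in for a real pipeline: plain word overlap replaces embedding-based similarity search, and the final LLM call is left as a comment. All names here (Chunk, retrieve, build_prompt) are illustrative, not part of any library:

```python
from dataclasses import dataclass

# Toy "vector store" entry; a real system stores embeddings per chunk.
@dataclass
class Chunk:
    text: str

def similarity(query: str, chunk: Chunk) -> int:
    # Word overlap as a crude stand-in for embedding similarity.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))

def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Step 1 (Retrieve): pick the k chunks most relevant to the query.
    return sorted(chunks, key=lambda c: similarity(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[Chunk]) -> str:
    # Step 2 (Augment): pack the retrieved chunks into the prompt.
    joined = "\n".join(c.text for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    Chunk("Ragna is an open source RAG orchestration framework."),
    Chunk("Vector databases store embeddings for fast similarity search."),
    Chunk("Offline LLMs run locally without sending data to external APIs."),
]
question = "What is Ragna?"
# Step 3 (Generate): the assembled prompt would be sent to an LLM.
print(build_prompt(question, retrieve(question, docs)))
```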
In this hands-on tutorial, we’ll dive into RAG by creating a personal chat app that accurately answers questions about your selected documents. We’ll use a new OSS project called Ragna, which provides a friendly Python and REST API designed for exactly this use case. We’ll test the effectiveness of different LLMs and vector databases, including an offline LLM (i.e., a local LLM) running on GPUs on cloud machines provided to you. We’ll conclude by demonstrating how to quickly build personal or company-level chat-based document inquiry systems.
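To give a flavor of the Python API, here is a minimal sketch of a Ragna chat, based on an early Ragna release; the component names (Chroma as the source storage, Gpt35Turbo16k as the assistant) are assumptions that may have changed since, so check https://ragna.chat for the current API:

```python
# Minimal sketch of Ragna's Python API (early release); class names
# below are assumed and may differ in current versions.
import asyncio

from ragna import Rag
from ragna.assistants import Gpt35Turbo16k   # assumed assistant (LLM) class
from ragna.source_storages import Chroma     # assumed vector-store wrapper

async def main():
    # The chat context manager prepares the documents (chunking,
    # embedding, storage) before questions are answered.
    async with Rag().chat(
        documents=["document.pdf"],   # your selected documents
        source_storage=Chroma,        # where chunks/embeddings live
        assistant=Gpt35Turbo16k,      # the LLM that generates answers
    ) as chat:
        message = await chat.answer("What is Ragna?")
        print(message)

asyncio.run(main())
```

Swapping in a different vector database or an offline LLM is then a matter of passing a different source_storage or assistant class.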
Pavithra Eswaramoorthy
Affiliation: Quansight
Pavithra Eswaramoorthy is a Developer Advocate at Quansight, where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she maintains the Bokeh visualization library and contributes to the Nebari (adjacent to the Jupyter community), conda-store (part of the conda ecosystem), and Ragna (RAG orchestration framework) projects. Pavithra has been involved in the open source community for over 5 years, notably as a maintainer of the Dask library and an administrator for Wikimedia’s OSS programs. In her spare time, she enjoys a good book and hot coffee. :)
Philip Meier
Affiliation: Quansight
Philip is a Senior Software Engineer at Quansight. His recent work has focused on Ragna (https://ragna.chat), an OSS RAG orchestration framework.