5. RAG interface
BLUF: This script builds a simple RAG (Retrieval-Augmented Generation) interface using Streamlit, SentenceTransformers, and ChromaDB. It lets a user ask a question, retrieves the most relevant chunks from a local PDF-based vector database, and shows a prompt that can be copy-pasted into a locally run LLM (e.g., a LLaMA model served via llama.cpp or Ollama).
Use cases
To support document-based question answering using RAG. Specifically:
- User inputs a question.
- The question is embedded into a vector using SentenceTransformer.
- ChromaDB retrieves the most relevant chunks of text based on vector similarity.
- The script shows the retrieved context and constructs a full prompt that the user can feed into their local LLM (e.g., via Ollama or llama.cpp) for a response.
What it does (Step-by-step)
| Step | Description |
| --- | --- |
| 1. st.text_input(...) | User types in a question. |
| 2. SentenceTransformer(...) | Loads a small transformer model to embed the question as a vector. |
| 3. chromadb.PersistentClient(...).get_collection(...) | Connects to the existing Chroma vector DB that stores the PDF chunks. |
| 4. collection.query(...) | Finds the top 3 most relevant chunks by vector similarity. |
| 5. Display chunks | Shows those chunks as Markdown for readability. |
| 6. Construct full_prompt | Builds a structured prompt that can be copy-pasted into a local LLM to generate an answer. |
Example Use Case
You've already parsed a set of PDFs and stored them as vectors in ChromaDB (a minimal ingestion sketch follows this list). This interface lets you:
- Ask, "What is the main idea of section 2?"
- Get top-3 relevant text chunks from your documents.
- Get a ready-to-use prompt for local model inference (no OpenAI API needed).
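For reference, here is a minimal ingestion sketch that would populate the rag_chunks collection. The file name, chunk size, the chroma_db path, and the use of pypdf are illustrative assumptions; adjust them to match your actual pipeline (and pip install pypdf if you use it):

# ingest.py - illustrative sketch, not the canonical ingestion script
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer('all-MiniLM-L6-v2')
client = chromadb.PersistentClient(path="chroma_db")  # same path the app reads from
collection = client.get_or_create_collection("rag_chunks")

reader = PdfReader("example.pdf")  # hypothetical input file
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Naive fixed-size chunking; swap in a smarter splitter if you need one
chunk_size = 500
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=[model.encode(c).tolist() for c in chunks],
)

The Streamlit app itself: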
import streamlit as st
from sentence_transformers import SentenceTransformer
import chromadb
# Page setup
st.set_page_config(page_title="RAG QA with PDF Knowledge", layout="wide")
st.title("π RAG QA from Your PDFs")
# Input field
query = st.text_input("Ask a question based on your documents:")
if query:
    # Load the embedding model (see the caching sketch below to avoid reloading it on every rerun)
    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Connect to the persisted vector store; the path is an example and must match your ingestion step
    chroma_client = chromadb.PersistentClient(path="chroma_db")
    collection = chroma_client.get_collection("rag_chunks")

    # Embed the user query
    query_vector = model.encode(query).tolist()

    # Retrieve the top 3 most similar chunks
    results = collection.query(query_embeddings=[query_vector], n_results=3)
    contexts = results['documents'][0]

    # Display retrieved chunks
    st.subheader("Retrieved Contexts")
    for i, chunk in enumerate(contexts):
        st.markdown(f"**Chunk {i+1}:**")
        st.code(chunk, language="markdown")

    # Construct the final prompt
    context_str = "\n---\n".join(contexts)
    full_prompt = f"""Context:
{context_str}
Question: {query}
Answer:"""

    # Display the prompt
    st.subheader("Constructed Prompt")
    st.code(full_prompt, language="markdown")
    st.info("Copy this prompt into your local LLM (llama.cpp, Ollama, etc.) to get a generated response.")
else:
    st.warning("Enter a question to begin.")
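A practical refinement for the script above: Streamlit reruns the whole file on every interaction, so the embedding model and the Chroma collection are reloaded each time. A minimal sketch of caching them with st.cache_resource (the function names here are illustrative):

@st.cache_resource
def get_model():
    # Loaded once per session, reused across reruns
    return SentenceTransformer('all-MiniLM-L6-v2')

@st.cache_resource
def get_collection():
    # The path must match wherever your ingestion script persisted the DB
    client = chromadb.PersistentClient(path="chroma_db")
    return client.get_collection("rag_chunks")

Inside the if query: branch you would then call model = get_model() and collection = get_collection() instead of constructing them inline.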
Execute with
pip install streamlit sentence-transformers chromadb
streamlit run app.py
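Optionally, instead of copy-pasting the prompt, the app can call a local model directly. Below is a sketch assuming an Ollama server is running on its default port (http://localhost:11434) with a model such as llama3 already pulled; it would go at the end of the if query: branch:

import requests

# Send the constructed prompt to a locally running Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": full_prompt, "stream": False},
    timeout=120,
)
answer = response.json()["response"]
st.subheader("Model Answer")
st.write(answer)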