5. RAG interface

BLUF: This script builds a simple RAG (Retrieval-Augmented Generation) interface using Streamlit, SentenceTransformers, and ChromaDB. It lets a user ask a question, retrieves the most relevant chunks from a local PDF-based vector database, and shows a prompt that can be copy-pasted into a local LLM such as a LLaMA model running under llama.cpp or Ollama.


Use cases

This script supports document-based question answering using RAG. Specifically:

  1. User inputs a question.
  2. The question is embedded into a vector using SentenceTransformer.
  3. ChromaDB retrieves the most relevant chunks of text based on vector similarity.
  4. The script shows the retrieved context and constructs a full prompt that the user can feed into their local LLM (e.g., a LLaMA model served through Ollama or llama.cpp) for a response.
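As a reference point for step 2, embedding a question on its own looks like this. The example question is made up, the model name matches the one used in the script below, and 384 is the documented output dimension of all-MiniLM-L6-v2:

from sentence_transformers import SentenceTransformer

# Load the same embedding model the interface uses
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode an example question into a dense vector ChromaDB can compare against stored chunks
vec = model.encode("What does chapter 3 say about data retention?")
print(vec.shape)  # (384,) for all-MiniLM-L6-v2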

What it does (Step-by-step)

  1. st.text_input(...) lets the user type in a question.
  2. SentenceTransformer(...) loads a small transformer model to embed the question as a vector.
  3. chroma_client.get_collection(...) connects to the existing Chroma vector DB that stores the PDF chunks.
  4. collection.query(...) finds the top 3 most relevant chunks.
  5. The retrieved chunks are displayed as Markdown for readability.
  6. A full_prompt string is constructed that can be copy-pasted into a local LLM to generate an answer.
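One detail worth calling out for step 4: collection.query(...) returns parallel lists with one inner list per query vector, which is why the script below indexes results['documents'][0]. A tiny in-memory illustration (the collection name, documents, and embeddings here are made up purely to show the shape of the result):

import chromadb

# Ephemeral in-memory client, only to demonstrate what query() returns
client = chromadb.Client()
demo = client.get_or_create_collection("demo")
demo.add(
    ids=["a", "b"],
    documents=["alpha chunk", "beta chunk"],
    embeddings=[[0.0, 1.0], [1.0, 0.0]],
)

results = demo.query(query_embeddings=[[0.1, 0.9]], n_results=2)
print(results["documents"][0])  # ['alpha chunk', 'beta chunk'] -- one inner list per query
print(results["distances"][0])  # distances to the query; smaller means more similar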

📎 Example Use Case

You’ve already parsed a set of PDFs and stored them as vectors in ChromaDB. This interface lets you query that collection interactively and assemble a prompt for your local LLM.
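If the rag_chunks collection has not been built yet, the ingestion side could look roughly like the following. This is only a sketch of the assumed preprocessing step, not part of the original script: the pypdf dependency, the pdfs/ folder, the fixed 1000-character chunking, and the chroma_db persistence path are all assumptions chosen to line up with the interface below.

from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import chromadb
import glob

model = SentenceTransformer("all-MiniLM-L6-v2")        # same model as the query side
client = chromadb.PersistentClient(path="chroma_db")   # assumed on-disk location
collection = client.get_or_create_collection("rag_chunks")

chunk_id = 0
for pdf_path in glob.glob("pdfs/*.pdf"):               # assumed folder of source PDFs
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    # Naive fixed-size chunking; a real pipeline might split on sentences or sections
    for start in range(0, len(text), 1000):
        chunk = text[start:start + 1000]
        collection.add(
            ids=[f"chunk-{chunk_id}"],
            documents=[chunk],
            embeddings=[model.encode(chunk).tolist()],
            metadatas=[{"source": pdf_path}],
        )
        chunk_id += 1

With a populated collection in place, the interface itself (saved as app.py and run as shown at the end of this section):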

import streamlit as st
from sentence_transformers import SentenceTransformer
import chromadb

# Page setup
st.set_page_config(page_title="RAG QA with PDF Knowledge", layout="wide")
st.title("πŸ” RAG QA from Your PDFs")

# Input field
query = st.text_input("Ask a question based on your documents:")

if query:
    # Load the embedding model and connect to the persisted vector database.
    # (These are reloaded on every rerun; st.cache_resource could cache them.)
    model = SentenceTransformer('all-MiniLM-L6-v2')
    # chromadb.Client() is in-memory only; use a persistent client so the
    # previously stored PDF chunks are actually found. Adjust the path to
    # wherever the collection was persisted.
    chroma_client = chromadb.PersistentClient(path="chroma_db")
    collection = chroma_client.get_collection("rag_chunks")

    # Embed user query
    query_vector = model.encode(query).tolist()

    # Search for similar chunks
    results = collection.query(query_embeddings=[query_vector], n_results=3)
    contexts = results['documents'][0]

    # Display retrieved chunks
    st.subheader("πŸ“„ Retrieved Contexts")
    for i, chunk in enumerate(contexts):
        st.markdown(f"**Chunk {i+1}:**")
        st.code(chunk, language="markdown")

    # Construct the final prompt
    context_str = "\n---\n".join(contexts)
    full_prompt = f"""Context:
{context_str}

Question: {query}
Answer:"""

    # Display the prompt
    st.subheader("🧠 Constructed Prompt")
    st.code(full_prompt, language="markdown")

    st.info("πŸ“Œ Copy this prompt into your local LLM (llama.cpp, Ollama, etc.) to get a generated response.")

else:
    st.warning("Enter a question to begin.")

Execute with

pip install streamlit sentence-transformers chromadb
streamlit run app.py


6. main.py