2. requirements.txt
Python Requirements Overview
This document explains the Python packages used in the project, the minimum version required for each, and their roles in the RAG-CAG pipeline.
streamlit>=1.32.0
- Interactive web framework for Python.
- Provides a local UI for document upload and chat.
- Optional (can be removed if using Open WebUI only).
PyPDF2>=3.0.1
- Parses and extracts text from PDF files.
- Useful for fallback parsing when Docling isn’t used or OCR fails.
requests>=2.31.0
- HTTP client library to send and receive web requests.
- Used for:
  - Querying the Ollama API
  - Uploading documents to Open WebUI
  - Sending data to internal services
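As an illustration of the first use listed for requests, here is a minimal sketch of a non-streaming call to Ollama's `/api/generate` endpoint. The address `http://localhost:11434` assumes a default local Ollama install, and the helper names `build_generate_payload` and `ask_ollama` are hypothetical, not taken from the project.

```python
import requests

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    resp = requests.post(
        OLLAMA_URL,
        json=build_generate_payload(model, prompt),
        timeout=timeout,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()["response"]
```

With a model already pulled, a call such as `ask_ollama("llama3", "Summarize this chunk.")` would return the model's reply as a string; the model name here is only an example.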
numpy>=1.26.4
- Numerical array library used to support vector math and embeddings.
- Required by sentence-transformers.
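To make "vector math" concrete, the sketch below ranks document chunks by cosine similarity to a query vector. It assumes the pipeline compares embeddings this way, which the source does not state; the function names and the toy 3-dimensional vectors are illustrative only.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return float(np.dot(a, b) / denom)


def rank_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    """Return chunk indices ordered from most to least similar to the query."""
    scores = np.array([cosine_similarity(query_vec, v) for v in chunk_vecs])
    return np.argsort(-scores)


# Toy "embeddings"; real ones would come from an embedding model.
query = np.array([1.0, 0.0, 0.0])
chunks = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
])
order = rank_chunks(query, chunks)
print(order)
```

The second chunk ranks first because it points in almost the same direction as the query, regardless of vector length.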
sentence-transformers>=2.2.2
- Provides pre-trained transformer models for embedding text.
- Used to generate vector embeddings of document chunks or queries.
langchain>=0.1.16
- Modular framework to build LLM pipelines.
- Used to:
  - Connect retrievers, prompts, and LLMs
  - Build RetrievalQA and RAG-CAG chains
langchain-docling>=0.1.3
- Extension that integrates LangChain with Docling.
- Loads structured document chunks (from PDF, DOCX, etc.).
- Enables more accurate text splitting and chunking compared to PyPDF2 alone.
Final requirements.txt Contents
streamlit>=1.32.0
PyPDF2>=3.0.1
requests>=2.31.0
numpy>=1.26.4
sentence-transformers>=2.2.2
langchain>=0.1.16
langchain-docling>=0.1.3
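Because every entry is a `>=` minimum rather than an exact pin, it can be useful to confirm what is actually installed. The following is a rough sketch using only the standard library; the helper names are hypothetical, and the version comparison is deliberately simplified (it looks only at leading numeric components, so a tool such as `pip check` or the `packaging` library is more robust).

```python
import re
from importlib.metadata import PackageNotFoundError, version

REQUIREMENTS = """\
streamlit>=1.32.0
PyPDF2>=3.0.1
requests>=2.31.0
numpy>=1.26.4
sentence-transformers>=2.2.2
langchain>=0.1.16
langchain-docling>=0.1.3
"""


def version_tuple(text: str) -> tuple[int, ...]:
    """Keep the leading numeric components, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def parse_requirement(line: str) -> tuple[str, tuple[int, ...]]:
    """Split 'name>=x.y.z' into the package name and its minimum version."""
    name, _, minimum = line.strip().partition(">=")
    return name, version_tuple(minimum)


def meets_minimum(installed: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after zero-padding them to the same length."""
    width = max(len(installed), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check(requirements: str) -> dict[str, str]:
    """Map each package to 'ok', 'too old', or 'missing'."""
    report = {}
    for line in requirements.splitlines():
        if not line.strip():
            continue
        name, minimum = parse_requirement(line)
        try:
            installed = version_tuple(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for name, status in check(REQUIREMENTS).items():
    print(f"{name}: {status}")
```

Run inside the project's virtual environment, this prints one status line per package; any `missing` or `too old` entry suggests re-running `pip install -r requirements.txt`.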