Multi-Model AI Document Search & Citation Tool
AI document search that returns the exact passage with a citation back to the source page — not a summarised guess. Works across PDFs, books, scanned documents, and long technical reports.
From upload to cited answer.
Researchers upload PDFs, books, and scanned documents. The pipeline OCRs the scanned ones, picks a chunking strategy per document type, and stores the embeddings in pgvector. Queries route through a multi-model layer — GPT-4, LLaMA, or Gemini — and every answer carries a page-level citation back to the source.
Upload documents to your corpus
Researcher · Corpus #14 · 3 documents in queue
This is an animated mockup of the document-search capability — not a live product. Document titles, page numbers, and answer text are illustrative.
OCR + PDF parser
Born-digital PDFs are parsed natively; scanned PDFs and image-based pages flow through OCR first. The same downstream pipeline sees clean text either way.
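As a rough sketch of that routing, the snippet below checks each page for an extractable text layer and falls back to OCR when there is none. PyMuPDF and pytesseract are stand-ins here; the write-up does not name the actual parser or OCR engine.

```python
# Hypothetical OCR routing: PyMuPDF (fitz) + pytesseract stand in for
# the real parser and OCR engine, which aren't named in the write-up.
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def extract_page_text(page: fitz.Page) -> str:
    """Return page text, OCR-ing pages that have no text layer."""
    text = page.get_text().strip()
    if text:
        return text  # born-digital page: parse natively
    # Scanned page: render it to an image, then OCR the image.
    pixmap = page.get_pixmap(dpi=300)
    image = Image.open(io.BytesIO(pixmap.tobytes("png")))
    return pytesseract.image_to_string(image)

def extract_document(path: str) -> list[str]:
    """One text string per page, whatever the page's origin."""
    with fitz.open(path) as doc:
        return [extract_page_text(page) for page in doc]
```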
Adaptive chunking engine
Paragraph-based for legal contracts, word-count for technical manuals, page-based for academic papers. One strategy across every document type would lose meaning at the boundaries.
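A minimal sketch of those three strategies, with illustrative sizes rather than production values:

```python
# Illustrative chunkers; the sizes and boundaries are assumptions.
def chunk_by_paragraph(text: str) -> list[str]:
    """Legal contracts: split on blank lines so clauses stay intact."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def chunk_by_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Technical manuals: fixed word windows with a small overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def chunk_by_page(pages: list[str]) -> list[str]:
    """Academic papers: one chunk per page keeps page citations trivial."""
    return [page for page in pages if page.strip()]
```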
Vector store (pgvector)
Embeddings land in pgvector for fast cosine-similarity retrieval. Postgres is the same database the rest of the app uses — one less moving part to operate.
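The retrieval query itself is plain SQL. Here is a sketch using psycopg and the pgvector adapter, assuming a `chunks` table whose name and columns are illustrative, not the production schema:

```python
# Assumes a table like:
#   CREATE TABLE chunks (document_id text, page int, content text,
#                        embedding vector(1536));
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def top_chunks(query_embedding: np.ndarray, k: int = 5) -> list[tuple]:
    """Nearest chunks by cosine distance (pgvector's <=> operator)."""
    with psycopg.connect("dbname=app") as conn:
        register_vector(conn)
        return conn.execute(
            """
            SELECT document_id, page, content,
                   1 - (embedding <=> %s) AS cosine_similarity
            FROM chunks
            ORDER BY embedding <=> %s
            LIMIT %s
            """,
            (query_embedding, query_embedding, k),
        ).fetchall()
```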
Multi-model RAG pipeline
GPT-4 for legal precision, LLaMA 3 for technical manuals, Gemini Pro for academic synthesis — routed through a common interface so the pipeline picks the right model per query.
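A sketch of what that common interface could look like. The routing table and model keys are illustrative; the production rules are not public.

```python
# Hypothetical router: ROUTES and the model keys are assumptions.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

ROUTES = {
    "legal": "gpt-4",          # legal precision
    "technical": "llama-3",    # technical manuals
    "academic": "gemini-pro",  # academic synthesis
}

def route(doc_type: str, models: dict[str, ChatModel], prompt: str) -> str:
    """Pick the model for this document type, with a safe fallback."""
    key = ROUTES.get(doc_type, "gpt-4")
    model = models.get(key) or next(iter(models.values()))
    return model.complete(prompt)
```

Because every model satisfies the same `complete` contract, swapping or adding a model is a routing-table change, not an app change.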
Citation engine (page-level)
Every answer carries a citation back to the source page. Legal, compliance, and research workflows can verify each passage instead of trusting a summary.
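One way to keep that guarantee honest is to make the citation part of the answer type itself, so no code path can return a passage without its page. A sketch with illustrative field names:

```python
# Illustrative types; the real schema isn't published.
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    document: str
    page: int

@dataclass(frozen=True)
class Answer:
    passage: str
    citation: Citation

    def render(self) -> str:
        return f'"{self.passage}" ({self.citation.document}, p. {self.citation.page})'
```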
Document structure analyser
A pre-processing analyser inspects each document, identifies its type, and picks the chunking strategy — instead of forcing one strategy across the whole corpus.
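The classification itself can start from simple textual cues. The keyword lists below are purely illustrative; the real analyser presumably also looks at layout:

```python
# Hypothetical type heuristic; the cue lists are assumptions.
def classify_document(first_pages: str) -> str:
    """Guess a document type from its opening pages."""
    text = first_pages.lower()
    if any(cue in text for cue in ("whereas", "hereinafter", "indemnif")):
        return "legal"
    if any(cue in text for cue in ("abstract", "doi:", "references")):
        return "academic"
    return "technical"  # default bucket for manuals and reports
```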
Documents are parsed (OCR if scanned), analysed for structure, and chunked using an adaptive strategy chosen per document. Embeddings land in a vector database; queries route through a multi-model RAG pipeline that picks the best LLM for the document type. Every answer carries a page-level citation back to the source.
Users upload PDFs, books, or scanned documents. The system inspects each document and picks the right chunking strategy automatically (paragraph-based for legal contracts, word-count for technical manuals, page-based for academic papers) rather than forcing one strategy on every document. Embeddings land in a vector store; when a user asks a question, retrieval finds the relevant chunks, and a multi-model layer routes the query to GPT-4, LLaMA, or Gemini depending on the document type. Answers include a citation pointing back to the exact source page, so legal and research workflows can verify every claim.
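Wiring the sketches above together gives a rough picture of one query's path. The embed() call stands in for whichever embedding model fills the pgvector column; like every other name here, it is an assumption rather than the production code.

```python
# End-to-end sketch built from the illustrative pieces above.
# Ingest-time chunking/embedding is assumed to have filled `chunks`.
def answer_question(path: str, question: str,
                    models: dict[str, ChatModel]) -> Answer:
    pages = extract_document(path)                    # native parse or OCR
    doc_type = classify_document(" ".join(pages[:3]))
    hits = top_chunks(embed(question))                # embed() is assumed
    context = "\n\n".join(content for _, _, content, _ in hits)
    passage = route(doc_type, models,
                    f"{question}\n\nContext:\n{context}")
    doc_id, page, _, _ = hits[0]                      # best-scoring chunk
    return Answer(passage=passage, citation=Citation(doc_id, page))
```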
How a request flows through it
Each request enters at the top of the diagram, flows through every box, and lands at the bottom — exactly the way the production system behaves. The scan-line traces where a live request would be right now.
What it's built with
The interesting parts
Adaptive chunking per document
A pre-processing analyser picks paragraph-, word-, or page-based chunking per document — instead of forcing one strategy across legal contracts, technical books, and corporate reports.
Multi-model behind a common interface
GPT-4, LLaMA, and Gemini routed through a common interface so the pipeline picks the best model per query without app-level changes.
Citation back to source page
Every answer carries a page-level citation back to the original document. That citation is what makes the tool usable in legal, compliance, and academic workflows, where a passage without a reference is opinion.
OCR for scanned documents
Scanned PDFs and images flow through OCR before chunking, so the same search experience works on born-digital and scanned content alike.
The calls that did most of the work
A handful of engineering choices shape how a system feels. Here are the ones we'd still defend — alongside what each one cost.
Adaptive chunking per document
Legal contracts, technical books, and corporate reports each have a different 'natural unit' for a query to land on — one fixed chunk size lands badly on at least one of them.
Tradeoff: A pre-processing analyser adds latency before the first query can run on a new document.
Multi-model behind a common interface
Different LLMs perform differently on legal vs technical vs academic text. A common interface lets the system pick per query without app-level changes.
Tradeoff: Three model contracts to test, three rate-limit budgets to manage, three places to chase up regressions.
Citation back to source page, not just text
A passage without a page reference is opinion; citing the exact source page is what makes the tool usable in legal and research workflows.
Tradeoff: The chunking and embedding layer has to carry page metadata through every transformation.
Tell us what you're building.
Free 30-minute call. Real humans, real timelines, no endless follow-up emails.