Multi-Model AI Document Search & Citation Tool
AI document search that returns the exact passage with a citation back to the source page — not a summarised guess. Works across PDFs, books, scanned documents, and long technical reports.
From upload to cited answer.
Researchers upload PDFs, books, and scanned documents. The pipeline OCRs the scanned ones, picks a chunking strategy per document type, and stores the embeddings in pgvector. Queries route through a multi-model layer — GPT-4, LLaMA, or Gemini — and every answer carries a page-level citation back to the source.
Upload documents to your corpus
Researcher · Corpus #14 · 3 documents in queue
This is an animated mockup of the document-search capability — not a live product. Document titles, page numbers, and answer text are illustrative.
OCR + PDF parser
Born-digital PDFs are parsed natively; scanned PDFs and image-based pages flow through OCR first. The same downstream pipeline sees clean text either way.
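As a rough sketch of that routing, the snippet below checks each page for an extractable text layer and falls back to OCR when there is none. PyMuPDF and pytesseract are stand-ins here; the write-up does not name the actual parser or OCR engine.

```python
# Hypothetical OCR routing: PyMuPDF (fitz) + pytesseract stand in for
# the real parser and OCR engine, which aren't named in the write-up.
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def extract_page_text(page: fitz.Page) -> str:
    """Return page text, OCR-ing pages that have no text layer."""
    text = page.get_text().strip()
    if text:
        return text  # born-digital page: parse natively
    # Scanned page: render it to an image, then OCR the image.
    pixmap = page.get_pixmap(dpi=300)
    image = Image.open(io.BytesIO(pixmap.tobytes("png")))
    return pytesseract.image_to_string(image)

def extract_document(path: str) -> list[str]:
    """One text string per page, whatever the page's origin."""
    with fitz.open(path) as doc:
        return [extract_page_text(page) for page in doc]
```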
Adaptive chunking engine
Paragraph-based for legal contracts, word-count for technical manuals, page-based for academic papers. One strategy across every document type would lose meaning at the boundaries.
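A minimal sketch of those three strategies, with illustrative sizes rather than production values:

```python
# Illustrative chunkers; the sizes and boundaries are assumptions.
def chunk_by_paragraph(text: str) -> list[str]:
    """Legal contracts: split on blank lines so clauses stay intact."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def chunk_by_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Technical manuals: fixed word windows with a small overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def chunk_by_page(pages: list[str]) -> list[str]:
    """Academic papers: one chunk per page keeps page citations trivial."""
    return [page for page in pages if page.strip()]
```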
Vector store (pgvector)
Embeddings land in pgvector for fast cosine-similarity retrieval. Postgres is the same database the rest of the app uses — one less moving part to operate.
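The retrieval query itself is plain SQL. Here is a sketch using psycopg and the pgvector adapter, assuming a `chunks` table whose name and columns are illustrative, not the production schema:

```python
# Assumes a table like:
#   CREATE TABLE chunks (document_id text, page int, content text,
#                        embedding vector(1536));
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def top_chunks(query_embedding: np.ndarray, k: int = 5) -> list[tuple]:
    """Nearest chunks by cosine distance (pgvector's <=> operator)."""
    with psycopg.connect("dbname=app") as conn:
        register_vector(conn)
        return conn.execute(
            """
            SELECT document_id, page, content,
                   1 - (embedding <=> %s) AS cosine_similarity
            FROM chunks
            ORDER BY embedding <=> %s
            LIMIT %s
            """,
            (query_embedding, query_embedding, k),
        ).fetchall()
```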
Multi-model RAG pipeline
GPT-4 for legal precision, LLaMA 3 for technical manuals, Gemini Pro for academic synthesis — routed through a common interface so the pipeline picks the right model per query.
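A sketch of what that common interface could look like. The routing table and model keys are illustrative; the production rules are not public.

```python
# Hypothetical router: ROUTES and the model keys are assumptions.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

ROUTES = {
    "legal": "gpt-4",          # legal precision
    "technical": "llama-3",    # technical manuals
    "academic": "gemini-pro",  # academic synthesis
}

def route(doc_type: str, models: dict[str, ChatModel], prompt: str) -> str:
    """Pick the model for this document type, with a safe fallback."""
    key = ROUTES.get(doc_type, "gpt-4")
    model = models.get(key) or next(iter(models.values()))
    return model.complete(prompt)
```

Because every model satisfies the same `complete` contract, swapping or adding a model is a routing-table change, not an app change.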
Citation engine (page-level)
Every answer carries a citation back to the source page. Legal, compliance, and research workflows can verify each passage instead of trusting a summary.
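One way to keep that guarantee honest is to make the citation part of the answer type itself, so no code path can return a passage without its page. A sketch with illustrative field names:

```python
# Illustrative types; the real schema isn't published.
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    document: str
    page: int

@dataclass(frozen=True)
class Answer:
    passage: str
    citation: Citation

    def render(self) -> str:
        return f'"{self.passage}" ({self.citation.document}, p. {self.citation.page})'
```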
Document structure analyser
A pre-processing analyser inspects each document, identifies its type, and picks the chunking strategy — instead of forcing one strategy across the whole corpus.
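The classification itself can start from simple textual cues. The keyword lists below are purely illustrative; the real analyser presumably also looks at layout:

```python
# Hypothetical type heuristic; the cue lists are assumptions.
def classify_document(first_pages: str) -> str:
    """Guess a document type from its opening pages."""
    text = first_pages.lower()
    if any(cue in text for cue in ("whereas", "hereinafter", "indemnif")):
        return "legal"
    if any(cue in text for cue in ("abstract", "doi:", "references")):
        return "academic"
    return "technical"  # default bucket for manuals and reports
```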
Documents are parsed (OCR if scanned), analysed for structure, and chunked using an adaptive strategy chosen per document. Embeddings land in a vector database; queries route through a multi-model RAG pipeline that picks the best LLM for the document type. Every answer carries a page-level citation back to the source.
Users upload PDFs, books, or scanned documents. The system inspects each document and picks the right chunking strategy automatically (paragraph-based for legal contracts, word-count for technical manuals, page-based for academic papers) rather than forcing one strategy on every document. Embeddings land in a vector store; when a user asks a question, retrieval finds the relevant chunks, and a multi-model layer routes the query to GPT-4, LLaMA, or Gemini depending on the document type. Answers include a citation pointing back to the exact source page, so legal and research workflows can verify every claim.
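Wiring the sketches above together gives a rough picture of one query's path. The embed() call stands in for whichever embedding model fills the pgvector column; like every other name here, it is an assumption rather than the production code.

```python
# End-to-end sketch built from the illustrative pieces above.
# Ingest-time chunking/embedding is assumed to have filled `chunks`.
def answer_question(path: str, question: str,
                    models: dict[str, ChatModel]) -> Answer:
    pages = extract_document(path)                    # native parse or OCR
    doc_type = classify_document(" ".join(pages[:3]))
    hits = top_chunks(embed(question))                # embed() is assumed
    context = "\n\n".join(content for _, _, content, _ in hits)
    passage = route(doc_type, models,
                    f"{question}\n\nContext:\n{context}")
    doc_id, page, _, _ = hits[0]                      # best-scoring chunk
    return Answer(passage=passage, citation=Citation(doc_id, page))
```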
How a request flows through it
Each request enters at the top of the diagram, flows through every box, and lands at the bottom — exactly the way the production system behaves. The scan-line traces where a live request would be right now.
What it's built with
The interesting parts
Adaptive chunking per document
A pre-processing analyser picks paragraph-, word-, or page-based chunking per document — instead of forcing one strategy across legal contracts, technical books, and corporate reports.
Multi-model behind a common interface
GPT-4, LLaMA, and Gemini routed through a common interface so the pipeline picks the best model per query without app-level changes.
Citation back to source page
Every answer carries a page-level citation back to the original document. That citation is what makes the tool usable in legal, compliance, and academic workflows, where a passage without a reference is opinion.
OCR for scanned documents
Scanned PDFs and images flow through OCR before chunking, so the same search experience works on born-digital and scanned content alike.
The calls that did most of the work
A handful of engineering choices shape how a system feels. Here are the ones we'd still defend — alongside what each one cost.
Adaptive chunking per document
Legal contracts, technical books, and corporate reports each have a different 'natural unit' for a query to land on — one fixed chunk size lands badly on at least one of them.
Tradeoff: A pre-processing analyser adds latency before the first query can run on a new document.
Multi-model behind a common interface
Different LLMs perform differently on legal vs technical vs academic text. A common interface lets the system pick per query without app-level changes.
Tradeoff: Three model contracts to test, three rate-limit budgets to manage, three places to chase up regressions.
Citation back to source page, not just text
A passage without a page reference is opinion; citing the exact source page is what makes the tool usable in legal and research workflows.
Tradeoff: The chunking and embedding layer has to carry page metadata through every transformation.
Tell us what you're building.
Free 30-minute call. Real humans, real timelines, no endless follow-up emails.