Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost



The recent launch of PageIndex is part of a broader shift in AI architecture toward “Vectorless RAG,” or “Reasoning-Based Retrieval.” Instead of the standard method of splitting documents into arbitrary chunks and searching via mathematical similarity, PageIndex builds a “smart table of contents” (a hierarchical tree) that lets LLMs navigate documents the way a human expert would. Numerous blogs (including one from Microsoft) outline the working principles (no vector database, no chunking, enhanced explainability), along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is best suited for deep-dive queries on complex structured or semi-structured documents (such as financial statements), rather than for searching across many independent documents, such as customer support knowledge bases, where we should continue to use Vector RAG.

Why is that?

If Vectorless RAG using PageIndex provides better (or at least comparable) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex’s tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index that is a prerequisite ingestion step is slow and expensive to build with an LLM. Furthermore, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response-synthesis step, again using the LLM.

In comparison, building a vector index is fast and inexpensive, and retrieval uses an LLM only once, during synthesis. Ingestion with an embedding model also costs far less than full-document summarization by an LLM.
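To make the cost asymmetry concrete, here is a toy sketch of the flat Vector RAG flow. A stub bag-of-bigrams embedder and a numpy dot-product search stand in for a real embedding API and FAISS; this is an illustration of how little machinery ingestion needs, not the production setup:

```python
# Toy flat Vector RAG flow. zlib.crc32 bigram hashing stands in for a
# real embedding API; a numpy dot product stands in for FAISS.
import zlib
import numpy as np

def embed(text, dim=256):
    """Stub embedder: unit-norm bag of hashed character bigrams."""
    v = np.zeros(dim)
    for i in range(len(text) - 1):
        v[zlib.crc32(text[i:i + 2].encode()) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def build_index(chunks):
    # Ingestion: one embedding per chunk, no LLM calls at all.
    return np.stack([embed(c) for c in chunks])

def retrieve(index, chunks, query, k=2):
    # Retrieval: cosine similarity (vectors are unit-norm), then top-k.
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

The only LLM call in this flow would come after `retrieve`, when the top chunks are handed to the synthesis model.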

What if you could get the structure-aware reasoning accuracy of Vectorless RAG, along with the low latency and cost of Vector RAG, in a way that scales across an enterprise knowledge base? In this article, I’ll walk through a real use case on a large, complex document to build Proxy-Pointer RAG—an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we will explore and demonstrate the following:

  • Why exactly is PageIndex so accurate, and why is it difficult to scale the concept to multi-document knowledge bases?
  • A quick comparison of Vectorless RAG using PageIndex vs. flat Vector RAG, to establish a baseline.
  • How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
  • A comparison of a wide variety of queries using PageIndex and Proxy-Pointer, to test retrieval quality.

Use Case Setup

We will use a World Bank report, South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). This is a 131-page report comprising multiple chapters, complex charts, tables, boxed content, etc., and is a good candidate for PageIndex to prove its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a Markdown file using the Adobe PDF Extract API, but any other method, such as a VLM that preserves the integrity of tables and charts, would work just as well. FAISS is used as the vector database.

How does PageIndex work?

Instead of the “chunk your document, embed the chunks, retrieve the top-K, feed them to an LLM” pipeline of Vector RAG, PageIndex takes a radically different approach to document retrieval. Rather than treating a document as a flat sequence of chunks, it builds a semantic skeleton tree — a hierarchical map of every section, sub-section, and content block in the document — and then uses an LLM to navigate that tree at query time.

Phase 1: Indexing (Once per Document)

PageIndex parses the document’s heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:

  • title (extracted from the heading)
  • node_id (a unique identifier like 0012)
  • line boundaries (start and end lines in the source document)
  • summary (generated by an LLM — this is the expensive and time-consuming part)

The result is a JSON tree that looks like this:

{
  "node_id": "0011",
  "title": "Chapter 1. Deceptive Strength",
  "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
  "line_num": 621,
  "nodes": [
    {
      "node_id": "0012",
      "title": "Introduction",
      "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
      "line_num": 625
    },
    ...
  ]
}

Phase 2: Retrieval (Per Query)

When a user asks a question, PageIndex hands the entire tree of summaries to an LLM and says, “Which nodes contain the answer?”. This is unlike a Vector RAG which relies on mathematical similarity between query and chunk embeddings to build the relevant context.

The LLM reads the summaries — not the full text — and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the exact, contiguous, full section from the original markdown file and passes it to the synthesis LLM.
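The two-step loop can be sketched as follows. This is not PageIndex’s actual code: `call_llm` is a placeholder for an LLM client, and each node is assumed to carry explicit `start_line`/`end_line` boundaries (the tree JSON above stores `line_num`, from which the end of a section is derived).

```python
# Sketch of reasoning-based retrieval over a summary tree. `call_llm`
# is a placeholder; nodes are assumed to carry start/end line bounds.
import json

def iter_nodes(node):
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for child in node.get("nodes", []):
        yield from iter_nodes(child)

def retrieve_sections(tree, question, doc_lines, call_llm):
    # Step 1: the LLM sees only ids, titles, and summaries, not full text.
    catalog = [{"node_id": n["node_id"], "title": n["title"],
                "summary": n.get("summary", "")} for n in iter_nodes(tree)]
    prompt = (f"Question: {question}\nNodes: {json.dumps(catalog)}\n"
              "Reply with a JSON list of the relevant node_ids.")
    selected = set(json.loads(call_llm(prompt)))
    # Step 2: slice each selected node's full, contiguous section.
    return ["\n".join(doc_lines[n["start_line"]:n["end_line"]])
            for n in iter_nodes(tree) if n["node_id"] in selected]
```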

Why does this work so well?

PageIndex excels because of three architectural advantages:

1. Structural Navigation, Not Pattern Matching

When we ask “What are the main messages of Chapter 1?”, PageIndex doesn’t search for chunks containing those words. It reads the summary of node 0011 (“Chapter 1. Deceptive Strength”) which says “Covers growth outlook, inflation, financial vulnerabilities, climate risks, and policy challenges” — and immediately knows this is the right node. It reasons about relevance, not semantic and lexical similarity.

2. Contiguous Context Extraction

Once the right nodes are identified, PageIndex extracts the full, unbroken section that the node represents, from the original Markdown — headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.

3. Zero Chunk Boundary Artifacts

There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Every piece of context has a natural beginning (the section header) and a natural end (the next section’s start). This dramatically reduces hallucination from ambiguous context.

However, this strength comes at a cost—and this is where the approach starts to break down.

Why is this difficult to scale?

The reasons are cost and latency, at both ingestion and retrieval time. For our 131-page report, the tree contains 137 structural nodes, so PageIndex needs 137 LLM calls just for the indexing phase — one summary per node. For a corpus of 50 documents, that’s ~7,000 LLM calls before a single question has been answered.

Here is a cost comparison of PageIndex vs. a standard vector index:

| Metric | PageIndex | Standard Vector RAG |
| --- | --- | --- |
| Indexing LLM calls | ~137 per document | 0 |
| Indexing time | 5-10 minutes per document | 30 seconds per document |
| Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
| Multi-document support | Per-document tree traversal during retrieval | Single unified index, fast and efficient |
| Retrieval quality | ★★★★★ | ★★★☆☆ |

For a single critical document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base on customer service SOPs that updates weekly? It’s a non-starter.

This is where PageIndex starts becoming impractical for real systems at scale.

Before proposing an alternative, let’s establish a baseline: how much better is PageIndex compared to standard Vector RAG?

Comparison of Vectorless vs Flat Vector RAG

To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat Vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:

Query about table content

Here is the response from Vectorless RAG:

=== VECTORLESS RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Nodes Retrieved: 0125, 0131
Response:
Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers utilized to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).

The table catalogs 25 specific studies, detailing the following information for each:
*   **Paper:** The author(s) and publication source.
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
*   **Sample coverage:** The geographical or organizational scope (e.g., Worldwide, Europe, South Asia, Africa, Asia).
*   **Region:** The specific region analyzed.
*   **Economies:** The specific countries included in the study sample.

According to the provided text, the articles listed in this table were selected based on specific criteria: they focus on growth elasticity of employment using country-level estimates. The selection excludes articles that present exclusively subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and unemployment elasticity of growth (Annex 2.1).

And here is the same query using Vector RAG, which completely misses the content:

=== VECTOR RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Retrieved 5 chunks.
Response:
Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."

The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:

*   **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "an extensive literature search was performed of studies on the elasticity of employment with respect to output growth."
*   **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates on the growth elasticity of total employment."
*   **Annex Table A2.3.3:** Provides specific regression data regarding the robustness tests for the female employment ratio.

There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.

Query requiring Structural knowledge of the document

The query is “what are the questions answered by chapter 2?” Here is the response from Vectorless RAG. It precisely located Chapter 2 and the “Questions” section within it, and reproduced the questions verbatim, along with grounding information for explainability.

=== VECTORLESS RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Nodes Retrieved: 0098
Response:
Based on the provided excerpts, Chapter 2 addresses the following questions:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?**
*   **What are the correlates of long-run employment ratios among EMDEs?**
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?**

These questions are explicitly listed in the section titled **"Questions."**

The chunks in Vector RAG do not maintain the structural hierarchy and section integrity, so while it locates the “Questions” section, it is unable to find the specific questions within it.

=== VECTOR RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Retrieved 5 chunks.
Response:
Based on the provided document, Chapter 2—titled "Jobless Development"—is structured to address specific inquiries regarding labor markets and economic growth.

The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the specific text of these questions, it indicates that the chapter covers the following thematic areas and analytical components:

*   **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
*   **Employment Ratios:** A significant portion of the chapter is dedicated to "Long-run employment ratios" (page 75) and their various "Correlates," including:
    *   **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
    *   **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
    *   **Human Capital and Firm Characteristics:** Education (page 83) and establishment size (page 78).
*   **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
*   **Policy Implications:** The research concludes by addressing "Policy options to boost employment" (page 85).

So the question becomes: can we retain these advantages without paying the cost?

Engineering a Better Retriever — Proxy-Pointer RAG

Let’s build Proxy-Pointer RAG to answer that question. The core insight is intuitive:

You don’t need LLM summaries to give a vector database structural awareness. You just need to encode the structure into the embeddings themselves.

The system uses the same structural tree as PageIndex — but with the expensive summarization disabled. Building this skeletal tree requires no LLM calls during indexing: the tree is built purely from regex-based heading detection, which runs in milliseconds.

Then, instead of asking an LLM to navigate the tree, we let FAISS do the retrieval — but we engineer the chunks so that FAISS “understands” where each chunk lives in the document’s hierarchy.

Here is a view of the Ingestion pipeline:

Ingestion Pipeline

Build a Skeleton Tree

PageIndex’s tree parser doesn’t actually need an LLM to build the structural hierarchy. The heading detection is regex-based — it finds Markdown headers (#, ##, ###, …) and builds the nesting from heading levels. The LLM is only used to summarize each node.

We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries — but no summaries.

# Build skeleton tree — no LLM, runs in milliseconds
pageindex = PageIndex(doc_path, enable_ai=False)
tree = pageindex.build_structure()  # Pure regex parsing

The skeleton tree and the summarized tree used for the earlier Vectorless RAG have identical structures — same 137 nodes, same nesting depths, same line numbers, same titles. The only difference is the missing summary field.

Cost: $0. Time: < 1 second.

Structural Metadata Pointers (The Core Differentiator)

This is the heart of why PageIndex works so well — and the trick we will adopt.

In standard vector RAG, a retrieved chunk is the context. Whatever 500 words FAISS returns, that’s what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response will miss the intent of the query entirely (as illustrated in the previous section on Vectorless vs Vector RAG comparison).

PageIndex does something fundamentally different: the chunk is never the context. Each node in the tree knows its exact position in the original document — its title, its node ID, and crucially, the start and end line numbers of the entire section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.

We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from the tree node:

metadata = {
    "doc_id": "SADU",           # Which document
    "node_id": "0012",          # Which structural node
    "title": "Introduction",    # Section heading
    "start_line": 624,          # Where the section starts in the original file
    "end_line": 672             # Where the section ends
}

At retrieval time, we don’t feed the matched chunks to the LLM. Instead, we:

  1. Use the chunks as proxies — they’re only there to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get unique top-k.
  2. Follow the metadata pointers — open the original Markdown, slice lines of nodes, e.g. 624 to 672
  3. Deliver the full sections — the LLM receives the complete, pristine, structurally-intact text
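Steps 1-3 can be sketched as follows. This is a simplification: `hits` stands in for the metadata of FAISS’s top-k matches, already ordered by similarity, and a single in-memory `source_lines` stands in for opening each document’s Markdown file.

```python
# Sketch of proxy-then-pointer resolution. `hits` stands in for the
# metadata of FAISS's top matches, already ordered by similarity.
def resolve_context(hits, source_lines, top_k=3):
    seen, sections = set(), []
    for meta in hits:
        key = (meta["doc_id"], meta["node_id"])
        if key in seen:
            continue  # another proxy chunk for an already-selected node
        seen.add(key)
        # Follow the pointer: slice the full section from the source.
        text = "\n".join(source_lines[meta["start_line"]:meta["end_line"]])
        sections.append((meta["title"], text))
        if len(sections) == top_k:
            break
    return sections
```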

Here is a view of the retrieval pipeline:

Retrieval Pipeline

This means even if a chunk only matched on a single sentence deep inside a section, the synthesis LLM gets the entire section — with its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.

This is why I call it Proxy-Pointer RAG: the vectors are proxies for location, the metadata are pointers to the real content.

Cost: $0. Impact: Transforms context quality from fragmented chunks to complete document sections.

Breadcrumb Injection (Encoding Structure into Embeddings)

This is key to answering queries about a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:

"While private investment growth has slowed in both South Asia and other EMDEs..."

FAISS has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks for the “main messages of Chapter 1,” this chunk won’t rank highly — it doesn’t contain the words “Chapter 1” or “main messages.”

Breadcrumb injection prepends the full ancestry path from the Skeleton Tree to every chunk before embedding:

"[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
While private investment growth has slowed in both South Asia and other EMDEs..."

Now the embedding vector encodes both the content AND its structural location. When someone asks about “Chapter 1,” FAISS knows which chunks belong to Chapter 1 — because the words “Chapter 1. Deceptive Strength” are present in the embedding.

# Build breadcrumb from ancestry
current_crumb = f"{parent_breadcrumb} > {node_title}"

# Prepend to chunk text before embedding
enriched_text = f"[{current_crumb}]\n{section_text}"
chunks = text_splitter.split_text(enriched_text)

This is a zero-cost encoding of the tree structure into the vector space. We’re using the same embeddings API, the same FAISS index, the same retrieval code. The only difference is what we feed into the embedder.

Cost: $0 extra. Impact: Transforms retrieval quality for structural queries.

Structure-Guided Chunking (No Blind Sliding Windows)

Standard vector RAG applies a sliding window across the entire document — a 2000-character window that moves forward with some overlap, completely oblivious to the document’s structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a Figure caption. The boundaries are arbitrary, and every chunk is an island in itself, with no knowledge of its position in the overall document structure.

Proxy-Pointer does something fundamentally different: we walk the tree, not the text.

For each node in the skeleton tree, we extract only its own section text — from start_line to end_line — and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it’s longer, the splitter divides it — but strictly within that section’s boundaries.

Standard RAG:  Blind sliding window across entire document
[====chunk1====][====chunk2====][====chunk3====]...
    ↑ might start in Introduction, end in Figure caption

Proxy-Pointer: Chunk within each node's boundaries
Introduction (lines 624-672)     → [chunk A] [chunk B]
Economic Activity (lines 672-676) → [chunk C]
BOX 1.1 (lines 746-749)          → skipped (< 100 chars)
Inflation (lines 938-941)        → [chunk D]

This guarantees three things:

  1. Chunks never cross section boundaries — a chunk from Introduction will never overlap with Economic Activity
  2. Each chunk belongs to exactly one node — so the node_id metadata is always precise
  3. Breadcrumbs are accurate per-chunk — they reflect the actual structural container, not a guess

Importantly, when a node is skipped (because its text is too short — e.g., a “BOX 1.1” heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like “Introduction,” “Features,” and “Figures” — all of which get embedded with the parent’s title in their breadcrumb (e.g., BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 Accelerating Private Investment > Features of...). No content is ever lost; only empty structural headers are excluded.

Cost: $0. Impact: Every chunk is structurally traceable — enabling precise metadata pointers.
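The tree walk above can be sketched in a few lines, under one assumption not spelled out in the text: to avoid re-embedding child text under the parent, a node’s own span is taken to end where its first child begins. `split` stands in for a real text splitter, and the 100-character threshold is illustrative.

```python
MIN_CHARS = 100  # illustrative threshold for skipping empty headers

def walk(node, source_lines, breadcrumb="", split=None, chunks=None):
    """Walk the skeleton tree, chunking strictly within node boundaries."""
    if chunks is None:
        chunks = []
    crumb = f"{breadcrumb} > {node['title']}" if breadcrumb else node["title"]
    children = node.get("nodes", [])
    # Assumption: a node's own text ends where its first child begins,
    # so parent and child chunks never duplicate each other.
    own_end = children[0]["start_line"] if children else node["end_line"]
    section = "\n".join(source_lines[node["start_line"]:own_end])
    if len(section.strip()) >= MIN_CHARS:
        splitter = split or (lambda t: [t])  # stand-in for a text splitter
        for piece in splitter(section):
            chunks.append({"text": f"[{crumb}]\n{piece}",
                           "node_id": node["node_id"]})
    # Short or not, always recurse so child content is never lost.
    for child in children:
        walk(child, source_lines, crumb, split, chunks)
    return chunks
```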

Noise Filtering (Removing Distractions)

Some sections in any document are irrelevant for retrievals and contaminate the context: they contain keywords that match almost every query but provide no useful content.

  • Table of Contents — mentions every chapter title (matches “Chapter 1,” “Chapter 2,” everything)
  • Executive Summary — paraphrases every key finding (matches every topic query)
  • Abbreviations — lists every acronym used in the document
  • Acknowledgments — mentions organizations, countries, and themes

These sections act as distractions in vector space. They sit near the centroid of the document’s embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.

We apply a simple title-based filter during indexing:

NOISE_TITLES = {
    "contents", "summary of contents", "table of contents",
    "abbreviations", "acknowledgments", "foreword",
    "executive summary", "references"
}

if title.strip().lower() in NOISE_TITLES:
    continue  # Skip this node entirely

This filter removed 7 nodes from our test document. The impact was immediate: for the query “What are the main messages of Chapter 1?”, the system stopped retrieving the Executive Summary and Table of Contents and started retrieving the actual Introduction section.

Cost: $0. Impact: Eliminates the most common retrieval failure mode.

That’s it. Proxy-Pointer RAG is now ready to test.

Test Proxy-Pointer pipeline

While the detailed comparison appears in the next section, here are the results of the same two queries cited in the “Vectorless vs Flat Vector RAG” section. As you can see, the answers are as accurate and detailed as Vectorless RAG using PageIndex.

Query about table content

=== PROXY-POINTER RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Unique Context Nodes Extracted:
  - [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
  - [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal probability of study documenting above-average effect for poor-average households' | Lines: 2537-2745
  - [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the likelihood of private investment accelerations starting' | Lines: 1663-1899

Response:
Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**

This table provides a summary of various research studies, detailing the following information for each:
*   **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
*   **Sample coverage:** The scale or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
*   **Region:** The geographic scope of the research (e.g., Worldwide, South Asia, Africa, Europe, Middle East, Asia).
*   **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...", "G20 countries").

The table documents a diverse set of research spanning from 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and specific regions such as South Asia and Africa.

Query requiring Structural knowledge

=== PROXY-POINTER RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Unique Context Nodes Extracted:
  - [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
  - [SADU] Node 0098: 'Questions' | Lines: 3115-3127
  - [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812

Response:
Based on the provided text, Chapter 2 addresses the following questions regarding South Asia’s economic growth and labor market:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
*   **What are the correlates of long-run employment ratios among EMDEs?** (Source: *Chapter 2, Questions section*)
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)

Additionally, the chapter explores these issues with the goal of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.

While we have demonstrated the cost advantage so far, does the architecture overcome the scalability challenge of PageIndex?

Is Proxy-Pointer Scalable?

Here is why the architecture is scalable across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, and an additional LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.

  • No LLM at indexing. The skeleton tree is regex-built in milliseconds. The only API calls are to the embedding model — identical to standard vector RAG.
  • No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.

Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers) — not in an LLM reasoning loop. It inherits all of vector RAG’s scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.

Fail-safe for unstructured documents: If a document has no headings — or the skeleton tree produces only a single root node — the system detects this during chunking and falls back to a standard sliding window. Chunks are flagged with empty node_id and line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG — no errors, no special handling required.
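A sketch of that fallback check follows; the structure test and the windowing parameters are illustrative, and `structured_chunker` stands in for the tree walk described earlier.

```python
def chunk_with_fallback(tree, source_text, structured_chunker,
                        window=2000, overlap=200):
    """Use structure-guided chunking when the tree has real structure;
    otherwise degrade to a plain sliding window with empty pointers."""
    if tree.get("nodes"):  # more than a lone root node
        return structured_chunker(tree, source_text), False
    # Fallback: standard sliding window, chunks flagged with empty
    # node_id so retrieval uses their text directly as context.
    step = window - overlap
    chunks = [{"text": source_text[i:i + window],
               "node_id": "", "start_line": None, "end_line": None}
              for i in range(0, max(len(source_text), 1), step)]
    return chunks, True  # True = flagged: skip pointer resolution
```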

Let’s compare Vectorless RAG and Proxy-Pointer head-to-head.

Vectorless vs Proxy-Pointer RAG

I ran a variety of queries — broad structural, cross-reference, specific factual, figure-specific, etc. — and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer, along with the full quality comparison report, here.

The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries — all at the scalability and price of a flat Vector RAG.

Here is the summary verdict:

| # | Query Type | Winner |
| --- | --- | --- |
| 1 | Broad structural (Ch.1 messages) | 🔴 PageIndex |
| 2 | Broad structural (Ch.2 messages) | 🔴 PageIndex (narrow) |
| 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
| 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
| 5 | Comparative (India vs region) | 🟢 Proxy-Pointer |
| 6 | Figure-specific (B1.1.1 trends) | 🟢 Proxy-Pointer |
| 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
| 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
| 9 | Navigational (Ch.2 questions) | 🟡 Tie |
| 10 | Inferential/policy (govt vs shocks) | 🟢 Proxy-Pointer |

And here is the cost comparison:

| Metric | PageIndex | Proxy-Pointer | Standard Vector RAG |
| --- | --- | --- | --- |
| Indexing LLM calls | ~137 per doc | 0 | 0 |
| Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
| Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs PageIndex) | ★★★☆☆ |
| Multi-doc scalability | Poor (per-doc tree nav) | Excellent (unified vector index) | Excellent |
| Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
| Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
| Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |

Key Takeaways

  1. Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn’t about better embeddings — it’s about preserving hierarchy.
  2. You don’t need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no extra cost.
  3. Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
  4. Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesizer LLM sees.

Conclusion

Proxy-Pointer RAG proves a simple thesis: you don’t need an expensive LLM to make a retriever structurally aware — you just need to be clever about what you embed.

Five zero-cost engineering techniques — skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering — close the quality gap with a full LLM-navigated system, while keeping the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of a standard Vector RAG.

The next time you’re building RAG for your structured (or unstructured) document repository, don’t reach for a bigger model. Reach for a Proxy-Pointer index.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience —  License: CC BY 3.0 IGO.

Images used in this article are generated using Google Gemini. Code created by me.
