VectifyAI Launches Mafin 2.5 and PageIndex: Achieving 98.7% Financial RAG Accuracy with an Open-Source, Vectorless Tree Index

Editor


Building a Retrieval-Augmented Generation (RAG) pipeline is easy; building one that doesn’t hallucinate during a 10-K audit is nearly impossible. For developers in the financial sector, the ‘standard’ vector-based RAG approach (chunk the text and hope for the best) often produces a ‘text soup’ that loses the vital structural context of tables and balance sheets.

VectifyAI is attempting to close this gap with the launch of Mafin 2.5, a multimodal financial agent, and PageIndex, an open-source framework that shifts the industry toward ‘Vectorless RAG.’

The Problem: Why Vector RAG Fails Finance

Traditional RAG relies on semantic similarity. If you ask about ‘Net Income,’ a vector database looks for chunks of text that sound like net income. However, financial documents are layout-dependent. A number in a cell is meaningless without its header, and those headers are often stripped away during traditional PDF-to-text conversion.

This is the ‘garbage in, garbage out’ trap: even the smartest LLM cannot reason correctly if the input data has lost its hierarchical structure.
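To make the header-loss problem concrete, here is a toy Python sketch using hypothetical data (not from the article). A fixed-size chunker groups the lines of a flattened income statement two at a time, and a simple word-overlap score stands in for embedding similarity. The chunk retrieved for ‘net income’ contains the numbers but neither the fiscal-year header nor the unit, so a model reading it cannot tell which figure belongs to which year.

```python
# Toy illustration (hypothetical data): fixed-size chunking separates a table
# row from the header lines that give its numbers meaning.
lines = [
    "Consolidated Statements of Operations (in millions)",
    "Fiscal Year   2023   2022",
    "Revenue      5,200  4,800",
    "Net income     610    540",
]

def fixed_size_chunks(rows: list[str], size: int) -> list[str]:
    """Group consecutive lines into chunks of `size` lines, ignoring structure."""
    return ["\n".join(rows[i:i + size]) for i in range(0, len(rows), size)]

def overlap_score(query: str, chunk: str) -> int:
    """Stand-in for embedding similarity: count query words found in the chunk."""
    return sum(1 for word in query.lower().split() if word in chunk.lower())

chunks = fixed_size_chunks(lines, size=2)
best = max(chunks, key=lambda c: overlap_score("net income", c))
print(best)
print("year header present:", "2023" in best)  # which column is 2023?
print("unit present:", "millions" in best)     # 610 of what?
```

Real pipelines use overlapping windows and learned embeddings, but the failure mode is the same whenever a chunk boundary falls between a header and the values it labels.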

Mafin 2.5: Accuracy at Scale

Mafin 2.5 isn’t just a fine-tuned model; it’s a reasoning engine that achieved 98.7% accuracy on FinanceBench, significantly outperforming GPT-4o and Perplexity in financial retrieval tasks.

What sets it apart for devs is its native integration with high-fidelity data sources:

  • Comprehensive SEC Access: Direct indexing of 10-K, 10-Q, and 8-K filings.
  • Earnings Intel: Real-time and historical earnings call transcripts.
  • Market Data: Live tickers across the Russell 3000 and Nasdaq.
Source: https://pageindex.ai/blog/Mafin2.5

PageIndex: The Move to ‘Vectorless’ RAG

The ‘secret sauce’ behind Mafin 2.5’s precision is PageIndex. PageIndex replaces traditional flat embeddings with a hierarchical tree index.

Instead of searching through random chunks, PageIndex allows an LLM to ‘reason’ through a document’s structure. It builds a semantic tree—essentially an intelligent map of the document—enabling the agent to identify the exact section, page, and line item required.
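The article does not show PageIndex’s API, so the following is only a minimal sketch of the general idea. It assumes a tree whose nodes carry a title, a short summary, and a page range. Retrieval descends from the root and picks one child at each level. In a reasoning-based system that choice would be an LLM call; here `keyword_chooser` is a hypothetical keyword-overlap stand-in, and `Node`, `navigate`, and the 10-K outline are illustrative names and data.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    title: str
    summary: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)

# A chooser takes the query and candidate children and returns the index of the
# child to descend into. An LLM would play this role in a reasoning-based system.
Chooser = Callable[[str, list[Node]], int]

def keyword_chooser(query: str, children: list[Node]) -> int:
    """Hypothetical stand-in: pick the child whose title/summary shares most query words."""
    words = set(query.lower().split())

    def score(n: Node) -> int:
        text = f"{n.title} {n.summary}".lower()
        return sum(1 for w in words if w in text)

    return max(range(len(children)), key=lambda i: score(children[i]))

def navigate(root: Node, query: str, choose: Chooser) -> list[Node]:
    """Descend from the root to a leaf, returning every node visited on the way."""
    path = [root]
    node = root
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node)
    return path

# Hypothetical, heavily simplified 10-K outline.
tenk = Node("Form 10-K", "Annual report", 1, 120, [
    Node("Item 1. Business", "Products, segments, competition", 3, 20),
    Node("Item 7. MD&A", "Management discussion of results", 35, 60),
    Node("Item 8. Financial Statements", "Audited statements and notes", 61, 110, [
        Node("Balance Sheets", "Assets, liabilities, equity", 63, 64),
        Node("Statements of Operations", "Revenue, expenses, net income", 65, 65),
        Node("Notes", "Accounting policies and details", 70, 110),
    ]),
])

path = navigate(tenk, "net income financial statements", keyword_chooser)
for n in path:
    print(f"{n.title} (pp. {n.start_page}-{n.end_page})")
```

Because the chooser is passed in as a plain function, swapping in an LLM only replaces `keyword_chooser`; the returned path also records exactly which sections were consulted.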

Key technical features include:

  • Vision-Native Support: PageIndex supports Vision-based RAG, allowing models to ‘see’ the global layout of a page (charts, complex grids) rather than relying solely on OCR text.
  • Hierarchical Navigation: It transforms PDFs into a navigable tree structure, ensuring the relationship between headers and data remains intact.
  • Traceability: Unlike the ‘black box’ of vector similarity, every answer has a clear path through the document tree, providing a much-needed audit trail for regulated financial environments.
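One plausible way to obtain such a tree is a stack-based pass that nests each heading under the nearest preceding heading of a shallower level. This assumes the document’s headings are already available as flat table-of-contents entries (level, title, start page). It is a hypothetical sketch, not PageIndex’s documented construction method; `Section`, `build_tree`, and the sample table of contents are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    level: int          # 0 = document root, 1 = top-level item, 2 = subsection, ...
    start_page: int
    end_page: int = 0   # filled in once the next sibling or ancestor is seen
    children: list["Section"] = field(default_factory=list)

def build_tree(toc: list[tuple[int, str, int]], last_page: int) -> Section:
    """Nest flat (level, title, start_page) entries (levels >= 1) into a tree."""
    root = Section("Document", 0, 1, last_page)
    stack = [root]  # path from the root to the most recently opened section
    for level, title, start in toc:
        # A new entry closes every open section at the same or a deeper level;
        # each closed section ends on the page before `start` (never before its own start).
        while stack[-1].level >= level:
            closed = stack.pop()
            closed.end_page = max(closed.start_page, start - 1)
        node = Section(title, level, start)
        stack[-1].children.append(node)  # parent = nearest shallower open section
        stack.append(node)
    for node in stack[1:]:  # sections still open run to the end of the document
        node.end_page = last_page
    return root

def show(node: Section, indent: int = 0) -> None:
    print("  " * indent + f"{node.title} (pp. {node.start_page}-{node.end_page})")
    for child in node.children:
        show(child, indent + 1)

# Hypothetical 10-K table of contents: (level, title, start page).
toc = [
    (1, "Item 7. MD&A", 35),
    (1, "Item 8. Financial Statements", 61),
    (2, "Balance Sheets", 63),
    (2, "Statements of Operations", 65),
    (2, "Notes", 70),
    (1, "Item 15. Exhibits", 111),
]
tree = build_tree(toc, last_page=120)
show(tree)
```

Here the three statements end up as children of Item 8 with page ranges 63-64, 65-69, and 70-110. That parent-child link between a header and its tables is exactly what flat chunking discards.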

Key Takeaways

  • Unprecedented Financial Accuracy (98.7%): Mafin 2.5 set a new state of the art on the FinanceBench benchmark, significantly outperforming general-purpose models like GPT-4o (~31%) and Perplexity (~45%) by focusing on specialized financial reasoning rather than general retrieval.
  • The Shift to ‘Vectorless RAG’: Moving away from the “vibe-based” search of traditional vector databases, PageIndex introduces Reasoning-based RAG. It uses an LLM to ‘reason’ its way through a document’s structure, mimicking how a human analyst navigates a report to find specific data points.
  • Hierarchical ‘Tree’ Indexing vs. Chunking: Instead of chopping documents into arbitrary, contextless text chunks, PageIndex organizes PDFs into a semantic tree structure (an intelligent Table of Contents). This preserves the critical relationship between headers, nested tables, and footnotes that traditional RAG often destroys.
  • Vision-Native & OCR-Free Workflows: The framework supports Vision-based Vectorless RAG, allowing the AI to ‘see’ and retrieve information directly from page images. This is a game-changer for financial documents where the visual layout of a balance sheet or complex grid is as important as the numbers themselves.
  • Enterprise-Grade Traceability: Unlike the ‘black box’ of vector similarity, PageIndex provides a fully auditable reasoning path. Every response is linked to specific nodes, pages, and sections, providing the transparency required for high-stakes financial audits and compliance.
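As a hypothetical illustration of what such an audit trail could contain, the sketch below serializes an answer together with the tree path that produced it. The `PathStep` schema, the node IDs, and the field names are assumptions for illustration, not PageIndex’s actual output format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PathStep:
    """One node visited while navigating the document tree (hypothetical schema)."""
    node_id: str
    title: str
    start_page: int
    end_page: int

def audit_record(question: str, answer: str, path: list[PathStep]) -> str:
    """Bundle an answer with the exact tree path that produced it, as JSON."""
    leaf = path[-1]
    record = {
        "question": question,
        "answer": answer,
        # Human-readable breadcrumb for a reviewer or a footnote.
        "citation": " > ".join(step.title for step in path)
        + f" (pp. {leaf.start_page}-{leaf.end_page})",
        # Machine-readable trail, one entry per visited node.
        "reasoning_path": [asdict(step) for step in path],
    }
    return json.dumps(record, indent=2)

path = [
    PathStep("0000", "Form 10-K", 1, 120),
    PathStep("0003", "Item 8. Financial Statements", 61, 110),
    PathStep("0005", "Statements of Operations", 65, 65),
]
print(audit_record("What was net income in fiscal 2023?", "<model answer>", path))
```

Storing the full path rather than only the final page lets a reviewer check each navigation decision, not just the cited source.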

Check out the Technical details and Repo.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
