Content Indexing Overview

Vertesia provides content indexing and search capabilities that let AI agents and applications find and retrieve relevant documents from your knowledge base.

What is Content Indexing?

Content indexing is the process of analyzing, embedding, and organizing your documents to make them searchable. Vertesia automatically indexes content when documents are created or updated, maintaining search indexes that support multiple search strategies.

When you upload documents to Vertesia, the platform:

Extracts content from various file formats (PDF, Word, images, etc.)
Generates embeddings using AI models to capture semantic meaning
Indexes metadata including document properties, types, and relationships
Maintains search indexes for fast retrieval

Search Infrastructure

Vertesia offers two search backends to meet different scalability and feature requirements:

MongoDB Atlas Vector Search (Default)

MongoDB Atlas Vector Search is the default search backend and is always available for every project. It provides:

Vector search using document embeddings for semantic similarity
Automatic index management with status monitoring
Tight integration with the document store
Zero configuration required to get started

This is the recommended option for most projects and provides good search quality without additional setup.

Elasticsearch (Optional)

Elasticsearch is available as an optional backend for projects requiring advanced search capabilities:

Full-text search with advanced text analysis, stemming, and fuzzy matching
Complex aggregations for analytics and faceted navigation
DSL queries for complete control over search behavior
High-volume indexing with zero-downtime reindexing
Hybrid search combining full-text and vector search with configurable weights

Elasticsearch is ideal for enterprise deployments with large document volumes or requirements for advanced search analytics.

Search Types

Vertesia supports multiple search strategies that can be used individually or combined:

Semantic Search (Vector)

Semantic search uses embeddings to find documents based on meaning rather than exact keywords. When you search for "quarterly financial report," semantic search can find a document titled "Q3 earnings summary" even though the two share no keywords.

Best for:

Finding conceptually similar documents
Natural language queries
Cross-language search (when embeddings support it)
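
As a rough illustration of how vector search ranks results, the sketch below compares a query embedding with document embeddings using cosine similarity. The three-dimensional vectors and document names are invented for the example; real embeddings come from a model and have many more dimensions, and this page does not state which similarity function Vertesia uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, assumed for illustration only.
docs = {
    "q3-earnings-summary": [0.9, 0.1, 0.0],
    "team-offsite-agenda": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for "quarterly financial report"
ranking = rank_by_similarity(query, docs)
```

Here the earnings summary ranks first because its vector points in nearly the same direction as the query vector, even though the two texts share no keywords.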

Full-Text Search

Full-text search matches documents based on keywords with support for:

Stemming: Matching "running" with "run" and "runs" (whether irregular forms such as "ran" also match depends on the analyzer)
Fuzzy matching: Finding documents despite typos
Phrase matching: Exact phrase requirements
Field-specific search: Targeting specific metadata fields

Best for:

Known keyword searches
Exact phrase matching
Technical terminology
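
Fuzzy matching is commonly defined in terms of edit distance, the number of single-character changes needed to turn one term into another. The sketch below is a minimal, self-contained version of that idea; the function names and the `max_edits` parameter are hypothetical and are not part of any Vertesia or Elasticsearch API.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidates, max_edits=1):
    """Return the candidates within max_edits of term, ignoring case."""
    term = term.lower()
    return [c for c in candidates
            if edit_distance(term, c.lower()) <= max_edits]


# A misspelled query term still finds the intended index term.
matches = fuzzy_match("quaterly", ["quarterly", "report", "quartet"])
```

A production search engine applies this kind of tolerance inside the index rather than by scanning every term, so treat the loop above as an explanation of the matching rule, not of how it is executed.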

Hybrid Search

Hybrid search combines semantic and full-text search for the best of both approaches. Vertesia supports multiple score aggregation methods:

Method	Description
RRF (Reciprocal Rank Fusion)	Combines result rankings; useful when relevance scores are not comparable across search types
RSF (Relevance Score Fusion)	Combines normalized relevance scores for direct score comparison
Smart	Automatically selects the best method based on available search types
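
To make the difference between rank-based and score-based fusion concrete, here is a minimal sketch of both. It assumes the commonly used RRF constant k = 60 and min-max normalization for the score-based variant; neither detail is confirmed by this page, and the function names are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids (best first) using only rank positions."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def relevance_score_fusion(score_maps):
    """Fuse {doc_id: raw_score} maps by min-max normalizing, then summing."""
    fused = {}
    for scores in score_maps:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, raw in scores.items():
            # If every score is equal, treat them all as a full match.
            norm = (raw - lo) / span if span else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Toy results from a full-text search and a vector search.
full_text_scores = {"a": 12.0, "b": 8.0, "c": 2.0}
vector_scores = {"b": 0.92, "c": 0.90, "a": 0.50}

rrf_order = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
rsf_order = relevance_score_fusion([full_text_scores, vector_scores])
```

The rank-based function never looks at the raw scores, which is why it tolerates score scales as different as 12.0 and 0.92. The score-based function has to rescale each result set before the values can be added.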

You can also configure weights to prioritize one search type over another:

{
  "full_text": "quarterly report",
  "vector": { "text": "financial analysis" },
  "weights": { "full_text": 2, "vector": 3 }
}
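
One plausible reading of the weights above is that they are relative, so 2 and 3 give full-text a 40% share and vector a 60% share of the final score. The sketch below works through that interpretation. It is an assumption for illustration, since this page does not specify how weights are applied internally, and the helper names are hypothetical.

```python
def weight_shares(weights):
    """Convert relative weights into fractions that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


def combine(normalized_scores, weights):
    """Weighted sum of per-search-type scores already scaled to [0, 1]."""
    shares = weight_shares(weights)
    return sum(share * normalized_scores.get(name, 0.0)
               for name, share in shares.items())


# The query payload from the example above, as a Python dict.
query = {
    "full_text": "quarterly report",
    "vector": {"text": "financial analysis"},
    "weights": {"full_text": 2, "vector": 3},
}

shares = weight_shares(query["weights"])
# A document that is a perfect keyword match but only a moderate semantic match.
score = combine({"full_text": 1.0, "vector": 0.5}, query["weights"])
```

Under this reading, only the ratio between the weights matters: `{"full_text": 2, "vector": 3}` and `{"full_text": 4, "vector": 6}` would rank documents identically.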

When to Use Each Backend

Requirement	Recommended Backend
Quick setup, standard search	MongoDB Atlas (default)
Semantic search only	MongoDB Atlas (default)
Advanced full-text search	Elasticsearch
Complex aggregations/analytics	Elasticsearch
Hybrid search with weights	Elasticsearch
High-volume document sets	Elasticsearch
DSL query access	Elasticsearch

Getting Started

Configure embeddings to enable semantic search. See Embeddings Configuration.
Optionally enable Elasticsearch for advanced features. See Search Configuration.
Use the query_documents tool to search from agents. See Built-in Tools.

Next Steps

Embeddings Configuration - Configure text, image, and properties embeddings
Search Configuration - Set up MongoDB Atlas and Elasticsearch backends
Built-in Tools - Learn about the query_documents tool