Content Indexing Overview

Vertesia provides content indexing and search capabilities that let AI agents and applications find and retrieve relevant documents from your knowledge base.

What is Content Indexing?

Content indexing is the process of analyzing, embedding, and organizing your documents to make them searchable. Vertesia automatically indexes content when documents are created or updated, maintaining search indexes that support multiple search strategies.

When you upload documents to Vertesia, the platform:

  1. Extracts content from various file formats (PDF, Word, images, etc.)
  2. Generates embeddings using AI models to capture semantic meaning
  3. Indexes metadata including document properties, types, and relationships
  4. Maintains search indexes for fast retrieval

Search Infrastructure

Vertesia supports two search backends to meet different scalability and feature requirements:

MongoDB Atlas Vector Search (Default)

MongoDB Atlas Vector Search is the default search backend, always available for all projects. It provides:

  • Vector search using document embeddings for semantic similarity
  • Automatic index management with status monitoring
  • Tight integration with the document store
  • Zero configuration required to get started

This is the recommended option for most projects and provides excellent search quality out of the box.

Elasticsearch (Optional)

Elasticsearch is available as an optional backend for projects requiring advanced search capabilities:

  • Full-text search with advanced text analysis, stemming, and fuzzy matching
  • Complex aggregations for analytics and faceted navigation
  • DSL queries for complete control over search behavior
  • High-volume indexing with zero-downtime reindexing
  • Hybrid search combining full-text and vector search with configurable weights

Elasticsearch is ideal for enterprise deployments with large document volumes or requirements for advanced search analytics.

Search Types

Vertesia supports multiple search strategies that can be used individually or combined:

Semantic Search (Vector)

Semantic search uses embeddings to find documents based on meaning rather than exact keywords. When you search for "quarterly financial report," semantic search can find documents about "Q3 earnings summary" even though the two phrases share no words.

Best for:

  • Finding conceptually similar documents
  • Natural language queries
  • Cross-language search (when embeddings support it)
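
To make the idea concrete, here is a minimal sketch of vector ranking by cosine similarity. The three-dimensional vectors and document titles are hand-made stand-ins (assumptions for illustration only); real embeddings come from an embedding model and have many more dimensions, and this is not Vertesia's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; a real system would compute these with a model.
DOC_VECTORS = {
    "Q3 earnings summary": [0.9, 0.1, 0.0],
    "Office party photos": [0.0, 0.2, 0.9],
    "Annual budget forecast": [0.7, 0.6, 0.1],
}


def rank_by_similarity(query_vector, doc_vectors):
    """Return (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in doc_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in for the embedding of "quarterly financial report".
QUERY_VECTOR = [0.8, 0.2, 0.0]
ranking = rank_by_similarity(QUERY_VECTOR, DOC_VECTORS)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

The query and the top document share no words; they rank together only because their vectors point in a similar direction.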

Full-Text Search

Full-text search matches documents based on keywords with support for:

  • Stemming: Matching "running" with "run" and "runs"
  • Fuzzy matching: Finding documents despite typos
  • Phrase matching: Exact phrase requirements
  • Field-specific search: Targeting specific metadata fields

Best for:

  • Known keyword searches
  • Exact phrase matching
  • Technical terminology
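
As a rough illustration of two of the features listed above, the sketch below implements a deliberately naive suffix-stripping stemmer and a typo-tolerant lookup built on Python's standard difflib module. Both functions are hypothetical helpers written for this example; production analyzers such as those in Elasticsearch are far more thorough.

```python
import difflib


def naive_stem(word):
    """Strip a trailing "ing" or plural "s"; a toy stand-in for a real stemmer."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Collapse a doubled final consonant: "runn" -> "run".
        if stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Return vocabulary entries that closely resemble a possibly misspelled term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)


VOCABULARY = ["quarterly", "report", "earnings"]

print(naive_stem("running"), naive_stem("runs"))
print(fuzzy_lookup("quaterly", VOCABULARY))
```

This simple stemmer only removes suffixes, so an irregular form such as "ran" is left unchanged and would not match "run".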

Hybrid Search

Hybrid search combines semantic and full-text search, so results benefit from both meaning-based and keyword-based matching. Vertesia supports multiple score aggregation methods:

| Method | Description |
| --- | --- |
| RRF (Reciprocal Rank Fusion) | Combines result rankings, good when relevance scores aren't comparable |
| RSF (Relevance Score Fusion) | Combines normalized relevance scores for direct score comparison |
| Smart | Automatically selects the best method based on available search types |
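
The two fusion methods can be sketched as follows. This page does not state the exact formulas Vertesia uses, so the code assumes the commonly described forms: RRF sums 1 / (k + rank) across result lists with the conventional constant k = 60, and RSF sums min-max normalized scores. The document IDs and scores are invented for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; only positions matter, not scores."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def min_max_normalize(scores):
    """Rescale one result set's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relevance_score_fusion(score_sets):
    """Fuse result sets by summing their normalized scores."""
    fused = {}
    for scores in score_sets:
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)


# Invented scores on very different scales (keyword score vs. cosine similarity).
full_text_scores = {"doc_a": 12.0, "doc_b": 7.5, "doc_c": 3.0}
vector_scores = {"doc_b": 0.92, "doc_d": 0.90, "doc_a": 0.60}

rankings = [
    sorted(scores, key=scores.get, reverse=True)
    for scores in (full_text_scores, vector_scores)
]
rrf_order = reciprocal_rank_fusion(rankings)
rsf_order = relevance_score_fusion([full_text_scores, vector_scores])
print(rrf_order)
print(rsf_order)
```

RRF never looks at the raw scores, which is why it suits backends whose scores are not on a shared scale; RSF keeps score magnitudes but depends on the normalization step to make them comparable.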

You can also configure weights to prioritize one search type over another:

{
  "full_text": "quarterly report",
  "vector": { "text": "financial analysis" },
  "weights": { "full_text": 2, "vector": 3 }
}
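
A small helper can assemble this payload programmatically. The sketch below only builds the JSON structure shown above; the function names are hypothetical, and the relative-share calculation assumes the weights are interpreted proportionally, which this page does not confirm.

```python
import json


def build_hybrid_query(full_text, vector_text, full_text_weight=1, vector_weight=1):
    """Build a payload with the same shape as the example above."""
    if full_text_weight < 0 or vector_weight < 0:
        raise ValueError("weights must be non-negative")
    return {
        "full_text": full_text,
        "vector": {"text": vector_text},
        "weights": {"full_text": full_text_weight, "vector": vector_weight},
    }


def relative_shares(weights):
    """Assumption: weights are relative, so 2 and 3 mean 40% and 60%."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


query = build_hybrid_query("quarterly report", "financial analysis", 2, 3)
print(json.dumps(query, indent=2))
print(relative_shares(query["weights"]))
```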

When to Use Each Backend

| Requirement | Recommended Backend |
| --- | --- |
| Quick setup, standard search | MongoDB Atlas (default) |
| Semantic search only | MongoDB Atlas (default) |
| Advanced full-text search | Elasticsearch |
| Complex aggregations/analytics | Elasticsearch |
| Hybrid search with weights | Elasticsearch |
| High-volume document sets | Elasticsearch |
| DSL query access | Elasticsearch |

Getting Started

  1. Configure embeddings to enable semantic search. See Embeddings Configuration.
  2. Optionally enable Elasticsearch for advanced features. See Search Configuration.
  3. Use the query_documents tool to search from agents. See Built-in Tools.
