Content Indexing Overview
Vertesia provides content indexing and search capabilities that let AI agents and applications find and retrieve relevant documents from your knowledge base.
What is Content Indexing?
Content indexing is the process of analyzing, embedding, and organizing your documents to make them searchable. Vertesia automatically indexes content when documents are created or updated, maintaining search indexes that support multiple search strategies.
When you upload documents to Vertesia, the platform:
- Extracts content from various file formats (PDF, Word, images, etc.)
- Generates embeddings using AI models to capture semantic meaning
- Indexes metadata including document properties, types, and relationships
- Maintains search indexes for fast retrieval
Search Infrastructure
Vertesia supports two search backends, so that projects with different scalability and feature requirements can each use a suitable one:
MongoDB Atlas Vector Search (Default)
MongoDB Atlas Vector Search is the default search backend and is available for every project. It provides:
- Vector search using document embeddings for semantic similarity
- Automatic index management with status monitoring
- Tight integration with the document store
- Zero configuration required to get started
This is the recommended option for most projects and provides excellent search quality out of the box.
Elasticsearch (Optional)
Elasticsearch is available as an optional backend for projects requiring advanced search capabilities:
- Full-text search with advanced text analysis, stemming, and fuzzy matching
- Complex aggregations for analytics and faceted navigation
- DSL queries for complete control over search behavior
- High-volume indexing with zero-downtime reindexing
- Hybrid search combining full-text and vector search with configurable weights
Elasticsearch is ideal for enterprise deployments with large document volumes or requirements for advanced search analytics.
Search Types
Vertesia supports multiple search strategies that can be used individually or combined:
Semantic Search (Vector)
Semantic search uses embeddings to find documents based on meaning rather than exact keywords. When you search for "quarterly financial report," semantic search can find documents about "Q3 earnings summary" even without matching words.
Best for:
- Finding conceptually similar documents
- Natural language queries
- Cross-language search (when embeddings support it)
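The ranking idea behind vector search can be sketched as follows. This is a minimal illustration, not Vertesia's implementation: the three-dimensional vectors are invented stand-ins for real embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names. It assumes cosine similarity as the distance measure, which is a common choice but is not confirmed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy "embeddings", invented for illustration only.
docs = {
    "q3-earnings-summary": [0.9, 0.1, 0.0],
    "office-party-invite": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for "quarterly financial report"

ranking = rank_by_similarity(query, docs)
```

The earnings summary ranks first even though it shares no words with the query, because its vector points in nearly the same direction as the query vector.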
Full-Text Search
Full-text search matches documents based on keywords with support for:
- Stemming: Matching "running" with "run" and "runs"
- Fuzzy matching: Finding documents despite typos
- Phrase matching: Exact phrase requirements
- Field-specific search: Targeting specific metadata fields
Best for:
- Known keyword searches
- Exact phrase matching
- Technical terminology
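Fuzzy matching is usually defined in terms of edit distance: the number of single-character changes needed to turn one term into another. The sketch below uses plain Levenshtein distance to show the idea. `edit_distance` and `fuzzy_match` are hypothetical helpers, and the `max_edits` threshold is an assumed parameter; a real search engine applies this kind of tolerance inside its index rather than by comparing strings pairwise.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(term, token, max_edits=1):
    """True when token is within max_edits edits of term (case-insensitive)."""
    return edit_distance(term.lower(), token.lower()) <= max_edits

# A dropped letter is one edit, so the typo still matches.
typo_matches = fuzzy_match("quarterly", "quartrly")
unrelated_matches = fuzzy_match("report", "invoice")
```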
Hybrid Search
Hybrid search combines semantic and full-text search, so that results reflect both keyword matches and semantic similarity. Vertesia supports multiple score aggregation methods:
| Method | Description |
|---|---|
| RRF (Reciprocal Rank Fusion) | Combines result rankings; useful when relevance scores aren't comparable across search types |
| RSF (Relevance Score Fusion) | Combines normalized relevance scores for direct score comparison |
| Smart | Automatically selects the best method based on available search types |
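To make the difference between the two fusion methods concrete, here is a minimal sketch of each. RRF follows its standard definition, summing `1 / (k + rank)` per result list, with `k = 60` as a commonly used constant. The RSF part assumes min-max normalization, which is one common way to normalize scores; the normalization Vertesia actually applies is not specified here. The function names and the sample scores are hypothetical.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over lists of doc IDs ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

def min_max_normalize(scores):
    """Rescale raw scores into [0, 1] (assumed normalization scheme)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def rsf(score_maps):
    """Score fusion: sum each document's normalized scores."""
    fused = {}
    for scores in score_maps:
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Invented sample results: raw scores live on very different scales.
full_text_scores = {"a": 12.0, "b": 7.5, "c": 3.0}
vector_scores = {"b": 0.92, "c": 0.90, "d": 0.40}

def by_score(scores):
    """Doc IDs ordered best-first, as a search backend would return them."""
    return sorted(scores, key=scores.get, reverse=True)

rrf_order = [d for d, _ in rrf([by_score(full_text_scores), by_score(vector_scores)])]
rsf_order = [d for d, _ in rsf([full_text_scores, vector_scores])]
```

With this data, both methods put `b` first because it appears in both lists. They disagree on second place: RRF favors `c`, which is also ranked in both lists, while the score-based fusion favors `a`, whose full-text score is far ahead of the rest.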
You can also configure weights to prioritize one search type over another:
{
  "full_text": "quarterly report",
  "vector": { "text": "financial analysis" },
  "weights": { "full_text": 2, "vector": 3 }
}
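One plausible reading of these weights is as relative shares: with `full_text: 2` and `vector: 3`, full-text contributes 40% of the final score and vector 60%. The sketch below illustrates that interpretation only. The exact formula Vertesia applies is an assumption here, `weighted_fusion` is a hypothetical helper, and the input scores are invented and presumed to be already normalized to the 0 to 1 range.

```python
def weighted_fusion(normalized_scores, weights):
    """Blend per-search-type scores using weights as relative shares.

    normalized_scores: {search_type: {doc_id: score in [0, 1]}}
    weights:           {search_type: positive number}
    """
    total_weight = sum(weights[name] for name in normalized_scores)
    fused = {}
    for name, scores in normalized_scores.items():
        share = weights[name] / total_weight  # e.g. 2 / (2 + 3) = 0.4
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + share * score
    return fused

# Weights taken from the query example above; scores are invented.
weights = {"full_text": 2, "vector": 3}
normalized_scores = {
    "full_text": {"a": 1.0, "b": 0.5},
    "vector": {"b": 1.0, "c": 0.8},
}

fused = weighted_fusion(normalized_scores, weights)
best = max(fused, key=fused.get)
```

Here `b` wins because it scores in both search types, and `c` edges out `a` because the vector side carries the larger weight.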
When to Use Each Backend
| Requirement | Recommended Backend |
|---|---|
| Quick setup, standard search | MongoDB Atlas (default) |
| Semantic search only | MongoDB Atlas (default) |
| Advanced full-text search | Elasticsearch |
| Complex aggregations/analytics | Elasticsearch |
| Hybrid search with weights | Elasticsearch |
| High-volume document sets | Elasticsearch |
| DSL query access | Elasticsearch |
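The table can be read as a simple rule: stay on the default unless a requirement from the Elasticsearch rows applies. A small sketch of that rule follows; the requirement labels and the `recommend_backend` function are hypothetical names chosen for illustration and are not part of any Vertesia API.

```python
# Requirement labels are invented shorthand for the table rows above.
ELASTICSEARCH_RECOMMENDED = {
    "advanced_full_text",
    "aggregations",
    "weighted_hybrid",
    "high_volume",
    "dsl_queries",
}

def recommend_backend(requirements):
    """Return the backend the table recommends for a set of requirements."""
    if ELASTICSEARCH_RECOMMENDED & set(requirements):
        return "elasticsearch"
    return "mongodb_atlas"  # default covers quick setup and semantic search

default_choice = recommend_backend(["quick_setup", "semantic_only"])
analytics_choice = recommend_backend(["semantic_only", "aggregations"])
```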
Getting Started
- Configure embeddings to enable semantic search. See Embeddings Configuration.
- Optionally enable Elasticsearch for advanced features. See Search Configuration.
- Use the query_documents tool to search from agents. See Built-in Tools.
Next Steps
- Embeddings Configuration - Configure text, image, and properties embeddings
- Search Configuration - Set up MongoDB Atlas and Elasticsearch backends
- Built-in Tools - Learn about the query_documents tool
