Search Configuration
Vertesia supports two search backends: MongoDB Atlas Vector Search (default) and Elasticsearch (optional). This guide covers how to configure and manage both for optimal search performance.
MongoDB Atlas Vector Search
MongoDB Atlas Vector Search is the default search backend, providing vector-based semantic search integrated directly with your document store.
How It Works
- Automatic indexing: When embeddings are enabled, documents are automatically indexed
- Vector similarity: Uses HNSW (Hierarchical Navigable Small World) algorithm for fast similarity search
- Integrated storage: Vectors stored alongside documents in MongoDB
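To make the vector-similarity idea concrete, the following Python sketch ranks a few toy vectors by exact cosine similarity. It is a brute-force illustration of the kind of nearest-neighbor lookup an HNSW index approximates, not Vertesia's or MongoDB's implementation. The choice of cosine similarity, the function names, and the sample vectors are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def nearest(query, docs, k=2):
    """Exact top-k search: score every document, then sort descending."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical two-dimensional embeddings keyed by document id.
docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top = nearest([1.0, 0.1], docs)
```

An approximate index such as HNSW avoids scoring every document, trading a small amount of recall for much faster lookups on large collections.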
Vector Index Management
Vector indexes are automatically created when you enable embeddings. Each embedding type has its own index:
- vsearch_text - Text embeddings index
- vsearch_image - Image/vision embeddings index
- vsearch_properties - Properties embeddings index
Monitoring Index Status
Check vector index status through the embeddings status endpoint:
{
  "vectorIndex": {
    "status": "READY",
    "name": "vsearch_text",
    "type": "hnsw"
  }
}
Status values:
| Status | Description |
|---|---|
| READY | Index is operational |
| PENDING | Index is being built |
| FAILED | Index creation failed |
| DOES_NOT_EXIST | No index (embeddings not enabled) |
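A client can turn these status values into a simple decision. The Python sketch below assumes the payload shape shown above. The helper name and the choice to treat a missing vectorIndex object as DOES_NOT_EXIST are assumptions, not documented behavior.

```python
def describe_vector_index(payload):
    """Map the vectorIndex status to a short description of what it means."""
    index = payload.get("vectorIndex") or {}
    # Assumption: a payload without a vectorIndex is treated as "no index".
    status = index.get("status", "DOES_NOT_EXIST")
    meanings = {
        "READY": "index is operational",
        "PENDING": "index is being built; check again later",
        "FAILED": "index creation failed",
        "DOES_NOT_EXIST": "no index; embeddings are not enabled",
    }
    return status, meanings.get(status, "unrecognized status")

payload = {"vectorIndex": {"status": "READY", "name": "vsearch_text", "type": "hnsw"}}
status, meaning = describe_vector_index(payload)
```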
Limitations
MongoDB Atlas Vector Search provides semantic/vector search only. For full-text search with stemming, fuzzy matching, or complex aggregations, enable Elasticsearch.
Elasticsearch
Elasticsearch provides advanced search capabilities including full-text search, hybrid search, and powerful aggregations.
When to Use Elasticsearch
Enable Elasticsearch when you need:
- Full-text search with stemming, fuzzy matching, and phrase queries
- Hybrid search combining vector and full-text with configurable weights
- Aggregations for analytics, facets, and document statistics
- DSL queries for complete control over search behavior
- High-volume search with dedicated search infrastructure
Prerequisites
Elasticsearch requires:
- Elasticsearch infrastructure enabled for your account
- Embeddings configured (for vector search capabilities)
- Project settings permission (project:settings_write)
Enabling Elasticsearch
Get Status
Check the current Elasticsearch status for your project:
Get Elasticsearch Status
curl --location --request GET \
'https://api.vertesia.io/api/v1/commands/elasticsearch/status' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Example Response:
{
  "infrastructureEnabled": true,
  "indexingEnabled": true,
  "queriesEnabled": true,
  "indexStats": {
    "documentCount": 15234,
    "sizeInBytes": 52428800
  },
  "mongoDocumentCount": 15234,
  "reindexProgress": null
}
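The two document counts in this response make it easy to spot an index that is lagging behind MongoDB. The following Python sketch summarizes a parsed status response. The function name and the returned keys are hypothetical, and only the field names come from the example response.

```python
def sync_summary(status):
    """Compare the MongoDB and Elasticsearch document counts in a status response."""
    mongo = status.get("mongoDocumentCount", 0)
    index_stats = status.get("indexStats") or {}
    indexed = index_stats.get("documentCount", 0)
    return {
        "missing": max(mongo - indexed, 0),  # documents not yet in the index
        "in_sync": mongo == indexed,
        "size_mib": index_stats.get("sizeInBytes", 0) / (1024 * 1024),
    }

status = {
    "indexStats": {"documentCount": 15234, "sizeInBytes": 52428800},
    "mongoDocumentCount": 15234,
}
summary = sync_summary(status)
```

For the example response, both counts match and the index size works out to 50 MiB.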
Enable Indexing
Enable Elasticsearch indexing for your project. This creates the index and starts syncing documents:
Enable Elasticsearch Indexing
curl --location --request POST \
'https://api.vertesia.io/api/v1/commands/elasticsearch/enable-indexing' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Enable Queries
Enable Elasticsearch queries. Requires indexing to be enabled first:
Enable Elasticsearch Queries
curl --location --request POST \
'https://api.vertesia.io/api/v1/commands/elasticsearch/enable-queries' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>'
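Because queries depend on indexing, a script can derive the next command from the status response. This Python sketch is one possible way to encode that ordering. The function name is hypothetical, and waiting for the document counts to match before enabling queries is an assumption about what "initial indexing complete" means.

```python
def next_enable_step(status):
    """Return the command path to call next, or None if there is nothing to call."""
    if not status.get("infrastructureEnabled"):
        return None  # prerequisite; not something the commands shown here turn on
    if not status.get("indexingEnabled"):
        return "/commands/elasticsearch/enable-indexing"
    if not status.get("queriesEnabled"):
        indexed = (status.get("indexStats") or {}).get("documentCount", 0)
        if indexed < status.get("mongoDocumentCount", 0):
            return None  # assumption: wait until the index has caught up
        return "/commands/elasticsearch/enable-queries"
    return None  # indexing and queries are both enabled

step = next_enable_step({"infrastructureEnabled": True, "indexingEnabled": False})
```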
Disable Queries
Disable Elasticsearch queries while keeping indexing active:
Disable Elasticsearch Queries
curl --location --request POST \
'https://api.vertesia.io/api/v1/commands/elasticsearch/disable-queries' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Disable Indexing
Disable Elasticsearch indexing entirely:
Disable Elasticsearch Indexing
curl --location --request POST \
'https://api.vertesia.io/api/v1/commands/elasticsearch/disable-indexing' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Reindexing
Reindexing rebuilds the Elasticsearch index from MongoDB. This may be needed when:
- Enabling Elasticsearch for an existing project with documents
- Changing embedding dimensions
- Index corruption or sync issues
- Recovering from failures
Trigger Reindex
curl --location --request POST \
'https://api.vertesia.io/api/v1/commands/elasticsearch/reindex' \
--header 'Authorization: Bearer <YOUR_JWT_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"recreateIndex": false
}'
Parameters:
| Parameter | Type | Description |
|---|---|---|
| recreateIndex | boolean | If true, drops and recreates the index. Use when changing dimensions or mappings. |
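The same request can be assembled in Python with only the standard library. This sketch builds the request object without sending it. The helper name and the API_BASE constant are illustrative, while the URL, headers, and body mirror the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.vertesia.io/api/v1"

def build_reindex_request(token, recreate_index=False):
    """Build (but do not send) the POST request for the reindex command."""
    body = json.dumps({"recreateIndex": recreate_index}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/commands/elasticsearch/reindex",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

request = build_reindex_request("<YOUR_JWT_TOKEN>", recreate_index=True)
# To send it, pass the object to urllib.request.urlopen(request).
```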
Zero-Downtime Reindexing
Vertesia uses alias-based reindexing for zero downtime:
- A new index is created with updated mappings
- Documents are batch-indexed to the new index
- The alias is atomically swapped from old to new
- The old index is deleted
During reindexing, queries continue to work against the existing index.
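The alias swap can be pictured with a small in-memory model. The Python sketch below is a toy simulation of the steps listed above, not Vertesia's implementation. The class, the index names, and the on_batch hook are hypothetical and exist only to show that reads keep resolving to the old index until the swap.

```python
class AliasedIndexStore:
    """Toy model: several named indexes, one alias that queries resolve through."""

    def __init__(self, initial_name, documents):
        self.indexes = {initial_name: dict(documents)}
        self.alias = initial_name

    def search(self, doc_id):
        # Queries never name an index directly; they always go through the alias.
        return self.indexes[self.alias].get(doc_id)

    def reindex(self, new_name, source_documents, batch_size=2, on_batch=None):
        if new_name in self.indexes:
            raise ValueError("new index name must be unused")
        self.indexes[new_name] = {}              # 1. create the new index
        items = list(source_documents.items())
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            self.indexes[new_name].update(batch)  # 2. batch-index documents
            if on_batch:
                on_batch()                        # alias still points at the old index
        old_name, self.alias = self.alias, new_name  # 3. swap the alias
        del self.indexes[old_name]                # 4. delete the old index

store = AliasedIndexStore("docs_v1", {"1": "old"})
seen_during_reindex = []
store.reindex(
    "docs_v2",
    {"1": "new", "2": "x", "3": "y"},
    on_batch=lambda: seen_during_reindex.append(store.search("1")),
)
```

In this model, searches issued between batches still return the old document, and the new content becomes visible only once the alias is swapped.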
Monitoring Progress
Check reindex progress through the status endpoint:
{
  "reindexProgress": {
    "workflowId": "reindex-project-abc123",
    "status": "running",
    "processedDocuments": 5000,
    "totalDocuments": 15234,
    "startedAt": "2024-01-15T10:30:00Z"
  }
}
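A completion percentage follows directly from processedDocuments and totalDocuments. The helper below is a minimal Python sketch that assumes the response shape shown above. The function name is hypothetical.

```python
def reindex_percent(status):
    """Return reindex completion as a percentage, or None when no reindex is reported."""
    progress = status.get("reindexProgress")
    if not progress:
        return None  # reindexProgress is null or absent
    total = progress.get("totalDocuments") or 0
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty project
    return round(100.0 * progress.get("processedDocuments", 0) / total, 1)

status = {
    "reindexProgress": {
        "status": "running",
        "processedDocuments": 5000,
        "totalDocuments": 15234,
    }
}
percent = reindex_percent(status)
```

For the example response this gives roughly 32.8 percent.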
Hybrid Search
Hybrid search combines full-text and vector search for optimal relevance. When both search types return results, scores are aggregated using configurable methods.
Score Aggregation Methods
| Method | Algorithm | Best For |
|---|---|---|
| RRF | Reciprocal Rank Fusion | When relevance scores from different sources aren't directly comparable |
| RSF | Relevance Score Fusion | When you want to combine normalized scores directly |
| Smart | Automatic selection | General use, automatically picks the best method |
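The difference between the first two methods is easiest to see in code. The following Python sketch implements textbook Reciprocal Rank Fusion (with the conventional constant k = 60) and a score fusion based on min-max normalization. Both are generic illustrations: the normalization scheme, the value of k, and the function names are assumptions, and Vertesia's actual formulas may differ.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def rsf(score_maps):
    """Score fusion: min-max normalize each source to [0, 1], then sum."""
    fused = {}
    for scores in score_maps:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, score in scores.items():
            normalized = (score - lo) / span if span else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + normalized
    return sorted(fused, key=lambda d: fused[d], reverse=True)

# Hypothetical results: ranks only for RRF, raw scores for RSF.
rrf_order = rrf([["a", "b", "c"], ["b", "c", "d"]])
rsf_order = rsf([
    {"a": 12.0, "b": 9.0, "c": 3.0},   # full-text scores on one scale
    {"b": 0.9, "c": 0.8, "d": 0.5},    # vector similarities on another
])
```

RRF looks only at positions, so it is unaffected by the two sources using different score scales. The score-based variant keeps the size of the gaps between results, which is why the two methods can order the same documents differently.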
Weight Configuration
Control the relative importance of each search type:
{
  "query": {
    "full_text": "quarterly report",
    "vector": { "text": "financial analysis" },
    "weights": {
      "full_text": 2,
      "vector": 3
    }
  }
}
A higher weight gives that search type more influence on the final ranking. With the above configuration, vector search results carry 1.5 times the weight of full-text results (3 versus 2).
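The relationship between the two weights can be checked with a few lines of Python. This sketch converts raw weights into relative shares. How Vertesia applies the weights internally is not specified here, so the normalization step and the function name are illustrative assumptions.

```python
def weight_shares(weights):
    """Convert raw weights into fractions that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in weights.items()}

shares = weight_shares({"full_text": 2, "vector": 3})
ratio = shares["vector"] / shares["full_text"]
```

With weights of 2 and 3, full-text search accounts for 40% of the total weight and vector search for 60%, a ratio of 1.5.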
Dynamic Scaling
When enabled, dynamic scaling adjusts weights automatically if one search type is unavailable:
{
  "query": {
    "full_text": "quarterly report",
    "vector": { "text": "financial analysis" },
    "dynamic_scaling": "on"
  }
}
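One plausible reading of this behavior is that the weight of an unavailable search type is redistributed to the types that did return results. The Python sketch below models that reading only. The exact rule Vertesia applies is not documented here, and the function name, the default weights, and the redistribution rule are assumptions.

```python
def effective_weights(weights, available, dynamic_scaling=True):
    """Rescale weights so that only the available search types share the total."""
    if not dynamic_scaling:
        return dict(weights)  # weights are used exactly as configured
    active_total = sum(w for name, w in weights.items() if name in available)
    if active_total == 0:
        return {name: 0.0 for name in weights}
    return {
        name: (w / active_total if name in available else 0.0)
        for name, w in weights.items()
    }

# Hypothetical case: vector search is unavailable for this query.
scaled = effective_weights({"full_text": 2, "vector": 3}, available={"full_text"})
unscaled = effective_weights(
    {"full_text": 2, "vector": 3}, available={"full_text"}, dynamic_scaling=False
)
```

Under this assumption, the remaining search type carries the full weight instead of being held down by a share reserved for results that never arrived.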
Index Configuration Tools
Agents can query and update index configuration using built-in tools:
get_index_configuration
Retrieves the current index status and configuration.
Returns:
- Index status (exists, healthy)
- Document count and size
- Embedding dimensions for each type
- Field mappings
update_index_configuration
Updates index configuration with options to change embedding dimensions or trigger reindexing.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| embedding_dimensions | object | New dimensions for text, image, or properties |
| force_reindex | boolean | Trigger a full reindex |
| user_confirmed | boolean | Required confirmation (must use ask_user first) |
Troubleshooting
Documents Not Appearing in Search
- Check that indexing is enabled (indexingEnabled: true)
- Verify embeddings are configured and generating
- Allow time for async indexing to complete
- Check for sync issues in the status endpoint
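Several of these checks can be read straight from the Elasticsearch status response. The Python sketch below turns a parsed response into a list of hints. It covers only what the status payload exposes (it cannot tell whether embeddings are being generated), and the function name and hint wording are hypothetical.

```python
def diagnose_missing_documents(status):
    """Return human-readable hints derived from an Elasticsearch status response."""
    hints = []
    if not status.get("indexingEnabled"):
        hints.append("indexing is disabled")
    progress = status.get("reindexProgress")
    if progress and progress.get("status") == "running":
        hints.append("a reindex is still running")
    mongo = status.get("mongoDocumentCount", 0)
    indexed = (status.get("indexStats") or {}).get("documentCount", 0)
    if indexed < mongo:
        hints.append(f"{mongo - indexed} documents are not indexed yet")
    return hints

hints = diagnose_missing_documents({
    "indexingEnabled": True,
    "indexStats": {"documentCount": 15000},
    "mongoDocumentCount": 15234,
    "reindexProgress": None,
})
```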
Dimension Mismatch Errors
If you changed embedding dimensions:
- Recalculate embeddings with new dimensions
- Trigger a reindex with recreateIndex: true
Search Returns No Results
- Verify documents exist in MongoDB (mongoDocumentCount)
- Check that the Elasticsearch document count matches
- Test with broader queries or match_all
- Verify query syntax is correct
Reindex Stuck or Failed
- Check workflow status in the Vertesia UI
- Look for errors in workflow history
- Ensure sufficient permissions
- Try triggering a new reindex (this cancels the stuck one)
Best Practices
Index Management
- Enable queries only after initial indexing completes
- Monitor document counts between MongoDB and Elasticsearch
- Schedule reindexing during low-traffic periods
Search Configuration
- Start with smart score aggregation
- Tune weights based on search quality feedback
- Use facets for navigation and filtering
- Enable analyze for complex queries that benefit from LLM summarization
Performance
- Use appropriate limit values (avoid fetching more than needed)
- Use count_only for pagination totals
- Stream large results to artifacts with output_artifact
- Consider DSL mode for complex aggregations
Next Steps
- Content Overview - Understanding the full search architecture
- Embeddings Configuration - Configure embeddings for vector search
- Built-in Tools - Learn about query_documents and index tools
- Commands API - Full API reference for Elasticsearch commands
