Concepts
Here are the core concepts for working with the Vertesia Platform.
Large Language Models (LLM)
Generative AI is based on the capability to interpret human language and to reply to questions in human language.
Here are the key points to know about LLMs:
- Large Language Models (LLMs) are capable of interpreting human languages (including programming languages, for instance).
- Human language is found in unstructured (image, video, audio) and structured (document, database record) content.
- Models are pre-trained on very large sets of content (public, private, or hybrid).
- Models can be fine-tuned to further adapt them, a posteriori, to specific knowledge (e.g. enterprise content such as supplier contracts or corporate policies) - but that's not necessarily a good approach.
- Retrieval Augmented Generation (RAG) is a good alternative to fine-tuning for adding knowledge to models (see the sketch after this list).
- Though chatbots were the first well-known application of generative AI, they represent only a small fraction of the full potential of Gen AI.
- LLMs can be seen as processing entities that can be invoked in an automated way, enabling a very wide range of use cases.
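To make the RAG pattern concrete, here is a minimal sketch in JavaScript. The `retrieveTopK` and `complete` functions are placeholders for a vector search and a model completion call; they are illustrative assumptions, not Vertesia APIs.

```js
// Minimal RAG sketch: instead of fine-tuning, fetch relevant knowledge at
// request time and inject it into the prompt.
async function answerWithRag(question, { retrieveTopK, complete }) {
  // 1. Retrieve the documents most relevant to the question
  //    (typically via embedding similarity over a vector index).
  const documents = await retrieveTopK(question, 3);

  // 2. Assemble a prompt that grounds the model in the retrieved context.
  const context = documents.map((doc) => doc.text).join("\n\n");
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;

  // 3. The model answers from the injected knowledge, with no retraining.
  return complete(prompt);
}
```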
A bit of jargon
| Term | Description |
| --- | --- |
| Context Window | Imagine you have a really big notebook that you use to take notes while you are watching a movie. The notebook can only hold so many pages at a time, and once it is full, you can only look at what is written on those pages. In the world of large language models (LLMs), the notebook is called the context window. It is the amount of text (or tokens) that the model can “remember” or consider at one time when it is answering a question or having a conversation. Each model has a distinct context window. |
| Embeddings | Embeddings are high-dimensional vectors that represent tokens (words) in a way that captures their semantic meaning and relationships. These vectors are learned during the training of the LLM and are crucial for the model's ability to understand and generate language. |
| Max tokens | The maximum number of tokens (words/characters) for the model to generate in the output. In some models, these output tokens count against the context window length. |
| Similarity | If two tokens (words) are very similar in meaning, like "happy" and "joyful", their vector representations will be close to each other. If the words are very different, like "happy" and "fast", their vectors will be farther apart. This numerical way of arranging words is what we call an embedding. |
| Token | A unit of text that the model processes. Tokens can be words, subwords, characters, or even punctuation marks. The process of breaking down text into these smaller units is known as tokenization. This allows the model to handle and generate text more efficiently by working with manageable pieces of information. |
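To illustrate embeddings and similarity, here is a small JavaScript sketch comparing toy embedding vectors with cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors are made-up values; real embeddings have hundreds or thousands of dimensions.

```js
// Toy embedding vectors: real ones come from an embedding model.
const embeddings = {
  happy:  [0.91, 0.10, 0.33],
  joyful: [0.88, 0.14, 0.30],
  fast:   [0.05, 0.92, 0.41],
};

// Cosine similarity: close to 1 means very similar meaning,
// close to 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity(embeddings.happy, embeddings.joyful)); // high (~0.99)
console.log(cosineSimilarity(embeddings.happy, embeddings.fast));   // lower (~0.28)
```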
Prompt Templates
Prompt Templates are the building blocks used to create prompts. Prompt templates are assembled to define the prompt for a task (an Interaction). Two formats are supported:

- **JS Template**: A JavaScript template engine, running in a jailed environment. You can use standard JavaScript string replacement syntax (`${var}`), as well as control blocks (`for`, `if`, `else`, etc.) and array functions (`map`, `reduce`, `filter`, etc.). The template needs to return a string. See the example after this list.
- **Plain Text**: A simple plain text format, with no variable replacement. Useful for application context or safety prompts.
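For illustration, here is what a JS Template body might look like. The variables `task` and `documents` are hypothetical interaction inputs, not predefined names, and the exact template syntax is sketched from the description above.

```js
// A hypothetical JS Template: interaction variables are interpolated with
// ${...}, and array functions like map() build repeated sections. The
// whole expression returns a single string.
`You are assisting with the following task: ${task}.

${documents.map((doc, i) => `Document ${i + 1}: ${doc.title}\n${doc.text}`).join("\n\n")}

Answer using only the documents above.`
```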
Interactions
Interactions define the tasks that the LLM is requested to perform.
An interaction is defined by the following main components:

- **Name**: The name of the interaction.
- **Description**: A description of the interaction.
- **Prompt Segments**: A list of prompt templates to be rendered as part of the final prompt.
- **Schema**: The JSON Schema requested from the generative model for the response. It is also used to validate the response. See the example after this list.
- **Configuration**: The environment and model to execute the interaction on, along with execution parameters.
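For example, an interaction extracting contract metadata might declare a schema like the one below. The fields shown are hypothetical; any valid JSON Schema can be used.

```js
// A hypothetical JSON Schema for an interaction's response: the model is
// asked to produce output matching it, and the response is validated
// against it.
const schema = {
  type: "object",
  properties: {
    supplier: { type: "string", description: "Name of the supplier" },
    start_date: { type: "string", format: "date" },
    total_value: { type: "number", description: "Total contract value in USD" },
  },
  required: ["supplier", "start_date"],
};
```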
Runs
A run is the execution of an interaction: it captures both the request to and the response from the generative model.
Runs have the following statuses:

- `created`: The run has been created but not yet started. This is typically the case when waiting for streaming to start from the client.
- `processing`: The run is currently executing.
- `completed`: The run has completed successfully.
- `failed`: The run has failed. The failure reason is in the `error` field.
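A client typically polls a run until it reaches a terminal status. Here is a minimal sketch of that pattern; `getRun` and the shape of the run object are illustrative assumptions, not the actual client API.

```js
// Hypothetical polling loop over the run statuses above. `getRun` stands
// in for whatever call fetches a run by its id.
async function waitForRun(getRun, runId, intervalMs = 1000) {
  for (;;) {
    const run = await getRun(runId);
    switch (run.status) {
      case "created":     // not started yet (e.g. waiting for streaming)
      case "processing":  // still executing
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
        break;
      case "completed":
        return run;       // success: the response is available on the run
      case "failed":
        throw new Error(`Run failed: ${run.error}`);
      default:
        throw new Error(`Unexpected run status: ${run.status}`);
    }
  }
}
```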
Environments
Environments connect to LLM inference providers, which are the execution platforms running generative models.
We currently support environments for the following inference providers:

- `azure_openai` - Azure OpenAI Service
- `bedrock` - Amazon Bedrock
- `groq` - Groq
- `huggingface_ie` - Hugging Face's Inference Endpoints
- `mistralai` - Mistral AI's La Plateforme
- `openai` - OpenAI
- `replicate` - Replicate
- `togetherai` - TogetherAI
- `vertexai` - Google's Vertex AI
- `watsonx` - IBM's watsonx.ai
In addition to the core inference providers above, we have created virtual providers that assemble models and platforms into a virtual, synthetic LLM and offer several balancing and execution strategies:

- `virtual_lb` - a synthetic environment that allows load balancing and failover between multiple models
- `virtual_mediator` - a synthetic environment that allows multi-head execution and LLM mediation
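As a conceptual illustration of what `virtual_lb` provides, here is a failover pattern sketched in plain JavaScript. This is not the platform's implementation or configuration format, only the underlying idea.

```js
// Conceptual failover sketch: try each model in order until one succeeds.
// `models` is a list of { name, invoke } entries, where `invoke` is a
// placeholder for a call to the underlying provider.
async function invokeWithFailover(models, prompt) {
  const errors = [];
  for (const model of models) {
    try {
      // The first healthy model wins; a real load balancer would also
      // distribute traffic and track provider health.
      return await model.invoke(prompt);
    } catch (err) {
      errors.push(`${model.name}: ${err.message}`);
    }
  }
  throw new Error(`All models failed:\n${errors.join("\n")}`);
}
```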