Concepts

Here are the core concepts for working with the Vertesia Platform.

Large Language Models (LLMs)

Generative AI is based on the capability to interpret human language and to reply to questions using human language.

There are key points to know about them:

  • Large Language Models (LLMs) are capable of interpreting human languages (including programming languages, for instance).
  • Human language is found in unstructured (image, video, audio) and structured (document, database record) content.
  • Models are pre-trained on very large sets of content (public, private, hybrid).
  • Models can be fine-tuned to further adapt them, a posteriori, to specific knowledge (e.g. enterprise content such as supplier contracts or corporate policies) - but that is not necessarily a good approach.
  • Retrieval Augmented Generation (RAG) is a good alternative to fine-tuning for adding knowledge to models.
  • Though chatbots have been the first well-known application of generative AI, they fall far short of the full potential of generative AI.
  • LLMs can be seen as processing entities that can be requested in an automated way - enabling a very wide range of use cases.
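
The RAG approach mentioned above can be sketched in a few lines: retrieve the passages most relevant to a question, then prepend them to the prompt. The `retrieve` and `buildPrompt` functions below are illustrative stand-ins (a toy keyword-overlap retriever), not part of the Vertesia Platform APIs.

```typescript
// Illustrative RAG flow: rank passages against the query, keep the
// top-k, and prepend them to the question as context.
interface Passage {
  id: string;
  text: string;
  score: number;
}

// Toy retriever: scores passages by naive keyword overlap with the
// query. A real system would use embedding similarity instead.
function retrieve(
  query: string,
  corpus: { id: string; text: string }[],
  topK: number
): Passage[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return corpus
    .map((p) => ({
      ...p,
      score: terms.filter((t) => p.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Assemble the augmented prompt sent to the model.
function buildPrompt(query: string, passages: Passage[]): string {
  const context = passages.map((p) => `- ${p.text}`).join("\n");
  return `Answer using only the context below.\nContext:\n${context}\nQuestion: ${query}`;
}
```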

A bit of jargon

  • Context Window - Imagine you have a really big notebook that you use to take notes while you are watching a movie. The notebook can only hold so many pages at a time, and once it is full, you can only look at what is written on those pages. In the world of large language models (LLMs), the notebook is called the context window. It is the amount of text (or tokens) that the model can "remember" or consider at one time when it is answering a question or having a conversation. Each model has a distinct context window.
  • Embeddings - Embeddings are high-dimensional vectors that represent tokens (words) in a way that captures their semantic meaning and relationships. These vectors are learned during the training of the LLM and are crucial for the model's ability to understand and generate language.
  • Max Tokens - The maximum number of tokens (words/characters) for the model to generate in the output. In some models, it counts against the context window length.
  • Similarity - If two tokens (words) are very similar in meaning, like "happy" and "joyful," their numerical representations will be close to each other. If the words are very different, like "happy" and "fast," their numbers will be farther apart. This numerical way of arranging words is what we call an embedding.
  • Token - A unit of text that the model processes. Tokens can be words, subwords, characters, or even punctuation marks. The process of breaking down text into these smaller units is known as tokenization. This allows the model to handle and generate text more efficiently by working with manageable pieces of information.
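
The Embeddings and Similarity entries above are usually made concrete with cosine similarity between vectors. Here is a minimal sketch; the tiny 3-dimensional vectors are invented for illustration, while real embeddings have hundreds or thousands of dimensions.

```typescript
// Cosine similarity: 1 means the vectors point the same way (very
// similar meaning), values near 0 mean they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors: "happy" and "joyful" land close together,
// "fast" lands far away.
const happy = [0.9, 0.1, 0.0];
const joyful = [0.85, 0.15, 0.05];
const fast = [0.1, 0.2, 0.95];
```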

Prompt Templates

Prompt Templates are the building blocks of prompts. Templates are assembled to define the final prompt for a task (an Interaction).

  • JS Template - A JavaScript template engine, running in a jailed environment. You can use standard JavaScript string interpolation syntax (${var}), control blocks (for, if, else, etc.), and array functions such as map, reduce, and filter. The template must return a string.
  • Plain Text - Simple plain-text format, with no variable replacement. Useful for application context or safety prompts.
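
To illustrate what the JS Template ${var} syntax does, here is a minimal sketch of template rendering. `renderTemplate` is a hypothetical helper, and `new Function` is NOT a jailed environment - this only shows the substitution semantics, not how the platform sandboxes templates.

```typescript
// Evaluate the template body as a JavaScript template literal against
// the given variables. For illustration only: never use new Function
// on untrusted template input.
function renderTemplate(template: string, vars: Record<string, unknown>): string {
  const names = Object.keys(vars);
  const values = Object.values(vars);
  const fn = new Function(...names, "return `" + template + "`;");
  return fn(...values);
}

// Expressions inside ${...} are plain JavaScript, including array
// functions such as map and join:
const tpl =
  "Summarize ${docs.length} documents:\n${docs.map(d => '- ' + d).join('\\n')}";
const out = renderTemplate(tpl, { docs: ["contract.pdf", "policy.docx"] });
```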

Interactions

Interactions define the tasks an LLM is requested to perform.

An interaction is defined by the following main components:

  • Name - The name of the interaction.
  • Description - A description of the interaction.
  • Prompt Segments - A list of prompt templates to be rendered as part of the final prompt.
  • Schema - The JSON Schema requested from the generative model for the response. It is also used to validate the response.
  • Configuration - The environment and model on which to execute the interaction, along with execution parameters.
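
Putting the components together, an interaction definition might look like the sketch below. The field names, environment name, and model name are assumptions made for the example, not the exact Vertesia API.

```typescript
// Illustrative shape of an interaction: every field mirrors one of the
// components listed above.
interface Interaction {
  name: string;
  description: string;
  promptSegments: string[]; // prompt templates rendered in order
  schema: { required?: string[]; [key: string]: unknown }; // JSON Schema for the response
  configuration: { environment: string; model: string; maxTokens?: number };
}

const reviewContract: Interaction = {
  name: "ReviewContract",
  description: "Extract key clauses from a supplier contract.",
  promptSegments: ["safety-preamble", "contract-review"],
  schema: {
    type: "object",
    required: ["clauses"],
    properties: { clauses: { type: "array", items: { type: "string" } } },
  },
  configuration: { environment: "prod-openai", model: "gpt-4o", maxTokens: 1024 },
};

// Minimal response check against the schema's required fields; a full
// JSON Schema validator would also check types and nesting.
function hasRequiredFields(
  schema: { required?: string[] },
  response: Record<string, unknown>
): boolean {
  return (schema.required ?? []).every((k) => k in response);
}
```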

Runs

Runs are the executions of an interaction: a run captures both the request to and the response from the generative model.

Runs have the following statuses:

  • created - The run has been created but not yet started. This is typically the case when waiting for streaming to start from the client.
  • processing - The run is currently executing.
  • completed - The run has completed successfully.
  • failed - The run has failed. The failure reason is in the error field.
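
The statuses above imply a simple lifecycle. The transition table below is an assumption derived from the status descriptions, not a documented state machine.

```typescript
// Assumed run lifecycle: created -> processing -> completed | failed,
// where completed and failed are terminal.
type RunStatus = "created" | "processing" | "completed" | "failed";

const transitions: Record<RunStatus, RunStatus[]> = {
  created: ["processing", "failed"], // e.g. waiting for streaming to start
  processing: ["completed", "failed"],
  completed: [], // terminal
  failed: [], // terminal; details are in the error field
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return transitions[from].includes(to);
}
```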

Environments

Environments connect to LLM inference providers, the execution platforms that run generative models.

We currently support environments for the following inference providers:

In addition to the core inference providers above, we have created virtual providers that assemble models and platforms into a single virtual, synthetic LLM, offering several balancing and execution strategies:

  • virtual_lb - a synthetic environment that allows load balancing and failover between multiple models
  • virtual_mediator - a synthetic environment that allows multi-head execution and LLM mediation
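
The failover side of a virtual_lb-style environment can be sketched as trying each underlying model in order and returning the first success. The `Generate` callbacks below stand in for real provider calls; this is an illustration of the strategy, not the platform's implementation.

```typescript
// Each model is a function that turns a prompt into a completion.
type Generate = (prompt: string) => Promise<string>;

// Try models in priority order; if one throws, fall through to the
// next. If all fail, rethrow the last error.
async function withFailover(models: Generate[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const generate of models) {
    try {
      return await generate(prompt);
    } catch (err) {
      lastError = err; // remember the failure and try the next model
    }
  }
  throw lastError;
}
```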
