Configuration
First, let's go over a few concepts:
Workflow DSL: a JSON-based language for defining workflows. A workflow is composed of a list of steps, and each step is either an activity or a child workflow.
Activities: the building blocks of workflows. Each activity is an individual task executed by the workflow worker. See the Workflow Activities section for details.
Child Workflows: workflows that are executed as part of another workflow. They are useful for breaking down complex workflows into smaller, more manageable units.
Prerequisites
To create and update workflow definitions in Vertesia, use the Vertesia CLI. If you haven't installed or configured it yet, please refer to the documentation.
Workflow Definition
A workflow definition is a JSON structure that contains at least the following:
- a name
- a description
- an array of steps
- input variables
Below is an example of an intake workflow that can be triggered when a new text document is uploaded to Vertesia.
{
"name": "MyWorkflow",
"description": "This is my workflow.",
"vars": {
"interactionsNames": {
"extractInformation": "sys:ExtractInformation",
"selectDocumentType": "sys:SelectDocumentType",
"generateMetadataModel": "sys:GenerateMetadataModel",
"chunkDocument": "sys:ChunkDocument"
}
},
"steps": [
{
"name": "setDocumentStatus",
"params": {
"status": "processing"
}
},
{
"title": "Extract text from the current document",
"name": "generateObjectText",
"type": "workflow",
"output": "extractResult"
},
{
"title": "Generate or assign a content type for the current document",
"name": "generateOrAssignContentType",
"import": ["interactionsNames"],
"params": {
"interactionNames": {
"generateMetadataModel": "${interactionsNames.generateMetadataModel}",
"selectDocumentType": "${interactionsNames.selectDocumentType}"
}
},
"condition": {
"extractResult.hasText": {
"$eq": true
}
}
},
{
"title": "Generate document properties from text content",
"name": "generateDocumentProperties",
"import": ["interactionsNames"],
"params": {
"interactionName": "${interactionsNames.extractInformation}"
},
"condition": {
"extractResult.hasText": {
"$eq": true
}
}
},
{
"title": "Chunk the current document text",
"name": "chunkDocument",
"import": ["interactionsNames"],
"params": {
"interactionName": "${interactionsNames.chunkDocument}",
"createParts": true
},
"condition": {
"extractResult.hasText": {
"$eq": true
}
}
},
{
"name": "generateEmbeddings",
"title": "Generate embeddings for text",
"params": {
"type": "text",
"force": false
}
},
{
"name": "setDocumentStatus",
"params": {
"status": "completed"
}
}
]
}
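Before uploading a definition, it can help to check it locally against the list of required fields above. The following Python snippet is a minimal sketch of such a check, not part of the Vertesia CLI. The function name check_definition is hypothetical, and it assumes that "vars" is the property holding the input variables and that every step carries a "name", as in the example.

```python
import json

# Top-level keys taken from the list of required fields; "vars" is assumed
# to be the property that holds the input variables, as in the example.
REQUIRED_KEYS = ("name", "description", "steps", "vars")


def check_definition(definition):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = ["missing '%s'" % key for key in REQUIRED_KEYS if key not in definition]
    steps = definition.get("steps", [])
    if not isinstance(steps, list):
        problems.append("'steps' must be an array")
        return problems
    for index, step in enumerate(steps):
        # Assumption: every step needs a "name", as in the example workflow.
        if not isinstance(step, dict) or "name" not in step:
            problems.append("step %d has no 'name'" % index)
    return problems


# Example: parse a small definition from JSON text and check it.
EXAMPLE = json.loads(
    '{"name": "MyWorkflow", "description": "This is my workflow.", "vars": {},'
    ' "steps": [{"name": "setDocumentStatus", "params": {"status": "processing"}}]}'
)
print(check_definition(EXAMPLE))
```

The check only covers the shape of the JSON. Whether a step name refers to an existing activity or child workflow is something only the platform can verify.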
Workflow Variables
The DSL supports variables that can be used to store data and pass it between steps. Variables are defined in the vars property of the workflow. The value of a variable can be a literal value or a reference to another variable. References to variables are enclosed in ${}. For example, the following DSL defines a variable named myVariable with the value "Hello World!":
{
"vars": {
"myVariable": "Hello World!"
}
}
The value of myVariable can then be referenced in other parts of the DSL using ${myVariable}. For example, the following DSL logs the value of myVariable to the console:
{
"steps": [
{
"type": "activity",
"name": "log",
"params": {
"message": "The value of myVariable is: ${myVariable}"
}
}
]
}
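To make the substitution rule concrete, here is a minimal Python sketch of how ${...} references could be resolved against a vars object. It is an illustration only, not the engine's actual implementation. The helper names lookup and resolve are hypothetical, the reference grammar (a name with optional dotted path, as in ${interactionsNames.chunkDocument}) is inferred from the examples on this page, and keeping the original type when a string consists of a single reference is an assumption.

```python
import re

# Matches ${name} or ${dotted.path}; the exact grammar accepted by the
# workflow engine is an assumption based on the examples on this page.
REFERENCE = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(path, variables):
    """Follow a dotted path such as 'interactionsNames.chunkDocument'."""
    value = variables
    for key in path.split("."):
        value = value[key]  # raises KeyError when a variable is unknown
    return value


def resolve(value, variables):
    """Recursively replace ${...} references inside strings, lists and dicts."""
    if isinstance(value, dict):
        return {k: resolve(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, variables) for v in value]
    if not isinstance(value, str):
        return value
    whole = REFERENCE.fullmatch(value)
    if whole:
        # Assumption: a string that is only a reference keeps the
        # referenced value's type (object, boolean, number, ...).
        return lookup(whole.group(1), variables)
    # Otherwise each reference is replaced by its string form.
    return REFERENCE.sub(lambda m: str(lookup(m.group(1), variables)), value)


VARS = {
    "myVariable": "Hello World!",
    "interactionsNames": {"chunkDocument": "sys:ChunkDocument"},
}
PARAMS = {"message": "The value of myVariable is: ${myVariable}"}
print(resolve(PARAMS, VARS))
```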
Conditions
The DSL supports conditions that can be used to control the flow of the workflow. Conditions are defined in the condition property of a step. The value of a condition is a JSON object that describes the condition. The following operators are supported:
Operator | Description |
---|---|
$eq | Equal to |
$ne | Not equal to |
$gt | Greater than |
$gte | Greater than or equal to |
$lt | Less than |
$lte | Less than or equal to |
$in | In array |
$nin | Not in array |
$regexp | Matches regular expression |
For example, the following DSL defines a step that only executes if the value of the variable myVariable is equal to "Hello World!":
{
"steps": [
{
"type": "activity",
"name": "log",
"condition": {
"myVariable": {
"$eq": "Hello World!"
}
},
"params": {
"message": "The value of myVariable is: ${myVariable}"
}
}
]
}
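The operator table can be read as a set of small predicates. The Python sketch below shows one way a condition could be evaluated. It is not the engine's implementation. It assumes the condition shape used in the intake workflow example, where the key is a variable path and the value maps an operator to an operand (for instance "extractResult.hasText": { "$eq": true }). It also assumes that all entries must hold for the step to run, and that $regexp matches anywhere in the value. The names OPERATORS, lookup and matches are hypothetical.

```python
import re

# One predicate per operator from the table. The precise semantics
# (for example whether $regexp is anchored) are assumptions.
OPERATORS = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
    "$regexp": lambda actual, expected: re.search(expected, str(actual)) is not None,
}


def lookup(path, variables):
    """Follow a dotted path; return None when any segment is missing."""
    value = variables
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(condition, variables):
    """Return True when every operator on every variable path holds."""
    for path, tests in condition.items():
        actual = lookup(path, variables)
        for operator, expected in tests.items():
            if not OPERATORS[operator](actual, expected):
                return False
    return True


CONDITION = {"extractResult.hasText": {"$eq": True}}
print(matches(CONDITION, {"extractResult": {"hasText": True}}))
```

A step whose condition evaluates to false would simply be skipped, which is how the intake workflow avoids running text-based steps on documents without text.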
Fetch
The DSL supports fetching data from external sources during the workflow execution. The fetch property of a step is used to define the data to fetch. The value of the fetch property is a JSON object that describes the data to fetch. The following properties are supported:
Property | Description |
---|---|
type | The type of data to fetch. |
source | The source of the data. |
query | The query to use to fetch the data. |
select | The fields to select from the fetched data. |
limit | The maximum number of results to fetch. |
on_not_found | How to handle not found objects. |
For example, the following DSL defines a step that fetches a document from the store:
{
"steps": [
{
"type": "activity",
"name": "fetchDocument",
"fetch": {
"type": "document",
"query": {
"id": "${documentId}"
}
},
"output": "document"
}
]
}
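Because a misspelled property in a fetch object is easy to miss, a small local check against the property table can be handy. The Python snippet below is a sketch under that assumption. The name unknown_fetch_keys is hypothetical, and the set of allowed keys is copied from the table above rather than from any published schema.

```python
# Property names copied from the fetch property table above.
SUPPORTED_FETCH_KEYS = {"type", "source", "query", "select", "limit", "on_not_found"}


def unknown_fetch_keys(step):
    """Return the fetch properties of a step that are not in the table, sorted."""
    fetch = step.get("fetch", {})
    return sorted(set(fetch) - SUPPORTED_FETCH_KEYS)


STEP = {
    "type": "activity",
    "name": "fetchDocument",
    "fetch": {"type": "document", "query": {"id": "${documentId}"}},
    "output": "document",
}
print(unknown_fetch_keys(STEP))
```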
Projection
The DSL supports projecting data from the result of an activity. The projection property of a step is used to define the data to project. The value of the projection property is a JSON object that describes the data to project. The following operators are supported:
Operator | Description |
---|---|
$include | Include the specified fields. |
$exclude | Exclude the specified fields. |
For example, the following DSL defines a step that projects the name and description fields from the result of the fetchDocument activity:
{
"steps": [
{
"type": "activity",
"name": "fetchDocument",
"fetch": {
"type": "document",
"query": {
"id": "${documentId}"
}
},
"output": "document"
},
{
"type": "activity",
"name": "projectDocument",
"params": {
"document": "${document}"
},
"projection": {
"$include": [
"name",
"description"
]
},
"output": "projectedDocument"
}
]
}
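The effect of the two projection operators can be illustrated with a short Python sketch. This is an approximation for explanation purposes, not the engine's code. It assumes that field names refer to top-level properties only and that $include takes precedence when both operators are present. The function name project and the sample document are hypothetical.

```python
def project(data, projection):
    """Apply a $include or $exclude projection to a flat dictionary."""
    if "$include" in projection:
        # Keep only the listed fields that actually exist in the data.
        return {key: data[key] for key in projection["$include"] if key in data}
    if "$exclude" in projection:
        dropped = set(projection["$exclude"])
        return {key: value for key, value in data.items() if key not in dropped}
    # No operator given: return an unchanged copy.
    return dict(data)


# Hypothetical activity result used only for illustration.
DOCUMENT = {"id": "123", "name": "Report", "description": "Quarterly report", "text": "..."}
print(project(DOCUMENT, {"$include": ["name", "description"]}))
```

With the sample document, the $include projection keeps only name and description, which corresponds to what the projectedDocument output in the example above would contain under these assumptions.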