Semantic Layer
The /objects/:objectId/analyze endpoint exposes a service that transforms content into XML files. XML provides a richer, more structured representation than plain-text formats. This structure helps LLMs process and understand the document, particularly large tables and complex layouts. Currently, only PDF files are supported.
Besides improving how well LLMs understand complex documents, the XML format also enables additional capabilities:
- Deep linking. LLM responses can include not just the page of a reference, but also its position on that page. This capability can be used to build richer user experiences.
- Extracting pieces of content, such as long tables, without LLM rewriting or hallucination.
Analyze
Use this endpoint to trigger a content analysis for an Object.
Endpoint: /objects/:objectId/analyze
Method: POST
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the Object to analyze. The Object's MIME type must be application/pdf. |
Input Parameters
Parameter | Data Type | Description |
---|---|---|
features | string[] | The features to activate for the analysis. Currently not used. |
Example Request Payload
{
"features": []
}
Example Response
{
"workflow_id": "workflow_execution_request:abcdef1234567890abcdef123456",
"workflow_run_id": "abcdef12-abcd-1234-abcd-1234567890ab",
"status": 1
}
Code Example
Analyze
curl -X POST \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>' \
-H 'Content-Type: application/json' \
-d '{
"features": []
}'
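The same request can be sketched in Python with only the standard library. The base URL comes from the curl example above; the function name, the object ID, and the token value are hypothetical placeholders. The sketch only builds the request, so it can be inspected without network access.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://zeno-server-production.api.vertesia.io/api/v1"


def build_analyze_request(object_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers an analysis."""
    # `features` is currently unused by the service, so an empty list is sent.
    body = json.dumps({"features": []}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/objects/{object_id}/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholder values, for illustration only.
request = build_analyze_request("my-object-id", "my-jwt-token")
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
#     print(run["workflow_run_id"])
```

Serializing the body with json.dumps avoids hand-written JSON mistakes such as trailing commas.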
Get Status
Use this endpoint to get the status of a previously requested analysis.
Endpoint: /objects/:objectId/analyze/status
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the status for. |
Example Request Payload
There is no JSON body with this request.
Example Response
{
"workflow_id": "652d77:workflow_execution_request:67ef301df9967ef50104d652",
"workflow_run_id": "0195fe53-f5db-7420-b161-9882945d2339",
"status": 1,
"progress": {
"pages": {
"total": 1,
"processed": 1,
"success": 1,
"failed": 0
},
"tables": {
"total": 1,
"processed": 1,
"success": 1,
"failed": 0
},
"images": {
"total": 1,
"processed": 0,
"success": 0,
"failed": 0
},
"visuals": {
"total": 2,
"processed": 0,
"success": 0,
"failed": 0
},
"started_at": 1743728670181,
"percent": 40
}
}
Code Example
Get Status
curl -X GET \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/status \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
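As a rough illustration of reading the status response, the sketch below condenses the `progress` object into one "processed/total" string per counter group. It assumes the response has the shape shown in the example above; the function name is hypothetical, and the numeric `status` code is left uninterpreted because its values are not documented here.

```python
from typing import Any


def summarize_progress(status: dict[str, Any]) -> dict[str, str]:
    """Return a 'processed/total' string for each counter group in `progress`."""
    progress = status.get("progress", {})
    summary = {}
    for name, value in progress.items():
        # Counter groups (pages, tables, ...) are objects with a `total` key;
        # scalar fields such as started_at and percent are skipped.
        if isinstance(value, dict) and "total" in value:
            summary[name] = f"{value.get('processed', 0)}/{value['total']}"
    return summary


# Copy of the progress data from the example response above.
example_status = {
    "status": 1,
    "progress": {
        "pages": {"total": 1, "processed": 1, "success": 1, "failed": 0},
        "tables": {"total": 1, "processed": 1, "success": 1, "failed": 0},
        "images": {"total": 1, "processed": 0, "success": 0, "failed": 0},
        "visuals": {"total": 2, "processed": 0, "success": 0, "failed": 0},
        "started_at": 1743728670181,
        "percent": 40,
    },
}

print(summarize_progress(example_status))
print(f"{example_status['progress']['percent']}% complete")
```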
Get Results
Use this endpoint to retrieve the result of an analysis once it is completed. The response contains the XML conversion of the object.
Endpoint: /objects/:objectId/analyze/results
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the results for. |
Example Request Payload
There is no JSON body with this request.
Example Response
{
"document": "<xml>...</xml>",
"tables": [],
"images": [],
"annotated": "https://..."
}
Code Example
Get Results
curl -X GET \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/results \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Get XML Text
Use this endpoint to fetch the object's corresponding XML string once the analysis is completed.
Endpoint: /objects/:objectId/analyze/xml
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the XML text for. |
Example Request Payload
There is no JSON body with this request.
Example Response
<document>...</document>
Code Example
Get XML Text
curl -X GET \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/xml \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Get Tables
Use this endpoint to fetch the object's table content once the analysis is completed.
Endpoint: /objects/:objectId/analyze/tables
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the tables for. |
Query Parameters
Parameter | Data Type | Description |
---|---|---|
format | 'csv' or 'json' | The format of the tables to return. Defaults to json. |
Example Request Payload
There is no JSON body with this request.
Example Response
[
{
"page_number": 1,
"table_number": 10,
"data": [
{
"Column 1": "Row 1, Cell 1 content",
"Column 2": "Row 1, Cell 2 content"
},
{
"Column 1": "Row 2, Cell 1 content",
"Column 2": "Row 2, Cell 2 content"
}
],
"format": "application/json"
}
]
Code Example
Get Tables
curl -X GET \
'https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/tables?format=json' \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
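To show how the JSON form of this response might be consumed, the sketch below flattens it into (page, table, row) tuples. It assumes that each entry's `data` field is a list of objects keyed by column header, as in the example response; the helper name is hypothetical.

```python
import json
from typing import Any, Iterator

# Copy of the example response above.
example_response = """
[
  {
    "page_number": 1,
    "table_number": 10,
    "data": [
      {"Column 1": "Row 1, Cell 1 content", "Column 2": "Row 1, Cell 2 content"},
      {"Column 1": "Row 2, Cell 1 content", "Column 2": "Row 2, Cell 2 content"}
    ],
    "format": "application/json"
  }
]
"""


def iter_rows(tables: list[dict[str, Any]]) -> Iterator[tuple[int, int, dict[str, Any]]]:
    """Yield (page_number, table_number, row) for every row of every table."""
    for table in tables:
        for row in table["data"]:
            yield table["page_number"], table["table_number"], row


tables = json.loads(example_response)
rows = list(iter_rows(tables))
for page, table_number, row in rows:
    print(f"page {page}, table {table_number}: {row}")
```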
Get Images
Use this endpoint to retrieve information about the images embedded in the source PDF file.
Endpoint: /objects/:objectId/analyze/images
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the images for. |
Example Request Payload
There is no JSON body with this request.
Example Response
[
{
"id":"55",
"page_number": 1,
"description": "This is an image.",
"width": 100,
"height": 100,
"is_meaningful": false
}
]
Code Example
Get Images
curl -X GET \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/images \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
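One possible use of this response is to keep only the images flagged by `is_meaningful` and group them by page. The sketch below assumes the response shape shown above. The exact meaning of `is_meaningful` is not documented here, so treating it as a filter is an assumption, and the second sample entry is hypothetical.

```python
from collections import defaultdict
from typing import Any

example_images = [
    # Entry copied from the example response above.
    {"id": "55", "page_number": 1, "description": "This is an image.",
     "width": 100, "height": 100, "is_meaningful": False},
    # Hypothetical second entry, added only so the filter has something to keep.
    {"id": "56", "page_number": 2, "description": "A chart.",
     "width": 640, "height": 480, "is_meaningful": True},
]


def meaningful_by_page(images: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Map each page number to the IDs of its images flagged as meaningful."""
    pages: dict[int, list[str]] = defaultdict(list)
    for image in images:
        if image.get("is_meaningful"):
            pages[image["page_number"]].append(image["id"])
    return dict(pages)


print(meaningful_by_page(example_images))
```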
Get Annotated
Use this endpoint to get a rendition of the PDF file annotated with block outlines and IDs.
Endpoint: /objects/:objectId/analyze/annotated
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the annotated PDF for. |
Example Request Payload
There is no JSON body with this request.
Example Response
{
"url": "https://storage.googleapis.com/.../annotated.pdf"
}
Code Example
Get Annotated
curl -X GET \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/annotated \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
Adapt Tables
Use this endpoint to transform tables contained in the source PDF file into a format of your choice. The service identifies the relevant tables and maps their columns to the requested format.
For example, this endpoint can be used to extract line items from an invoice document. The document may contain other tables like a packaging list. The service will first identify and consider only the relevant tables and then map the columns to fit the requested format.
Endpoint: /objects/:objectId/analyze/adapt_tables
Method: POST
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object whose tables should be adapted. |
Input Parameters
Parameter | Data Type | Description |
---|---|---|
item_name | string | (Required) The name of the item to extract. |
target_schema | string | (Required) The target schema of the tables, passed as a string. For example, a serialized JSON Schema. |
instructions | string | (Required) The instructions for adapting the tables. Typically a description of what a row should contain. |
environment | string | The environment to use for the workflow. |
notify_endpoints | string[] | The endpoints to notify when the workflow is complete. |
Example Request Payload
{
"item_name": "invoice line item",
"target_schema": "{\r\n \"$schema\": \"http:\/\/json-schema.org\/draft-07\/schema#\",\r\n \"type\": \"object\",\r\n \"title\": \"Invoice line item schema\",\r\n \"description\": \"A line item\",\r\n \"properties\": {\r\n \"line_item_number\": {\r\n \"type\": \"string\",\r\n \"description\": \"A simple identifier number for the line item which is unique and incremental\"\r\n },\r\n \"product_code\": {\r\n \"type\": \"string\"\r\n },\r\n \"description\": {\r\n \"type\": \"string\"\r\n },\r\n \"quantity\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n },\r\n \"unit_price\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n },\r\n \"amount\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n }\r\n }\r\n}",
"instructions": "A valid invoice line item table features rows such as description, quantity, unit price, and amount columns.",
"environment": "environmentID",
"notify_endpoints": []
}
Example Response
{
"workflow_id": "123456:content_object.workflow_execution_request:abcdef1234567890abcdef123456",
"workflow_run_id": "abcdef12-abcd-1234-abcd-1234567890ab",
"status": "running"
}
Code Example
Adapt Tables
curl -X POST \
https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/adapt_tables \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>' \
-H 'Content-Type: application/json' \
-d '{
"item_name": "invoice line item",
"target_schema": "{\r\n \"$schema\": \"http:\/\/json-schema.org\/draft-07\/schema#\",\r\n \"type\": \"object\",\r\n \"title\": \"Invoice line item schema\",\r\n \"description\": \"A line item\",\r\n \"properties\": {\r\n \"line_item_number\": {\r\n \"type\": \"string\",\r\n \"description\": \"A simple identifier number for the line item which is unique and incremental\"\r\n },\r\n \"product_code\": {\r\n \"type\": \"string\"\r\n },\r\n \"description\": {\r\n \"type\": \"string\"\r\n },\r\n \"quantity\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n },\r\n \"unit_price\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n },\r\n \"amount\": {\r\n \"type\": \"number\",\r\n \"minimum\": 0\r\n }\r\n }\r\n}",
"instructions": "A valid invoice line item table features rows such as description, quantity, unit price, and amount columns.",
"environment": "environmentID",
"notify_endpoints": []
}'
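Because `target_schema` is a string, the schema has to be serialized before it is embedded in the request body. Hand-escaping it as in the payload above is error-prone. The sketch below builds the same payload in Python and lets json.dumps do the escaping. The variable names are illustrative, and "environmentID" is the placeholder from the example, not a real value.

```python
import json

# The JSON Schema from the example payload, written as a plain Python dict.
line_item_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "title": "Invoice line item schema",
    "description": "A line item",
    "properties": {
        "line_item_number": {
            "type": "string",
            "description": "A simple identifier number for the line item "
                           "which is unique and incremental",
        },
        "product_code": {"type": "string"},
        "description": {"type": "string"},
        "quantity": {"type": "number", "minimum": 0},
        "unit_price": {"type": "number", "minimum": 0},
        "amount": {"type": "number", "minimum": 0},
    },
}

payload = {
    "item_name": "invoice line item",
    # The schema is serialized once here, because the parameter is a string.
    "target_schema": json.dumps(line_item_schema, indent=2),
    "instructions": "A valid invoice line item table features rows such as "
                    "description, quantity, unit price, and amount columns.",
    "environment": "environmentID",  # placeholder from the example above
    "notify_endpoints": [],
}

# Serializing the whole payload escapes the embedded schema string correctly.
body = json.dumps(payload)
print(body[:80] + "...")
```

The resulting `body` can be sent as the request data with any HTTP client.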
Get Adapted Tables
Use this endpoint to retrieve the adapted tables when processing is complete.
Endpoint: /objects/:objectId/analyze/adapt_tables/:runId
Method: GET
Headers
Header | Value |
---|---|
Authorization | Bearer <YOUR_JWT_TOKEN> |
Path Parameters
Parameter | Description |
---|---|
objectId | (Required) The ID of the object to retrieve the adapted tables for. |
runId | (Required) The ID of the workflow run to retrieve the adapted tables for. |
Query Parameters
Parameter | Data Type | Description |
---|---|---|
format | 'csv' or 'json' | The format of the adapted tables to return. Defaults to json. |
Example Request Payload
There is no JSON body with this request.
Example Response
description,quantity,price
"Row 1, Cell 1",1,10
"Row 2, Cell 1",2,20
Code Example
Get Adapted Tables
curl -X GET \
'https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/adapt_tables/:runId?raw=false&format=csv' \
-H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
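When `format=csv` is requested, the response body can be parsed with Python's standard csv module. The sketch below uses sample data shaped like the example response. It assumes that fields containing commas are quoted, as standard CSV requires, and the function name is hypothetical.

```python
import csv
import io

# Sample data shaped like the example response; fields with commas are quoted.
sample_csv = (
    "description,quantity,price\n"
    '"Row 1, Cell 1",1,10\n'
    '"Row 2, Cell 1",2,20\n'
)


def parse_adapted_tables(csv_text: str) -> list[dict[str, str]]:
    """Parse the CSV text into one dict per row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


rows = parse_adapted_tables(sample_csv)

# CSV values arrive as strings, so convert them before doing arithmetic.
total = sum(float(row["quantity"]) * float(row["price"]) for row in rows)

for row in rows:
    print(row)
print("total:", total)
```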