Semantic Layer

The /objects/:objectId/analyze endpoint exposes a service that converts content into XML. XML provides a richer, more structured representation than plain-text formats. This structure helps LLMs process and understand the document, particularly large tables and complex layouts. Currently, only PDF files are supported.

Besides improving how LLMs "understand" complex documents, the XML format also enables additional capabilities:

  • Deep linking. LLM responses can include not just the exact page of a reference, but also its position. This capability can be leveraged to build richer user experiences.
  • Extracting pieces of content, such as long tables, without LLM rewriting or hallucination.

Analyze

Use this endpoint to trigger a content analysis for an Object.

Endpoint: /objects/:objectId/analyze

Method: POST

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the Object to analyze. The object mime-type must be application/pdf

Input Parameters

Parameter   Data Type   Description
features    string[]    The features to activate for the analysis. Currently not used.

Example Request Payload

{
    "features": []
}

Example Response

{
    "workflow_id": "workflow_execution_request:abcdef1234567890abcdef123456",
    "workflow_run_id": "abcdef12-abcd-1234-abcd-1234567890ab",
    "status": 1
}

Code Example

Analyze

curl -X POST \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "features": []
}'
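For callers scripting this from Python rather than curl, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the helper names `build_analyze_request` and `start_analysis` are made up for illustration, and the base URL is simply copied from the curl example above.

```python
import json
import urllib.request

# Base URL copied from the curl example; adjust for your deployment.
BASE_URL = "https://zeno-server-production.api.vertesia.io/api/v1"


def build_analyze_request(object_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an analysis."""
    # `features` is documented as currently unused, so an empty list is sent.
    body = json.dumps({"features": []}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/objects/{object_id}/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def start_analysis(object_id: str, token: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_analyze_request(object_id, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first function free of network access, which makes it easy to inspect or unit-test.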

Get Status

Use this endpoint to get the status of a previously requested analysis.

Endpoint: /objects/:objectId/analyze/status

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the status for.

Example Request Payload

There is no JSON body with this request.

Example Response

{
  "workflow_id": "652d77:workflow_execution_request:67ef301df9967ef50104d652",
  "workflow_run_id": "0195fe53-f5db-7420-b161-9882945d2339",
  "status": 1,
  "progress": {
    "pages": {
      "total": 1,
      "processed": 1,
      "success": 1,
      "failed": 0
    },
    "tables": {
      "total": 1,
      "processed": 1,
      "success": 1,
      "failed": 0
    },
    "images": {
      "total": 1,
      "processed": 0,
      "success": 0,
      "failed": 0
    },
    "visuals": {
      "total": 2,
      "processed": 0,
      "success": 0,
      "failed": 0
    },
    "started_at": 1743728670181,
    "percent": 40
  }
}

Code Example

Get Status

curl -X GET \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/status \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
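Because the analysis runs asynchronously, a client typically polls this endpoint until the work is finished. The sketch below shows one way to do that in Python. It is an illustration under stated assumptions: `summarize_progress` and `wait_until` are hypothetical helpers, and since this page does not define the meaning of the numeric `status` values, the completion test is left to a caller-supplied `is_done` function (for example, checking `progress.percent`).

```python
import time
from typing import Callable


def summarize_progress(status: dict) -> str:
    """Render the `progress` object of a status response as a single line."""
    progress = status.get("progress", {})
    parts = []
    for kind in ("pages", "tables", "images", "visuals"):
        counts = progress.get(kind)
        if counts is not None:
            parts.append(f"{kind} {counts['processed']}/{counts['total']}")
    return f"{progress.get('percent', 0)}%: " + ", ".join(parts)


def wait_until(
    fetch_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval_s: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Call fetch_status() until is_done(status) is true, then return that status."""
    for attempt in range(max_polls):
        status = fetch_status()
        if is_done(status):
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"analysis not finished after {max_polls} polls")
```

Here `fetch_status` would wrap the GET request shown above; passing it in as a callable keeps the polling logic independent of any particular HTTP library.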

Get Results

Use this endpoint to retrieve the result of an analysis once it has completed. The response contains the XML conversion of the object.

Endpoint: /objects/:objectId/analyze/results

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the results for.

Example Request Payload

There is no JSON body with this request.

Example Response

{
  "document": "<xml>...</xml>",
  "tables": [],
  "images": [],
  "annotated": "https://..." 
}

Code Example

Get Results

curl -X GET \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/results \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'

Get XML Text

Use this endpoint to fetch the object's corresponding XML string once the analysis is completed.

Endpoint: /objects/:objectId/analyze/xml

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the XML text for.

Example Request Payload

There is no JSON body with this request.

Example Response

<document>...</document>

Code Example

Get XML Text

curl -X GET \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/xml \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
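Once the XML string has been fetched, it can be inspected with any standard XML parser. The sketch below counts how often each element name occurs, which is a quick way to get a feel for a document's structure. Note the assumptions: `count_tags` is a hypothetical helper, and the `page` and `p` element names in the sample are invented for illustration, since the actual XML vocabulary is not described on this page.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text: str) -> Counter:
    """Parse an XML string and count how often each element name occurs."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Made-up sample; real element names come from the service's response.
sample = "<document><page><p>Hello</p><p>World</p></page></document>"
print(count_tags(sample).most_common())
```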

Get Tables

Use this endpoint to fetch the object's table content once the analysis is completed.

Endpoint: /objects/:objectId/analyze/tables

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the tables for.

Query Parameters

Parameter   Data Type        Description
format      'csv' | 'json'   The format of the tables to return. Defaults to json.

Example Request Payload

There is no JSON body with this request.

Example Response

[
  {
    "page_number": 1,
    "table_number": 10,
    "data": [
      {
        "Column 1": "Row 1, Cell 1 content",
        "Column 2": "Row 1, Cell 2 content"
      },
      {
        "Column 1": "Row 2, Cell 1 content",
        "Column 2": "Row 2, Cell 2 content"
      }
    ],
    "format": "application/json"
  }
]

Code Example

Get Tables

curl -X GET \
  'https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/tables?format=json' \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
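A short Python sketch may help when consuming the JSON form of this response. It builds the request URL with the `format` query parameter and indexes the returned tables by page and table number. Both `tables_url` and `index_tables` are hypothetical helper names, and the base URL is taken from the curl example rather than from any official client.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the curl example; adjust for your deployment.
BASE_URL = "https://zeno-server-production.api.vertesia.io/api/v1"


def tables_url(object_id: str, fmt: str = "json") -> str:
    """Build the Get Tables URL, validating the documented `format` values."""
    if fmt not in ("csv", "json"):
        raise ValueError("format must be 'csv' or 'json'")
    path = f"/objects/{quote(object_id, safe='')}/analyze/tables"
    return f"{BASE_URL}{path}?{urlencode({'format': fmt})}"


def index_tables(tables: list) -> dict:
    """Map (page_number, table_number) to that table's list of row objects."""
    return {(t["page_number"], t["table_number"]): t["data"] for t in tables}
```

With the example response above, `index_tables(response)[(1, 10)]` would return the two row objects of that table.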

Get Images

Use this endpoint to retrieve information about the images embedded in the source PDF file.

Endpoint: /objects/:objectId/analyze/images

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the images for.

Example Request Payload

There is no JSON body with this request.

Example Response

[
  {
    "id": "55",
    "page_number": 1,
    "description": "This is an image.",
    "width": 100,
    "height": 100,
    "is_meaningful": false
  }
]

Code Example

Get Images

curl -X GET \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/images \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
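A client will often want to skip images the service has not flagged as meaningful. The sketch below groups the flagged images by page. It assumes that `is_meaningful` marks images worth keeping (the exact semantics of that field are not spelled out on this page), and `meaningful_images_by_page` is a hypothetical helper name.

```python
from collections import defaultdict


def meaningful_images_by_page(images: list) -> dict:
    """Group images whose `is_meaningful` flag is true by their page number."""
    by_page = defaultdict(list)
    for image in images:
        # Images with a missing or false flag are skipped.
        if image.get("is_meaningful"):
            by_page[image["page_number"]].append(image)
    return dict(by_page)
```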

Get Annotated

Use this endpoint to get a rendition of the PDF file annotated with block outlines and IDs.

Endpoint: /objects/:objectId/analyze/annotated

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the annotated PDF for.

Example Request Payload

There is no JSON body with this request.

Example Response

{
    "url": "https://storage.googleapis.com/.../annotated.pdf"
}

Code Example

Get Annotated

curl -X GET \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/annotated \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'

Adapt Tables

Use this endpoint to transform tables contained in the source PDF file into a format of your choice. The service identifies the relevant tables and maps their columns to the requested format.

For example, this endpoint can be used to extract line items from an invoice document. The document may contain other tables like a packaging list. The service will first identify and consider only the relevant tables and then map the columns to fit the requested format.

Endpoint: /objects/:objectId/analyze/adapt_tables

Method: POST

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object

Input Parameters

Parameter          Data Type   Description
item_name          string      (Required) The name of the item to extract.
target_schema      string      (Required) The target schema of the tables, for example a JSON Schema serialized as a string.
instructions       string      (Required) The instructions for adapting the tables. Typically a description of what a row should contain.
environment        string      The environment to use for the workflow.
notify_endpoints   string[]    The endpoints to notify when the workflow is complete.

Example Request Payload

{
    "item_name": "invoice line item",
    "target_schema": "{\r\n  \"$schema\": \"http:\/\/json-schema.org\/draft-07\/schema#\",\r\n  \"type\": \"object\",\r\n  \"title\": \"Invoice line item schema\",\r\n  \"description\": \"A line item\",\r\n  \"properties\": {\r\n    \"line_item_number\": {\r\n      \"type\": \"string\",\r\n      \"description\": \"A simple identifier number for the line item which is unique and incremental\"\r\n    },\r\n    \"product_code\": {\r\n      \"type\": \"string\"\r\n    },\r\n    \"description\": {\r\n      \"type\": \"string\"\r\n    },\r\n    \"quantity\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    },\r\n    \"unit_price\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    },\r\n    \"amount\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    }\r\n  }\r\n}",
    "instructions": "A valid invoice line item table features rows such as description, quantity, unit price, and amount columns.",
    "environment": "environmentID",
    "notify_endpoints": []
}

Example Response

{
    "workflow_id": "123456:content_object.workflow_execution_request:abcdef1234567890abcdef123456",
    "workflow_run_id": "abcdef12-abcd-1234-abcd-1234567890ab",
    "status": "running"
}

Code Example

Adapt Tables

curl -X POST \
  https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/adapt_tables \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "item_name": "invoice line item",
    "target_schema": "{\r\n  \"$schema\": \"http:\/\/json-schema.org\/draft-07\/schema#\",\r\n  \"type\": \"object\",\r\n  \"title\": \"Invoice line item schema\",\r\n  \"description\": \"A line item\",\r\n  \"properties\": {\r\n    \"line_item_number\": {\r\n      \"type\": \"string\",\r\n      \"description\": \"A simple identifier number for the line item which is unique and incremental\"\r\n    },\r\n    \"product_code\": {\r\n      \"type\": \"string\"\r\n    },\r\n    \"description\": {\r\n      \"type\": \"string\"\r\n    },\r\n    \"quantity\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    },\r\n    \"unit_price\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    },\r\n    \"amount\": {\r\n      \"type\": \"number\",\r\n      \"minimum\": 0\r\n    }\r\n  }\r\n}",
    "instructions": "A valid invoice line item table features rows such as description, quantity, unit price, and amount columns.",
    "environment": "environmentID",
    "notify_endpoints": []
}'
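Since `target_schema` is a string, the hand-escaped JSON in the payload above is easy to get wrong. One way to avoid manual escaping is to keep the schema as a native structure and serialize it when building the request body. The following Python sketch shows the idea; `build_adapt_tables_payload` is a hypothetical helper, and the schema is a trimmed version of the one in the example payload.

```python
import json
from typing import Optional, Sequence

# Trimmed version of the invoice line item schema from the example payload.
LINE_ITEM_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "title": "Invoice line item schema",
    "description": "A line item",
    "properties": {
        "line_item_number": {"type": "string"},
        "product_code": {"type": "string"},
        "description": {"type": "string"},
        "quantity": {"type": "number", "minimum": 0},
        "unit_price": {"type": "number", "minimum": 0},
        "amount": {"type": "number", "minimum": 0},
    },
}


def build_adapt_tables_payload(
    item_name: str,
    schema: dict,
    instructions: str,
    environment: Optional[str] = None,
    notify_endpoints: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble the request body; the schema is serialized into a string field."""
    payload = {
        "item_name": item_name,
        "target_schema": json.dumps(schema, indent=2),
        "instructions": instructions,
    }
    # Optional fields are only included when the caller provides them.
    if environment is not None:
        payload["environment"] = environment
    if notify_endpoints is not None:
        payload["notify_endpoints"] = list(notify_endpoints)
    return payload


payload = build_adapt_tables_payload(
    "invoice line item",
    LINE_ITEM_SCHEMA,
    "Each row should contain a description, quantity, unit price, and amount.",
)
request_body = json.dumps(payload)  # the outer dumps escapes the inner string
```

The outer `json.dumps` call takes care of escaping the quotes and newlines inside `target_schema`, so the nested document survives the round trip intact.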

Get Adapted Tables

Use this endpoint to retrieve the adapted tables when processing is complete.

Endpoint: /objects/:objectId/analyze/adapt_tables/:runId

Method: GET

Headers

Header          Value
Authorization   Bearer <YOUR_JWT_TOKEN>

Path Parameters

Parameter   Description
objectId    (Required) The ID of the object to retrieve the adapted tables for.
runId       (Required) The ID of the workflow run to retrieve the adapted tables for.

Query Parameters

Parameter   Data Type        Description
format      'csv' | 'json'   The format of the adapted tables to return. Defaults to json.

Example Request Payload

There is no JSON body with this request.

Example Response

description,quantity,price
"Row 1, Cell 1",1,10
"Row 2, Cell 1",2,20

Code Example

Get Adapted Tables

curl -X GET \
  'https://zeno-server-production.api.vertesia.io/api/v1/objects/:objectId/analyze/adapt_tables/:runId?raw=false&format=csv' \
  -H 'Authorization: Bearer <YOUR_JWT_TOKEN>'
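When `format=csv` is requested, the response body can be read with a standard CSV parser. The sketch below turns it into a list of row dictionaries. It assumes the CSV follows standard quoting rules, meaning fields that contain commas are double-quoted. `parse_adapted_csv` is a hypothetical helper, and the sample data is illustrative.

```python
import csv
import io
from typing import Dict, List


def parse_adapted_csv(csv_text: str) -> List[Dict[str, str]]:
    """Parse a CSV response body into one dict per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Illustrative sample; descriptions containing commas are quoted.
sample = 'description,quantity,price\n"Row 1, Cell 1",1,10\n"Row 2, Cell 1",2,20\n'
rows = parse_adapted_csv(sample)

# All CSV values arrive as strings, so convert before doing arithmetic.
total = sum(int(row["quantity"]) * int(row["price"]) for row in rows)
```

Every value comes back as a string, so numeric columns need an explicit conversion, as the `total` computation shows.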
