Skills Reference

Skills are reusable capabilities that unlock specialized tools and provide domain-specific knowledge for AI agents. The Data Platform includes five skills for comprehensive data workflows.

Overview

| Skill | Purpose | Tools Unlocked |
| --- | --- | --- |
| data_analysis | Query and analyze data with SQL/DuckDB | data_get_schema, data_list_tables, execute_shell |
| data_import | Import data from files with atomic operations | data_import, execute_shell |
| data_migration | Schema migrations with data transformations | data_alter_table, data_create_tables, data_import, execute_shell |
| data_modeling | Design database schemas | data_create_database, data_create_tables, data_alter_table |
| data_visualization | Create Vega-Lite dashboards | data_preview_dashboard, data_create_dashboard, data_update_dashboard, data_render_dashboard |

data_analysis

Query and analyze persistent data stores using SQL with DuckDB.

When to Use

  • Data is in a persistent data store (created via data_modeling + data_import)
  • SQL-based analytics, reusable queries, or data versioning needed
  • Complex analytics like window functions, CTEs, or cohort analysis

Recommended Approach

Use native DuckDB in the sandbox via execute_shell for best performance:

import duckdb

# Connect to synced database
con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Run any SQL query
result = con.execute('''
    SELECT customer_id, SUM(amount) as total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
''').fetchdf()

print(result)

Key Features

  • Window Functions: ROW_NUMBER(), LAG(), LEAD(), RANK()
  • CTEs: Break complex queries into readable parts
  • QUALIFY Clause: Filter window function results
  • Pivoting: Transform rows to columns
  • Multi-database queries: Join across multiple databases
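
As a quick illustration of the QUALIFY clause, the sketch below keeps only the most recent order per customer. It assumes the same orders table and database path used in the example above.

import duckdb

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Rank each customer's orders by recency, then filter on the
# window function's alias directly with QUALIFY
latest = con.execute('''
    SELECT customer_id, order_date, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
    QUALIFY rn = 1
''').fetchdf()

print(latest)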

Example: Cohort Analysis

WITH first_purchase AS (
  SELECT customer_id, MIN(order_date) as cohort_date
  FROM orders GROUP BY customer_id
)
SELECT
  DATE_TRUNC('month', fp.cohort_date) as cohort,
  DATE_DIFF('month', fp.cohort_date, o.order_date) as months_since,
  COUNT(DISTINCT o.customer_id) as customers
FROM orders o
JOIN first_purchase fp ON o.customer_id = fp.customer_id
GROUP BY 1, 2
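
To turn these cohort counts into the usual cohort matrix (one row per cohort month, one column per months-since value), you can fetch the query result into pandas and pivot it. A minimal sketch, reusing the database path and query from above:

import duckdb

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Fetch the cohort counts from the query above as a DataFrame
cohorts = con.execute('''
    WITH first_purchase AS (
      SELECT customer_id, MIN(order_date) AS cohort_date
      FROM orders GROUP BY customer_id
    )
    SELECT
      DATE_TRUNC('month', fp.cohort_date) AS cohort,
      DATE_DIFF('month', fp.cohort_date, o.order_date) AS months_since,
      COUNT(DISTINCT o.customer_id) AS customers
    FROM orders o
    JOIN first_purchase fp ON o.customer_id = fp.customer_id
    GROUP BY 1, 2
''').fetchdf()

# Pivot rows into columns: one row per cohort, one column per months_since
matrix = cohorts.pivot(index='cohort', columns='months_since', values='customers')
print(matrix)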

data_import

Import data from CSV, JSON, or Parquet files, or from inline data, with atomic multi-table support.

When to Use

  • Loading data from files into data stores
  • Batch data updates or refreshes
  • ETL workflows with data transformation

Supported Formats

  • CSV: Comma-separated values with header row
  • JSON: Array of objects or newline-delimited JSON
  • Parquet: Columnar format for large datasets

Import Modes

  • append: Add new rows to existing data (default)
  • replace: Clear table before importing

Data Sources

| Source | URI Format | Example |
| --- | --- | --- |
| inline | (data in data field) | Direct JSON array |
| gcs | gs://bucket/path | gs://my-bucket/data.csv |
| url | https://... | https://example.com/data.csv |
| artifact | out/filename | out/cleaned_data.csv |

Column Normalization

When importing, column names are automatically normalized:

  • Item Type → item_type
  • Order-ID → order_id
  • 123_count → _123_count

Create table schemas using normalized names.
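
The normalization itself happens on the platform side, but a rough Python approximation of the documented examples can help when drafting a schema ahead of an import. This helper is illustrative only, not the platform's implementation:

import re

def normalize_column(name: str) -> str:
    """Approximate the documented rules: lowercase, runs of
    non-alphanumeric characters become underscores, and a name
    starting with a digit gets a leading underscore."""
    col = re.sub(r'[^0-9a-zA-Z]+', '_', name.strip()).lower().strip('_')
    if col and col[0].isdigit():
        col = '_' + col
    return col

print(normalize_column('Item Type'))   # item_type
print(normalize_column('Order-ID'))    # order_id
print(normalize_column('123_count'))   # _123_count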

Example: Multi-Table Import

{
  "mode": "replace",
  "message": "Monthly data refresh",
  "tables": {
    "customers": {
      "source": "url",
      "uri": "https://example.com/customers.csv",
      "format": "csv"
    },
    "orders": {
      "source": "gcs",
      "uri": "gs://bucket/orders.parquet",
      "format": "parquet"
    }
  }
}

data_migration

Perform schema migrations with data transformations.

When to Use Migration vs Alter

| Change | Use |
| --- | --- |
| Add nullable column | data_alter_table |
| Drop column | data_alter_table |
| Rename column | data_alter_table |
| Change column type | Migration |
| Split column into two | Migration |
| Merge columns | Migration |
| Restructure table | Migration |

Migration Workflow

  1. Create snapshot before migration
  2. Export data from current table
  3. Transform with Python/pandas
  4. Update schema with new structure
  5. Import transformed data
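
A minimal sketch of steps 2–5, assuming a synced database at /home/daytona/databases/my_store.duckdb and an orders table whose amount column needs to become numeric. The transformed data is staged under out/ so it can be re-imported with data_import using the artifact source:

import duckdb
import pandas as pd

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# 2. Export the current table
df = con.execute('SELECT * FROM orders').fetchdf()
before_rows = len(df)

# 3. Transform with pandas (here: coerce amount to a numeric type)
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

# 4./5. After updating the schema, stage the transformed data as an
# artifact and re-import it with data_import (source: "artifact")
df.to_csv('out/orders_migrated.csv', index=False)

print(f'{before_rows} rows exported, {len(df)} rows staged for import')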

Common Patterns

Change Column Type:

df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

Split Column:

df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)

Merge Columns:

df['full_address'] = df['street'] + ', ' + df['city'] + ', ' + df['state']

Safety Best Practices

  1. Always create a named snapshot before migration
  2. Test on sample data first
  3. Validate row counts before and after
  4. Know how to roll back from the snapshot
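
To make the row-count check in item 3 concrete, capture the count with DuckDB before the migration and again after the transformed data has been re-imported. A sketch, assuming the same database path and table as above:

import duckdb

def row_count(db_path: str, table: str) -> int:
    # COUNT(*) over a table in the synced DuckDB file
    con = duckdb.connect(db_path, read_only=True)
    try:
        return con.execute(f'SELECT COUNT(*) FROM {table}').fetchone()[0]
    finally:
        con.close()

# Run once before the migration and once after the re-import, then compare
before = row_count('/home/daytona/databases/my_store.duckdb', 'orders')
# ... perform the migration and re-import ...
after = row_count('/home/daytona/databases/my_store.duckdb', 'orders')
assert before == after, f'Row count changed: {before} -> {after}'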

data_modeling

Create databases and design data store schemas with tables, columns, and relationships.

When to Use

  • Creating new data stores
  • Designing table schemas
  • Defining relationships between tables

Schema Design Best Practices

Table Naming:

  • Use snake_case: customer_orders
  • Use plural names: customers, products
  • Descriptive junction tables: customer_product_access

Column Types:

  • STRING - Text data
  • INTEGER / BIGINT - Whole numbers
  • DECIMAL - Precise decimals (financial data)
  • BOOLEAN - True/false
  • DATE / TIMESTAMP - Temporal values
  • JSON - Structured data

Semantic Types: Add hints for AI understanding:

  • email, phone, url - Contact info
  • currency, percentage - Financial values
  • person_name, address - Personal info
  • identifier - IDs and codes

Example: E-Commerce Schema

{
  "tables": [
    {
      "name": "customers",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "email", "type": "STRING", "semantic_type": "email"},
        {"name": "name", "type": "STRING", "semantic_type": "person_name"}
      ]
    },
    {
      "name": "orders",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "customer_id", "type": "INTEGER"},
        {"name": "amount", "type": "DECIMAL", "semantic_type": "currency"}
      ],
      "foreign_keys": [
        {"column": "customer_id", "references_table": "customers", "references_column": "id"}
      ]
    }
  ]
}

data_visualization

Create persistent Vega-Lite dashboards with SQL-backed data.

When to Use

  • Data is in a persistent data store
  • Saved, reusable dashboards needed
  • Cross-panel interactivity (selections filter other panels)

Workflow

  1. Test query - Verify SQL returns expected data
  2. Preview - Use data_preview_dashboard to iterate
  3. Verify - Check labels, colors, layout
  4. Create - Save with data_create_dashboard
  5. Render - Generate PNG with data_render_dashboard

Dashboard Structure

A dashboard consists of:

  • query: A single SQL query that returns all of the data the dashboard needs
  • spec: A Vega-Lite specification (omit $schema and data; both are injected automatically)

Layout Options

| Layout | Description |
| --- | --- |
| vconcat | Vertical stack |
| hconcat | Horizontal layout |
| concat + columns | Grid layout |

Cross-Panel Interactivity

Selections in one panel can filter another:

{
  "vconcat": [
    {
      "params": [{"name": "regionSelect", "select": {"type": "point", "fields": ["region"]}}],
      "mark": "bar",
      "encoding": {
        "x": {"field": "region"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    },
    {
      "transform": [{"filter": {"param": "regionSelect"}}],
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    }
  ]
}

Query Parameters

Make dashboards dynamic with parameterized SQL:

{
  "query": "SELECT * FROM sales WHERE date >= {{start_date}}",
  "queryParameters": {
    "start_date": "CURRENT_DATE - INTERVAL 30 DAY"
  }
}

Best Practices

  1. Always add tooltips - Show values on hover
  2. Single query - Use JOINs/CTEs for all needed data
  3. Preview first - Verify before saving
  4. Use muted colors - Avoid harsh saturated colors
  5. Clear titles - Each panel should explain itself

Skill Combinations

Skills work together for complete workflows:

Data Pipeline:

  1. data_modeling - Create schema
  2. data_import - Load initial data
  3. data_analysis - Query and explore
  4. data_visualization - Create dashboards

Schema Evolution:

  1. data_modeling - Initial schema
  2. data_analysis - Identify needed changes
  3. data_migration - Transform schema and data

Reporting Workflow:

  1. data_analysis - Develop queries
  2. data_visualization - Create dashboard
  3. data_analysis - Refine based on feedback
