Skills Reference

Skills are reusable capabilities that unlock specialized tools and provide domain-specific knowledge for AI agents. The Data Platform includes five skills for comprehensive data workflows.

Overview

| Skill | Purpose | Tools Unlocked |
| --- | --- | --- |
| data_analysis | Query and analyze data with SQL/DuckDB | data_get_schema, data_list_tables, execute_shell |
| data_import | Import data from files with atomic operations | data_import, execute_shell |
| data_migration | Schema migrations with data transformations | data_alter_table, data_create_tables, data_import, execute_shell |
| data_modeling | Design database schemas | data_create_database, data_create_tables, data_alter_table |
| data_visualization | Create Vega-Lite dashboards | data_preview_dashboard, data_create_dashboard, data_update_dashboard, data_render_dashboard |

data_analysis

Query and analyze persistent data stores using SQL with DuckDB.

When to Use

  • Data is in a persistent data store (created via data_modeling + data_import)
  • SQL-based analytics, reusable queries, or data versioning needed
  • Complex analytics like window functions, CTEs, or cohort analysis

Recommended Approach

Use native DuckDB in the sandbox via execute_shell for best performance:

import duckdb

# Connect to synced database
con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Run any SQL query
result = con.execute('''
    SELECT customer_id, SUM(amount) as total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
''').fetchdf()

print(result)

Key Features

  • Window Functions: ROW_NUMBER(), LAG(), LEAD(), RANK()
  • CTEs: Break complex queries into readable parts
  • QUALIFY Clause: Filter window function results
  • Pivoting: Transform rows to columns
  • Multi-database queries: Join across multiple databases
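
As a quick illustration of the QUALIFY clause, the sketch below keeps only the most recent order per customer. It assumes the same orders table and database path used in the example above.

import duckdb

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Rank each customer's orders by recency, then filter on the
# window function's alias directly with QUALIFY
latest = con.execute('''
    SELECT customer_id, order_date, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
    QUALIFY rn = 1
''').fetchdf()

print(latest)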

Example: Cohort Analysis

WITH first_purchase AS (
  SELECT customer_id, MIN(order_date) as cohort_date
  FROM orders GROUP BY customer_id
)
SELECT
  DATE_TRUNC('month', fp.cohort_date) as cohort,
  DATE_DIFF('month', fp.cohort_date, o.order_date) as months_since,
  COUNT(DISTINCT o.customer_id) as customers
FROM orders o
JOIN first_purchase fp ON o.customer_id = fp.customer_id
GROUP BY 1, 2
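
To turn these cohort counts into the usual cohort matrix (one row per cohort month, one column per months-since value), you can fetch the query result into pandas and pivot it. A minimal sketch, reusing the database path and query from above:

import duckdb

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Fetch the cohort counts from the query above as a DataFrame
cohorts = con.execute('''
    WITH first_purchase AS (
      SELECT customer_id, MIN(order_date) AS cohort_date
      FROM orders GROUP BY customer_id
    )
    SELECT
      DATE_TRUNC('month', fp.cohort_date) AS cohort,
      DATE_DIFF('month', fp.cohort_date, o.order_date) AS months_since,
      COUNT(DISTINCT o.customer_id) AS customers
    FROM orders o
    JOIN first_purchase fp ON o.customer_id = fp.customer_id
    GROUP BY 1, 2
''').fetchdf()

# Pivot rows into columns: one row per cohort, one column per months_since
matrix = cohorts.pivot(index='cohort', columns='months_since', values='customers')
print(matrix)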

data_import

Import data from CSV, JSON, or Parquet files, or from inline data, with atomic multi-table support.

When to Use

  • Loading data from files into data stores
  • Batch data updates or refreshes
  • ETL workflows with data transformation

Supported Formats

  • CSV: Comma-separated values with header row
  • JSON: Array of objects or newline-delimited JSON
  • Parquet: Columnar format for large datasets

Import Modes

  • append: Add new rows to existing data (default)
  • replace: Clear table before importing

Data Sources

| Source | URI Format | Example |
| --- | --- | --- |
| inline | (data in data field) | Direct JSON array |
| gcs | gs://bucket/path | gs://my-bucket/data.csv |
| url | https://... | https://example.com/data.csv |
| artifact | out/filename | out/cleaned_data.csv |

Column Normalization

When importing, column names are automatically normalized:

  • Item Type → item_type
  • Order-ID → order_id
  • 123_count → _123_count

Create table schemas using normalized names.
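
The normalization itself happens on the platform side, but a rough Python approximation of the documented examples can help when drafting a schema ahead of an import. This helper is illustrative only, not the platform's implementation:

import re

def normalize_column(name: str) -> str:
    """Approximate the documented rules: lowercase, runs of
    non-alphanumeric characters become underscores, and a name
    starting with a digit gets a leading underscore."""
    col = re.sub(r'[^0-9a-zA-Z]+', '_', name.strip()).lower().strip('_')
    if col and col[0].isdigit():
        col = '_' + col
    return col

print(normalize_column('Item Type'))   # item_type
print(normalize_column('Order-ID'))    # order_id
print(normalize_column('123_count'))   # _123_count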

Example: Multi-Table Import

{
  "mode": "replace",
  "message": "Monthly data refresh",
  "tables": {
    "customers": {
      "source": "url",
      "uri": "https://example.com/customers.csv",
      "format": "csv"
    },
    "orders": {
      "source": "gcs",
      "uri": "gs://bucket/orders.parquet",
      "format": "parquet"
    }
  }
}

data_migration

Perform schema migrations with data transformations.

When to Use Migration vs Alter

| Change | Use |
| --- | --- |
| Add nullable column | data_alter_table |
| Drop column | data_alter_table |
| Rename column | data_alter_table |
| Change column type | Migration |
| Split column into two | Migration |
| Merge columns | Migration |
| Restructure table | Migration |

Migration Workflow

  1. Create snapshot before migration
  2. Export data from current table
  3. Transform with Python/pandas
  4. Update schema with new structure
  5. Import transformed data
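
A minimal sketch of steps 2–5, assuming a synced database at /home/daytona/databases/my_store.duckdb and an orders table whose amount column needs to become numeric. The transformed data is staged under out/ so it can be re-imported with data_import using the artifact source:

import duckdb
import pandas as pd

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# 2. Export the current table
df = con.execute('SELECT * FROM orders').fetchdf()
before_rows = len(df)

# 3. Transform with pandas (here: coerce amount to a numeric type)
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

# 4./5. After updating the schema, stage the transformed data as an
# artifact and re-import it with data_import (source: "artifact")
df.to_csv('out/orders_migrated.csv', index=False)

print(f'{before_rows} rows exported, {len(df)} rows staged for import')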

Common Patterns

Change Column Type:

df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

Split Column:

df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)

Merge Columns:

df['full_address'] = df['street'] + ', ' + df['city'] + ', ' + df['state']

Safety Best Practices

  1. Always create a named snapshot before migration
  2. Test on sample data first
  3. Validate row counts before and after
  4. Know how to roll back from the snapshot
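
To make the row-count check in item 3 concrete, capture the count with DuckDB before the migration and again after the transformed data has been re-imported. A sketch, assuming the same database path and table as above:

import duckdb

def row_count(db_path: str, table: str) -> int:
    # COUNT(*) over a table in the synced DuckDB file
    con = duckdb.connect(db_path, read_only=True)
    try:
        return con.execute(f'SELECT COUNT(*) FROM {table}').fetchone()[0]
    finally:
        con.close()

# Run once before the migration and once after the re-import, then compare
before = row_count('/home/daytona/databases/my_store.duckdb', 'orders')
# ... perform the migration and re-import ...
after = row_count('/home/daytona/databases/my_store.duckdb', 'orders')
assert before == after, f'Row count changed: {before} -> {after}'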

data_modeling

Create databases and design data store schemas with tables, columns, and relationships.

When to Use

  • Creating new data stores
  • Designing table schemas
  • Defining relationships between tables

Schema Design Best Practices

Table Naming:

  • Use snake_case: customer_orders
  • Use plural names: customers, products
  • Descriptive junction tables: customer_product_access

Column Types:

  • STRING - Text data
  • INTEGER / BIGINT - Whole numbers
  • DECIMAL - Precise decimals (financial data)
  • BOOLEAN - True/false
  • DATE / TIMESTAMP - Temporal values
  • JSON - Structured data

Semantic Types: Add hints for AI understanding:

  • email, phone, url - Contact info
  • currency, percentage - Financial values
  • person_name, address - Personal info
  • identifier - IDs and codes

Example: E-Commerce Schema

{
  "tables": [
    {
      "name": "customers",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "email", "type": "STRING", "semantic_type": "email"},
        {"name": "name", "type": "STRING", "semantic_type": "person_name"}
      ]
    },
    {
      "name": "orders",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "customer_id", "type": "INTEGER"},
        {"name": "amount", "type": "DECIMAL", "semantic_type": "currency"}
      ],
      "foreign_keys": [
        {"column": "customer_id", "references_table": "customers", "references_column": "id"}
      ]
    }
  ]
}

data_visualization

Create persistent Vega-Lite dashboards with SQL-backed data.

When to Use

  • Data is in a persistent data store
  • Saved, reusable dashboards needed
  • Cross-panel interactivity (selections filter other panels)

Workflow

  1. Test query - Verify SQL returns expected data
  2. Preview - Use data_preview_dashboard to iterate
  3. Verify - Check labels, colors, layout
  4. Create - Save with data_create_dashboard
  5. Render - Generate PNG with data_render_dashboard

Dashboard Structure

A dashboard consists of:

  • query: A single SQL query that returns all of the data the dashboard needs
  • spec: A Vega-Lite specification (omit $schema and data; both are injected automatically)

Layout Options

| Layout | Description |
| --- | --- |
| vconcat | Vertical stack |
| hconcat | Horizontal layout |
| concat + columns | Grid layout |

Cross-Panel Interactivity

Selections in one panel can filter another:

{
  "vconcat": [
    {
      "params": [{"name": "regionSelect", "select": {"type": "point", "fields": ["region"]}}],
      "mark": "bar",
      "encoding": {
        "x": {"field": "region"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    },
    {
      "transform": [{"filter": {"param": "regionSelect"}}],
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    }
  ]
}

Query Parameters

Make dashboards dynamic with parameterized SQL:

{
  "query": "SELECT * FROM sales WHERE date >= {{start_date}}",
  "queryParameters": {
    "start_date": "CURRENT_DATE - INTERVAL 30 DAY"
  }
}

Best Practices

  1. Always add tooltips - Show values on hover
  2. Single query - Use JOINs/CTEs for all needed data
  3. Preview first - Verify before saving
  4. Use muted colors - Avoid harsh saturated colors
  5. Clear titles - Each panel should explain itself

Skill Combinations

Skills work together for complete workflows:

Data Pipeline:

  1. data_modeling - Create schema
  2. data_import - Load initial data
  3. data_analysis - Query and explore
  4. data_visualization - Create dashboards

Schema Evolution:

  1. data_modeling - Initial schema
  2. data_analysis - Identify needed changes
  3. data_migration - Transform schema and data

Reporting Workflow:

  1. data_analysis - Develop queries
  2. data_visualization - Create dashboard
  3. data_analysis - Refine based on feedback
