Skills Reference
Skills are reusable capabilities that unlock specialized tools and provide domain-specific knowledge for AI agents. The Data Platform includes five skills for comprehensive data workflows.
Overview
| Skill | Purpose | Tools Unlocked |
|---|---|---|
| data_analysis | Query and analyze data with SQL/DuckDB | data_get_schema, data_list_tables, execute_shell |
| data_import | Import data from files with atomic operations | data_import, execute_shell |
| data_migration | Schema migrations with data transformations | data_alter_table, data_create_tables, data_import, execute_shell |
| data_modeling | Design database schemas | data_create_database, data_create_tables, data_alter_table |
| data_visualization | Create Vega-Lite dashboards | data_preview_dashboard, data_create_dashboard, data_update_dashboard, data_render_dashboard |
data_analysis
Query and analyze persistent data stores using SQL with DuckDB.
When to Use
- Data is in a persistent data store (created via data_modeling + data_import)
- SQL-based analytics, reusable queries, or data versioning needed
- Complex analytics like window functions, CTEs, or cohort analysis
Recommended Approach
Use native DuckDB in the sandbox via execute_shell for best performance:
```python
import duckdb

# Connect to synced database
con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Run any SQL query
result = con.execute('''
    SELECT customer_id, SUM(amount) as total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
''').fetchdf()
print(result)
```
Key Features
- Window Functions: ROW_NUMBER(), LAG(), LEAD(), RANK()
- CTEs: Break complex queries into readable parts
- QUALIFY Clause: Filter window function results (see the sketch below)
- Pivoting: Transform rows to columns
- Multi-database queries: Join across multiple databases
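For instance, QUALIFY pairs naturally with window functions. A minimal sketch, assuming the same orders table and database path used in the other examples on this page:

```python
import duckdb

con = duckdb.connect('/home/daytona/databases/my_store.duckdb')

# Latest order per customer: rank rows within each customer,
# then keep only the top-ranked row with QUALIFY (no subquery needed)
latest_orders = con.execute('''
    SELECT
        customer_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC
        ) AS rn
    FROM orders
    QUALIFY rn = 1
''').fetchdf()
print(latest_orders)
```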
Example: Cohort Analysis
```sql
WITH first_purchase AS (
    SELECT customer_id, MIN(order_date) as cohort_date
    FROM orders GROUP BY customer_id
)
SELECT
    DATE_TRUNC('month', fp.cohort_date) as cohort,
    DATE_DIFF('month', fp.cohort_date, o.order_date) as months_since,
    COUNT(DISTINCT o.customer_id) as customers
FROM orders o
JOIN first_purchase fp ON o.customer_id = fp.customer_id
GROUP BY 1, 2
```
data_import
Import data from CSV, JSON, or Parquet files, or from inline data, with atomic multi-table support.
When to Use
- Loading data from files into data stores
- Batch data updates or refreshes
- ETL workflows with data transformation
Supported Formats
- CSV: Comma-separated values with header row
- JSON: Array of objects or newline-delimited JSON
- Parquet: Columnar format for large datasets
Import Modes
- append: Add new rows to existing data (default)
- replace: Clear table before importing
Data Sources
| Source | URI Format | Example |
|---|---|---|
| inline | (data in data field) | Direct JSON array |
| gcs | gs://bucket/path | gs://my-bucket/data.csv |
| url | https://... | https://example.com/data.csv |
| artifact | out/filename | out/cleaned_data.csv |
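The artifact source is convenient when data needs cleanup before import. A minimal sketch, assuming pandas is available in the sandbox; the input file name and the email column are placeholders for your own data:

```python
import pandas as pd

# Clean a raw file in the sandbox, then expose it as an import artifact
df = pd.read_csv('raw_customers.csv')           # illustrative input file
df = df.drop_duplicates(subset=['email'])       # illustrative cleanup step
df.to_csv('out/cleaned_data.csv', index=False)  # artifact URI format: out/<filename>

# The file can now be imported with source "artifact" and uri "out/cleaned_data.csv"
```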
Column Normalization
When importing, column names are automatically normalized:
- Item Type → item_type
- Order-ID → order_id
- 123_count → _123_count

Create table schemas using the normalized names.
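A rough Python sketch of the rule, inferred from the examples above (the platform's exact implementation may differ in edge cases):

```python
import re

def normalize_column(name: str) -> str:
    # Lowercase and collapse runs of non-alphanumeric characters into "_"
    normalized = re.sub(r'[^0-9a-z]+', '_', name.strip().lower())
    # Prefix a leading digit so the result is a valid identifier
    if normalized and normalized[0].isdigit():
        normalized = '_' + normalized
    return normalized

for raw in ['Item Type', 'Order-ID', '123_count']:
    print(raw, '->', normalize_column(raw))
# Item Type -> item_type
# Order-ID -> order_id
# 123_count -> _123_count
```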
Example: Multi-Table Import
```json
{
  "mode": "replace",
  "message": "Monthly data refresh",
  "tables": {
    "customers": {
      "source": "url",
      "uri": "https://example.com/customers.csv",
      "format": "csv"
    },
    "orders": {
      "source": "gcs",
      "uri": "gs://bucket/orders.parquet",
      "format": "parquet"
    }
  }
}
```
data_migration
Perform schema migrations with data transformations.
When to Use Migration vs Alter
| Change | Use |
|---|---|
| Add nullable column | data_alter_table |
| Drop column | data_alter_table |
| Rename column | data_alter_table |
| Change column type | Migration |
| Split column into two | Migration |
| Merge columns | Migration |
| Restructure table | Migration |
Migration Workflow
1. Create snapshot before migration
2. Export data from current table
3. Transform with Python/pandas (see the sketch below)
4. Update schema with new structure
5. Import transformed data
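A minimal pandas sketch of steps 3 and 5, assuming the table from step 2 has already been exported to a CSV; file names and the specific transformations are illustrative (see the patterns below):

```python
import pandas as pd

# Load the exported table (file name is illustrative)
df = pd.read_csv('orders_export.csv')
rows_before = len(df)

# Transform: change a column type and split a column
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)
df = df.drop(columns=['full_name'])

# Validate before re-importing
assert len(df) == rows_before, 'row count changed during transformation'

# Write the result as an artifact so data_import can load it
# via the "artifact" source (out/<filename>)
df.to_csv('out/orders_transformed.csv', index=False)
```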
Common Patterns
Change Column Type:
```python
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
```
Split Column:
```python
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)
```
Merge Columns:
```python
df['full_address'] = df['street'] + ', ' + df['city'] + ', ' + df['state']
```
Safety Best Practices
- Always create a named snapshot before migration
- Test on sample data first (see the sketch below)
- Validate row counts before and after
- Know how to roll back from a snapshot
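One way to test on sample data first, reusing the export file from the migration sketch above (the file name and column are illustrative):

```python
import pandas as pd

# Dry-run the transformation on a small sample before the full migration
sample = pd.read_csv('orders_export.csv', nrows=100)
sample['amount'] = pd.to_numeric(sample['amount'], errors='coerce')

# Spot-check the result before committing to the full run
print(sample.dtypes)
print(sample['amount'].isna().sum(), 'values could not be converted')
```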
data_modeling
Create databases and design data store schemas with tables, columns, and relationships.
When to Use
- Creating new data stores
- Designing table schemas
- Defining relationships between tables
Schema Design Best Practices
Table Naming:
- Use snake_case: customer_orders
- Use plural names: customers, products
- Descriptive junction tables: customer_product_access

Column Types:
- STRING - Text data
- INTEGER/BIGINT - Whole numbers
- DECIMAL - Precise decimals (financial data)
- BOOLEAN - True/false
- DATE/TIMESTAMP - Temporal values
- JSON - Structured data

Semantic Types: Add hints for AI understanding:
- email, phone, url - Contact info
- currency, percentage - Financial values
- person_name, address - Personal info
- identifier - IDs and codes
Example: E-Commerce Schema
```json
{
  "tables": [
    {
      "name": "customers",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "email", "type": "STRING", "semantic_type": "email"},
        {"name": "name", "type": "STRING", "semantic_type": "person_name"}
      ]
    },
    {
      "name": "orders",
      "columns": [
        {"name": "id", "type": "INTEGER", "primary_key": true},
        {"name": "customer_id", "type": "INTEGER"},
        {"name": "amount", "type": "DECIMAL", "semantic_type": "currency"}
      ],
      "foreign_keys": [
        {"column": "customer_id", "references_table": "customers", "references_column": "id"}
      ]
    }
  ]
}
```
data_visualization
Create persistent Vega-Lite dashboards with SQL-backed data.
When to Use
- Data is in a persistent data store
- Saved, reusable dashboards needed
- Cross-panel interactivity (selections filter other panels)
Workflow
1. Test query - Verify SQL returns expected data
2. Preview - Use data_preview_dashboard to iterate
3. Verify - Check labels, colors, layout
4. Create - Save with data_create_dashboard
5. Render - Generate PNG with data_render_dashboard
Dashboard Structure
A dashboard consists of:
- query: Single SQL query returning all data
- spec: Vega-Lite specification (no $schema or data - injected automatically)
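For illustration, here is a minimal single-panel dashboard with both parts, written as a Python dict (the equivalent JSON payload has the same shape; the table and column names are assumptions):

```python
# Minimal dashboard definition: one query plus one Vega-Lite panel.
# $schema and data are omitted because the platform injects them.
dashboard = {
    "query": "SELECT region, SUM(amount) AS total FROM orders GROUP BY region",
    "spec": {
        "title": "Sales by region",
        "mark": "bar",
        "encoding": {
            "x": {"field": "region", "type": "nominal"},
            "y": {"field": "total", "type": "quantitative"},
            "tooltip": [{"field": "region"}, {"field": "total"}]
        }
    }
}
```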
Layout Options
| Layout | Description |
|---|---|
| vconcat | Vertical stack |
| hconcat | Horizontal layout |
| concat + columns | Grid layout |
Cross-Panel Interactivity
Selections in one panel can filter another:
```json
{
  "vconcat": [
    {
      "params": [{"name": "regionSelect", "select": {"type": "point", "fields": ["region"]}}],
      "mark": "bar",
      "encoding": {
        "x": {"field": "region"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    },
    {
      "transform": [{"filter": {"param": "regionSelect"}}],
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"aggregate": "sum", "field": "amount"}
      }
    }
  ]
}
```
Query Parameters
Make dashboards dynamic with parameterized SQL:
```json
{
  "query": "SELECT * FROM sales WHERE date >= {{start_date}}",
  "queryParameters": {
    "start_date": "CURRENT_DATE - INTERVAL 30 DAY"
  }
}
```
Best Practices
- Always add tooltips - Show values on hover
- Single query - Use JOINs/CTEs for all needed data
- Preview first - Verify before saving
- Use muted colors - Avoid harsh saturated colors
- Clear titles - Each panel should explain itself
Skill Combinations
Skills work together for complete workflows:
Data Pipeline:
1. data_modeling - Create schema
2. data_import - Load initial data
3. data_analysis - Query and explore
4. data_visualization - Create dashboards

Schema Evolution:
1. data_modeling - Initial schema
2. data_analysis - Identify needed changes
3. data_migration - Transform schema and data

Reporting Workflow:
1. data_analysis - Develop queries
2. data_visualization - Create dashboard
3. data_analysis - Refine based on feedback
Related Documentation
- Tools Reference - Detailed tool parameters
- Dashboards Guide - Vega-Lite deep dive
- API Reference - REST API documentation
