Building a Data Analyst AI

My day job requires a lot of data engineering, like a lot. Sometimes I need to write quick nested SQL, wrangle some CSVs, or parse JSON. ChatGPT is great, but it can't run and test the code, and it can't hold a multi-turn conversation about my own data model.

I use DuckDB for ad-hoc tasks: it's fast, reads common formats like CSV, JSON, and Parquet directly, and works out of the box.

Normally these tasks would take me anywhere from one to three hours, so I've automated them with 15 lines of code. Here's an example:

import json
from phi.assistant.duckdb import DuckDbAssistant

# The semantic model tells the assistant which tables exist,
# what they contain, and where to load them from.
duckdb_assistant = DuckDbAssistant(
    semantic_model=json.dumps({
        "tables": [
            {
                "name": "movies",
                "description": "Contains information about movies from IMDB.",
                "path": "https://phidata-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
            }
        ]
    }),
)

# Ask questions in plain English; the assistant writes and runs the SQL.
duckdb_assistant.print_response("What is the average rating of movies? Show me the SQL.")
duckdb_assistant.print_response("What is the revenue per year?")
Read more about the DuckDbAssistant
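
The semantic model above is just a JSON string, so once I have more than one file it's easier to build it from a plain list than to hand-write the dict. Here's a minimal sketch of that idea. `build_semantic_model` is my own hypothetical helper (not part of phidata), and the `ratings` table and its path are made up for illustration, so swap in files you actually have.

```python
import json


def build_semantic_model(tables):
    """Serialize (name, description, path) triples into a semantic-model JSON string.

    Hypothetical helper: it only mirrors the "tables" layout used in the example above.
    """
    return json.dumps(
        {
            "tables": [
                {"name": name, "description": description, "path": path}
                for name, description, path in tables
            ]
        },
        indent=2,
    )


semantic_model = build_semantic_model(
    [
        (
            "movies",
            "Contains information about movies from IMDB.",
            "https://phidata-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
        ),
        # Hypothetical second table, here only to show the shape.
        ("ratings", "Hypothetical per-user ratings export.", "data/ratings.json"),
    ]
)

print(semantic_model)
```

The resulting string can then be passed as `semantic_model=semantic_model` in place of the inline `json.dumps(...)` call.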

Here it is in action:
