Announcing Synthetic Data Generation in Mosaic AI Agent Evaluation

Databricks · Intermediate · 🌐 Frontend Engineering · 1y ago

Skills: Tool Use & Function Calling (90%), Agent Foundations (80%)

Key Takeaways

The video showcases the Synthetic Data Generation API in Mosaic AI Agent Evaluation, demonstrating how to create evaluation datasets quickly and improve AI agent performance without relying on Subject Matter Experts.

Full Transcript

Hi folks, welcome back to Databricks. I'm Eric Peter, the PM for Agent Evaluation and Framework. We are excited to announce Agent Evaluation's new synthetic data capabilities, available today. We've designed this capability to help developers improve the quality of their AI agents. Specifically, this product addresses a key challenge that we've heard from many customers: it's a difficult and time-consuming process to have subject matter experts label high-quality evaluation data. However, without this data, it's difficult to evaluate and improve agent quality to reach production quality targets. Our synthetic data API allows developers to overcome these challenges by generating a high-quality evaluation data set in minutes. Customers who tested this capability were able to achieve 60% improvements in quality using this data before they engaged with their subject matter experts.

Let's do a quick demo to show how the product works. First, I'll install databricks-agents and MLflow. Next, I'll load my parsed document corpus from a Delta table; in this case, I'll use the Databricks documentation. Next, I'll use the API to generate synthetic evaluation data. Let me quickly walk through how the API works. We've designed the API to be simple and easy to use: you pass your documents, how many evaluation questions you want, and optionally a description of your agent and question guidelines to help tune the questions that the pipeline generates. In this case, we'll tell it that it's a chatbot that answers questions about Databricks, and give it the user personas and a few example questions. Let me kick this off.

Now that it's finished, you can see it's taken about 2 minutes to generate 100 questions. Let's go ahead and take a look at the data. You can think of each row as a single, fully formed test case that allows you to assess the quality of your agent. In this case, we generated a question following the guidelines that you provided, and we identified a set of facts that must be present in the agent's response in order to accurately answer this question. These expected facts make it easier for subject matter experts to review this data down the line, and they improve the accuracy of the LLM judges that assess the quality of your agent's response. You could think of this as having a digital subject matter expert at your side who can quickly give you high-quality agent questions and the criteria to judge the agent's responses.

Now that we have the data, let's go ahead and use it to evaluate and improve our agent's quality. Next, I'll write my agent's code. Here I've written a function-calling agent that uses a vector search retriever tool. In this case I've used the OpenAI SDK, but I could have used any of the popular agent authoring frameworks that Databricks supports, like LangGraph, LlamaIndex, AutoGen, and more. Let's run the agent so we can quickly understand what's happening behind the scenes. Here we can see the MLflow trace, which provides observability in development and production. We can see the vector search retriever tool was called, as well as several calls to an LLM.

Now let's evaluate the agent. First, I'll log the agent's code and config to MLflow so I have a copy and I know exactly what agent has been evaluated. Next, I'll call mlflow.evaluate to run the evaluation. I'll pass in the data that I generated above and the model that I just logged, and I'll activate Databricks Mosaic AI Agent Evaluation's proprietary LLM judges. Let's kick this off.

Now that evaluation is finished, let me open the MLflow UI to look at the results. MLflow and Agent Evaluation assess the quality of each record in the evaluation data set and provide you an overall score of quality. In this case, we can see that 54 are passing and 46 are not high quality. Agent Evaluation identifies the root cause of the quality issues; in this case, we can see that 42% of the incorrect responses are due to an issue with retrieval. Let's open up a record to see what's happening. Agent Evaluation's judges give you a written rationale for why your answer is correct or incorrect. In this case, it's identified that the ground truth, the facts that we saw earlier, required three things, but these are not present in the retrieved context. You can look through the assessments from the other judges as well and inspect each input and output.

In this case, I notice that I'm only retrieving one document from my retriever. Maybe if I increase the number of documents that are coming back, I can improve quality. Let's go ahead and try that out. I'll come down here and change the configuration of my agent to return five results rather than one. Let's go ahead and kick this off.

Now that evaluation is finished, let's go back to the MLflow UI to compare the results. I'll choose the previous experiment, and I can see that changing the value of k from 1 to 5 led to a 17% increase in quality, and the share of failures with retrieval as the root cause has gone down. I can inspect each individual record and see a side-by-side comparison to help me figure out what quality issue I need to fix next. Since I'm only at 70% quality, I still have a bit of a way to go, but for the purposes of this demo I'll wrap it up here. From here, you can continue to iterate on the quality of your agent, eventually deploying it to a production-ready, highly scalable API, as well as our web-based chat UI to collect stakeholder feedback, using Agent Framework's single line of code to deploy.

In summary, Agent Evaluation's synthetic data capabilities allow you to accelerate your time to market by reducing the amount of time that's spent labeling data. They enable you to deliver higher ROI at lower cost, because the synthetic data helps you increase quality at a faster pace. These capabilities are available today in public preview, and there's a link to this demo notebook in the video comments if you want to try it yourself. We're excited to see what you build. Thanks for watching.
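
The generation step described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the demo notebook: it assumes the `generate_evals_df` entry point in `databricks.agents.evals` and a documents table with `content` and `doc_uri` columns, as described in the documentation linked in the video description. The helper names, the agent description, the guidelines text, and the sample document are all hypothetical. The Databricks import is deferred inside a function so that the argument-building part runs anywhere.

```python
from typing import Any


def build_generation_args(docs: Any, num_evals: int) -> dict[str, Any]:
    """Collect the inputs the transcript lists: documents, a question count,
    and the optional agent description and question guidelines."""
    # Illustrative wording only; not the text used in the demo.
    agent_description = (
        "A chatbot that answers questions about Databricks documentation."
    )
    question_guidelines = (
        "# User personas\n"
        "- A developer who is new to Databricks\n"
        "# Example questions\n"
        "- How do I create a Delta table?\n"
    )
    return {
        "docs": docs,
        "num_evals": num_evals,
        "agent_description": agent_description,
        "question_guidelines": question_guidelines,
    }


def generate_eval_set(args: dict[str, Any]) -> Any:
    """Call the synthetic data API. Requires a Databricks environment with
    the databricks-agents package installed."""
    # Deferred import: assumed entry point, check the linked documentation.
    from databricks.agents.evals import generate_evals_df

    return generate_evals_df(
        args["docs"],
        num_evals=args["num_evals"],
        agent_description=args["agent_description"],
        question_guidelines=args["question_guidelines"],
    )


# Hypothetical one-document corpus. In a real run this would be a pandas or
# Spark DataFrame loaded from a Delta table, not a list of dicts.
sample_docs = [
    {
        "content": "Delta Lake is an open source storage layer.",
        "doc_uri": "docs/delta.html",
    }
]

generation_args = build_generation_args(sample_docs, num_evals=100)
```

Inside a Databricks notebook, `generate_eval_set(generation_args)` would then be expected to return the evaluation rows, each with a question and its expected facts, as described in the transcript.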

Original Description

Discover how the new Synthetic Data Generation API in Mosaic AI Agent Evaluation transforms the way you evaluate and improve AI agent quality. In this quick demo, Eric Peter, GenAI PM, shows how to:
Create evaluation datasets in minutes.
Improve agent performance without waiting on Subject Matter Experts (SMEs).
Identify and fix low-quality outputs faster than ever.

Say goodbye to evaluation bottlenecks and unlock faster innovation for your organization. Watch now to see how Mosaic AI can accelerate your path to production!

Read the blog to learn more: https://www.databricks.com/blog/streamline-ai-agent-evaluation-with-new-synthetic-data-capabilities
Check out our documentation here: https://docs.databricks.com/en/generative-ai/agent-evaluation/synthesize-evaluation-set.html

The Synthetic Data Generation API in Mosaic AI Agent Evaluation enables rapid creation of evaluation datasets, allowing for faster improvement of AI agent quality and reduction of evaluation bottlenecks.

Key Takeaways

Create a new evaluation dataset using the Synthetic Data Generation API
Configure the API to generate synthetic data
Evaluate AI agent performance using the generated dataset
Identify and fix low-quality outputs
Refine the evaluation dataset as needed
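
The evaluate-and-iterate loop in the steps above can be illustrated with a toy, self-contained sketch. It is not the Databricks implementation: the real product runs LLM judges through `mlflow.evaluate`, whereas here a hypothetical substring check stands in for the judge, and the corpus, test cases, and keyword retriever are invented for illustration. It only shows why returning more documents (a larger k) can raise the share of passing test cases when failures come from retrieval.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalCase:
    request: str               # synthetic question
    expected_facts: list[str]  # facts the answer must contain


# Hypothetical mini corpus standing in for the parsed documentation.
CORPUS = [
    "Delta tables support time travel.",
    "Time travel queries use VERSION AS OF.",
    "Clusters can autoscale between a minimum and maximum worker count.",
    "Autoscale settings are configured when the cluster is created.",
]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, k: int) -> list[str]:
    """Rank documents by the number of words shared with the question
    and return the top k."""
    words = tokens(question)
    ranked = sorted(CORPUS, key=lambda doc: len(words & tokens(doc)), reverse=True)
    return ranked[:k]


def passes(case: EvalCase, context: list[str]) -> bool:
    """Stand-in judge: every expected fact must appear in the retrieved text."""
    text = " ".join(context).lower()
    return all(fact.lower() in text for fact in case.expected_facts)


def pass_rate(cases: list[EvalCase], k: int) -> float:
    """Fraction of test cases whose retrieved context covers all expected facts."""
    results = [passes(case, retrieve(case.request, k)) for case in cases]
    return sum(results) / len(results)


CASES = [
    EvalCase("How does time travel work on Delta tables?",
             ["time travel", "version as of"]),
    EvalCase("Can clusters autoscale?",
             ["autoscale", "minimum and maximum"]),
]

rate_k1 = pass_rate(CASES, k=1)
rate_k2 = pass_rate(CASES, k=2)
print(f"pass rate with k=1: {rate_k1:.0%}, with k=2: {rate_k2:.0%}")
```

With this toy data, the first case fails at k=1 because one of its expected facts sits in the second-ranked document, so the pass rate moves from 50% to 100% when k goes from 1 to 2. The demo's change from one retrieved document to five follows the same idea on a real corpus.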

💡 The Synthetic Data Generation API can significantly accelerate the evaluation and improvement of AI agent quality, reducing reliance on Subject Matter Experts and unlocking faster innovation.
