Beyond Code Coverage: Functionality Testing with Playwright — Marlene Mhangami, Microsoft

AI Engineer · Intermediate · 🧠 Large Language Models · 1mo ago

Skills: LLM Engineering (80%), Tool Use & Function Calling (60%)

Key Takeaways

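Demonstrates test-first functionality testing with an LLM agent: GitHub Copilot, using the Playwright MCP server, writes failing Playwright tests against the expected user-facing behavior, then generates code to pass them.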

Original Description

When an LLM writes your tests, it tends to write tests that confirm what the code does rather than tests that verify what the user experiences. Your test suite goes green. The app still breaks in ways none of those tests would catch. Marlene Mhangami from Microsoft makes the case for flipping the order: get the agent to write failing Playwright tests against the expected behavior first, then generate code to pass them. The demo runs this live with GitHub Copilot and the Playwright MCP server on a toy store search feature, with the browser open so you can watch the agent click through filters and validate results in real time.

Speaker info:
- https://x.com/marlene_zw
- https://www.linkedin.com/in/marlenemhangami/
- https://github.com/marlenemhangami
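To make the test-first idea concrete, here is a minimal sketch of what a behavior-level check for a toy store search might look like. It is not the code from the talk. The URL, the "Search" labels, the `product-name` test id, and the helper names `findSearchViolations` and `checkSearch` are all hypothetical. The `PageLike` and `LocatorLike` interfaces are a hand-written subset of Playwright's `Page` and `Locator` methods (`goto`, `getByRole`, `getByTestId`, `fill`, `click`, `allTextContents`), declared locally so the sketch stands alone. The intent is that a real Playwright `page` can be passed in unchanged.

```typescript
// Hypothetical names throughout: the URL, the "Search" labels, and the
// "product-name" test id are assumptions, not taken from the talk's demo.

// Hand-written subset of Playwright's Locator methods used below.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
  allTextContents(): Promise<string[]>;
}

// Hand-written subset of Playwright's Page methods used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByTestId(testId: string): LocatorLike;
}

// Pure statement of the expected behavior, written before any search code
// exists: at least one product is shown, and every product shown mentions
// the query (case-insensitive). Returns a list of human-readable violations.
function findSearchViolations(query: string, visibleNames: string[]): string[] {
  const violations: string[] = [];
  if (visibleNames.length === 0) {
    violations.push(`no results shown for "${query}"`);
  }
  const needle = query.toLowerCase();
  for (const name of visibleNames) {
    if (!name.toLowerCase().includes(needle)) {
      violations.push(`"${name}" does not match "${query}"`);
    }
  }
  return violations;
}

// Drives the page the way a user would, then checks what is on screen
// rather than what the search function returns internally.
async function checkSearch(
  page: PageLike,
  baseUrl: string,
  query: string
): Promise<string[]> {
  await page.goto(baseUrl);
  await page.getByRole("searchbox", { name: "Search" }).fill(query);
  await page.getByRole("button", { name: "Search" }).click();
  // Note: allTextContents() does not wait for elements to appear. A real
  // test would likely await a web-first assertion on the results first.
  const names = await page.getByTestId("product-name").allTextContents();
  return findSearchViolations(query, names);
}
```

Inside a Playwright spec, a test body could call `checkSearch(page, url, "robot")` and expect an empty array. Run before the search feature is implemented, that test fails, which is the starting point the talk argues for: the failure describes the user-visible behavior, and the agent's generated code then has to make it pass.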


