Beyond Code Coverage: Functionality Testing with Playwright — Marlene Mhangami, Microsoft

AI Engineer · Intermediate · 🧠 Large Language Models · 1mo ago

Key Takeaways

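Makes the case for having an LLM agent write failing Playwright tests against the expected behavior first, then generate code to pass them, with a live demo using GitHub Copilot and the Playwright MCP server on a toy store search feature.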

Original Description

When an LLM writes your tests, it tends to write tests that confirm what the code does rather than tests that verify what the user experiences. Your test suite goes green. The app still breaks in ways none of those tests would catch. Marlene Mhangami from Microsoft makes the case for flipping the order: get the agent to write failing Playwright tests against the expected behavior first, then generate code to pass them. The demo runs this live with GitHub Copilot and the Playwright MCP server on a toy store search feature, with the browser open so you can watch the agent click through filters and validate results in real time.

Speaker info:
- https://x.com/marlene_zw
- https://www.linkedin.com/in/marlenemhangami/
- https://github.com/marlenemhangami
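
The gap between a test that confirms what the code does and a test that verifies what the user experiences can be sketched in a few lines. This is an illustrative sketch only, not code from the talk. It uses plain Python rather than Playwright so it stays self-contained, and the catalog, the `Toy` type, and every function name are hypothetical. In the setup the description outlines, the behavior check would instead be a Playwright test driving the real search page in a browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toy:
    name: str
    category: str


# Hypothetical catalog standing in for the toy store's data.
CATALOG = [
    Toy("Wooden Train Set", "vehicles"),
    Toy("Plush Bear", "soft toys"),
    Toy("Train Puzzle", "puzzles"),
]


def search_buggy(query: str, catalog=CATALOG) -> list[Toy]:
    """Case-sensitive substring match: a plausible first draft."""
    return [toy for toy in catalog if query in toy.name]


def search_fixed(query: str, catalog=CATALOG) -> list[Toy]:
    """Case-insensitive match, written to satisfy the behavior check."""
    needle = query.strip().lower()
    return [toy for toy in catalog if needle in toy.name.lower()]


def mirrors_implementation(search) -> bool:
    """Check derived from reading the code: it only tries the exact
    input the implementation already handles."""
    names = [toy.name for toy in search("Train")]
    return names == ["Wooden Train Set", "Train Puzzle"]


def meets_user_expectation(search) -> bool:
    """Check derived from expected behavior: a shopper typing 'train'
    in lowercase should still see both train products."""
    names = [toy.name for toy in search("train")]
    return names == ["Wooden Train Set", "Train Puzzle"]


if __name__ == "__main__":
    for search in (search_buggy, search_fixed):
        print(
            search.__name__,
            "mirror check:", mirrors_implementation(search),
            "behavior check:", meets_user_expectation(search),
        )
```

With `search_buggy`, the implementation-mirroring check passes while the behavior check fails, which is the "suite goes green, app still breaks" situation. Writing the behavior check first gives a failing test to code against, and `search_fixed` is one way to make it pass.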
