Niloofar Mireshghallah - Benchmarking Contextual Integrity in LLMs

Cohere · Advanced · 🧠 Large Language Models · 1mo ago
Abstract: As large language models integrate into daily workflows, from personal assistants to workplace tools, they handle sensitive information from multiple sources yet struggle to reason about what to share, with whom, and when. This talk explores critical gaps in LLMs' privacy reasoning through complementary benchmarks. First, ConfAIde [ICLR 2024 Spotlight] reveals that even advanced models like GPT-4 inappropriately disclose private information in contexts where humans would maintain boundaries. Second, CIMemories [ICLR 2026] extends this analysis to persistent memories, an increasingly adopted personalization feature, showing failures in handling compositional secrets with multiple attributes and contextual cues. We then present a data minimization framework [ICLR 2026] that formally defines the least privacy-revealing disclosure that maintains task utility. Our experiments show that frontier models can tolerate up to 85% data redaction without losing functionality, yet they lack awareness of what information they actually need, leading to systematic oversharing. We conclude with techniques for restoring performance when privacy measures are applied, offering a path toward AI systems that respect contextual privacy norms while remaining useful.

Niloofar Mireshghallah is a Member of Technical Staff at humans&, working on building AI systems that model the long-term social good of people. Beginning Fall 2026, she will join Carnegie Mellon University as an Assistant Professor jointly appointed in the Language Technologies Institute (LTI) and the Department of Engineering & Public Policy (EPP), and will be a core member of CyLab. Previously, she was a Research Scientist in the Alignment group at Meta's Fundamental AI Research (FAIR) lab until November 2025, working on privacy-preserving AI systems and LLM safety. Before that, she was a postdoctoral scholar at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by
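To make the data-minimization idea from the abstract concrete, here is a minimal, hypothetical sketch: greedily redact input fields and keep each redaction only if task output is unchanged. The field names, the `answers_task` stand-in for an LLM, and the exact-match utility check are all illustrative assumptions, not the framework's actual method.

```python
# Hypothetical sketch of data minimization: find the least-revealing
# version of a record that still preserves task utility.
REDACTED = "[REDACTED]"

def answers_task(record: dict) -> str:
    # Stand-in for an LLM performing the task; this toy "task" only
    # needs the appointment date, so every other field is redundant.
    return f"Your appointment is on {record['date']}"

def minimize(record: dict, task) -> dict:
    """Return a copy of `record` with every field redacted whose
    removal leaves the task output unchanged."""
    baseline = task(record)
    minimized = dict(record)
    for field in record:
        candidate = {**minimized, field: REDACTED}
        if task(candidate) == baseline:   # utility preserved?
            minimized = candidate         # keep the redaction
    return minimized

record = {
    "name": "Alice",
    "ssn": "123-45-6789",
    "diagnosis": "flu",
    "date": "2025-03-14",
}
print(minimize(record, answers_task))
# → only "date" survives; the other 3 of 4 fields are redacted
```

In this toy case 75% of the fields can be redacted with no loss of utility, mirroring the abstract's observation that a large fraction of the input is often unnecessary; the gap the talk highlights is that models do not know this about their own inputs.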
