Seeking Physics in Diffusion Noise

📰 ArXiv cs.AI

Researchers find that video diffusion models encode signals predictive of physical plausibility: plausible and implausible videos are partially separable in the model's intermediate feature space.

advanced Published 27 Mar 2026

Action Steps

Analyze intermediate denoising representations of a pretrained Diffusion Transformer (DiT)
Probe the mid-layer feature space across noise levels to measure how separable physically plausible and implausible videos are
Investigate whether separability can be attributed to visual quality or generator identity
Explore recoverable physics-related cues in frozen diffusion models
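
The probing steps above can be sketched as a linear probe trained on frozen features, once per noise level. This is a minimal illustration and not the paper's method: the features are synthetic stand-ins generated in code, not real DiT activations, and every name here (`make_synthetic_features`, `train_linear_probe`, `probe_accuracy_by_noise_level`, the `shift` values) is hypothetical. With a real model, the synthetic generator would be replaced by pooled mid-layer activations extracted at each noise level.

```python
import math
import random


def make_synthetic_features(n_per_class, dim, shift, rng):
    """Stand-in for pooled mid-layer features (NOT real DiT activations).

    Label 1 = plausible, label 0 = implausible. The two classes differ only
    by +/- `shift` along the first feature axis, which is an assumption made
    purely so that the probe has something to find.
    """
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            x[0] += shift if label == 1 else -shift
            data.append((x, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(data, dim, lr=0.1, epochs=50):
    """Logistic-regression probe fitted with plain SGD; features stay frozen."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            for i in range(dim):
                w[i] -= lr * g * x[i]
            b -= lr * g
    return w, b


def accuracy(data, w, b):
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)


def probe_accuracy_by_noise_level(shifts, dim=8, n_per_class=200, seed=0):
    """Train and evaluate one probe per noise level on held-out samples."""
    rng = random.Random(seed)
    results = {}
    for level, shift in shifts.items():
        train = make_synthetic_features(n_per_class, dim, shift, rng)
        test = make_synthetic_features(n_per_class, dim, shift, rng)
        w, b = train_linear_probe(train, dim)
        results[level] = accuracy(test, w, b)
    return results


if __name__ == "__main__":
    # Hypothetical noise levels mapped to made-up separation strengths.
    print(probe_accuracy_by_noise_level({"low": 1.0, "mid": 0.5, "high": 0.0}))
```

Held-out accuracy well above 0.5 at a given noise level would indicate that the frozen features carry a linearly recoverable signal there, while accuracy near 0.5 would indicate none.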

Who Needs to Know This

AI engineers and researchers working on computer vision and diffusion models can benefit from this study, which offers insight into what these models do and do not capture about physical plausibility.

Key Insight

💡 The intermediate denoising features of a pretrained video diffusion model carry signals predictive of physical plausibility, and these signals persist across noise levels

Key Takeaways

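Physically plausible and implausible videos are partially separable in the mid-layer feature space of a pretrained Diffusion Transformer (DiT), across noise levels
The separability cannot be fully attributed to visual quality or generator identity
This suggests that physics-related cues are recoverable from the frozen model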

Full Article

Title: Seeking Physics in Diffusion Noise

Abstract:
arXiv:2603.14294v2
Do video diffusion models encode signals predictive of physical plausibility? We probe intermediate denoising representations of a pretrained Diffusion Transformer (DiT) and find that physically plausible and implausible videos are partially separable in mid-layer feature space across noise levels. This separability cannot be fully attributed to visual quality or generator identity, suggesting recoverable physics-related cues in frozen Di…
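
One way to check whether a probe's separation is merely tracking generator identity is to compare its pooled ROC AUC with the AUC computed inside each generator group. The sketch below is an illustrative assumption, not the paper's evaluation protocol; the helper names (`roc_auc`, `auc_within_groups`) and the toy scores are hypothetical. If the pooled AUC is high but the within-group AUCs sit near 0.5, the probe is likely separating generators and not plausibility.

```python
from collections import defaultdict


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for small toy data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_within_groups(scores, labels, groups):
    """ROC AUC computed separately inside each group (e.g. each generator)."""
    buckets = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: roc_auc(s, y) for g, (s, y) in buckets.items()}


# Toy, made-up probe scores: generator "A" scores high and is mostly
# plausible (label 1); generator "B" scores low and is mostly implausible.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]

pooled = roc_auc(scores, labels)
per_generator = auc_within_groups(scores, labels, groups)
print(pooled, per_generator)
```

In this toy case the pooled AUC is above chance while both within-generator AUCs are exactly 0.5, which is the confounded pattern. The same grouping could be applied to bins of a visual-quality score.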
