Score-Based One-step MeanFlow Policy Optimization
📰 ArXiv cs.AI
arXiv:2605.23365v1 | Announce Type: cross
Abstract: Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising imposes substantial computational overhead at inference time, which is particularly problematic in online RL. MeanFlow offers a promising alternative by learning an average velocity field that maps noise to data in a single network evaluation. However, MeanFlow typically requires samples from the target distr
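
To make the "single network evaluation" idea concrete, here is a minimal sketch of one-step MeanFlow-style sampling. It is not the paper's method: it assumes the common convention where t = 1 is noise and t = 0 is data, and it replaces the learned network with a closed-form average velocity for a hypothetical 1D Gaussian target N(MU, S^2). The names `MU`, `S`, `average_velocity`, and `one_step_sample` are illustrative only.

```python
import math
import random

# Hypothetical toy target: actions distributed as N(MU, S^2).
MU, S = 2.0, 0.5


def _mean(tau):
    # Mean of z_tau = (1 - tau) * x + tau * eps, with x ~ N(MU, S^2), eps ~ N(0, 1).
    return (1.0 - tau) * MU


def _std(tau):
    # Standard deviation of z_tau under the same interpolation.
    return math.sqrt((1.0 - tau) ** 2 * S ** 2 + tau ** 2)


def average_velocity(z_t, r, t):
    """Closed-form stand-in for a learned network u(z_t, r, t).

    For Gaussian marginals the flow map from time t to time r is affine,
    so the average velocity is the displacement divided by (t - r).
    """
    z_r = _mean(r) + _std(r) / _std(t) * (z_t - _mean(t))
    return (z_t - z_r) / (t - r)


def one_step_sample(noise):
    # One evaluation of u maps noise (t = 1) directly to data (t = 0).
    return noise - average_velocity(noise, 0.0, 1.0)


def two_step_sample(noise):
    # Same map split at t = 0.5; average velocities compose consistently.
    z_half = noise - 0.5 * average_velocity(noise, 0.5, 1.0)
    return z_half - 0.5 * average_velocity(z_half, 0.0, 0.5)


if __name__ == "__main__":
    random.seed(0)
    samples = [one_step_sample(random.gauss(0.0, 1.0)) for _ in range(10000)]
    mean = sum(samples) / len(samples)
    print(f"empirical mean of one-step samples: {mean:.3f}")
```

In a policy setting the noise would be drawn per state and the network would also be conditioned on the state; that conditioning is omitted here to keep the sketch short.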