CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition

📰 ArXiv cs.AI

CliPPER is a video-language pretraining model for recognizing events in long-form intraoperative surgical videos.

Advanced · Published 26 Mar 2026

Action Steps

Pretrain CliPPER on a large dataset of long-form intraoperative surgical videos paired with their transcripts.
Fine-tune CliPPER on a smaller dataset of labeled surgical events to adapt it to a specific event recognition task.
Run CliPPER on new, unseen surgical videos and evaluate its predictions with metrics such as accuracy and F1-score.
Integrate CliPPER into a larger system for surgical workflow analysis and decision support, using its event recognition output with the aim of improving patient outcomes.
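
The evaluation step can be sketched in a few lines of plain Python. This is a minimal illustration, not CliPPER's evaluation code: it assumes one predicted event label per video segment, and the event names ("incision", "suturing", "irrigation") are hypothetical placeholders rather than labels from the paper. It computes accuracy and macro-averaged F1, which weights every event class equally and so is less dominated by frequent events than accuracy is.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted event matches the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 per event class, computed as 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it did not occur
            fn[t] += 1  # missed an occurrence of class t
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical reference and predicted labels, one per video segment.
    y_true = ["incision", "incision", "suturing", "suturing", "irrigation", "irrigation"]
    y_pred = ["incision", "suturing", "suturing", "suturing", "irrigation", "incision"]
    print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
    print(per_class_f1(y_true, y_pred))
```

In practice a library routine such as scikit-learn's f1_score would usually replace the hand-written version; the explicit form is shown here only to make the metric definitions visible.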

Who Needs to Know This

Researchers in AI for healthcare, particularly those working on surgical procedure analysis, can benefit from CliPPER's ability to recognize events in surgical videos. Surgeons and other medical professionals can use the model's outputs to improve their workflows and decision-making.

Key Insight

💡 CliPPER can effectively recognize events in long-form intraoperative surgical procedures, even when only limited labeled data is available.