Intraoperative bleeding during Endoscopic Submucosal Dissection (ESD) poses significant risks, demanding precise, real-time localization and continuous monitoring of the bleeding source for effective hemostatic intervention. In particular, endoscopists must repeatedly flush the field to clear blood, which leaves only milliseconds to identify the bleeding source. This inefficient process prolongs operations and elevates patient risk.
However, current Artificial Intelligence (AI) methods primarily focus on bleeding region segmentation, overlooking the critical need for accurate bleeding source detection and temporal tracking in the challenging ESD environment, which is marked by frequent visual obstructions and dynamic scene changes. This gap is further widened by the lack of specialized datasets, hindering the development of robust AI-assisted guidance systems.
To address these challenges, we introduce BleedOrigin-Bench, the first comprehensive ESD bleeding source dataset, featuring 1,771 expert-annotated bleeding sources across 106,222 frames from 44 procedures, supplemented with 39,755 pseudo-labeled frames. This benchmark covers 8 anatomical sites and 6 challenging clinical scenarios.
We also present BleedOrigin-Net, a novel dual-stage detection-tracking framework for bleeding source localization in ESD procedures, addressing the complete workflow from bleeding onset detection to continuous spatial tracking. For initial detection, our method integrates a Multi-Domain Confidence-based Frame Memory (MDCFM) module that leverages RGB, HSV, and optical flow features for robust temporal context, combined with Multi-Domain Gated Attention (MDG) for superior onset detection.
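To make the idea of a confidence-based frame memory more concrete, the following is a minimal sketch and not the MDCFM module itself. It assumes, hypothetically, that each frame carries a scalar confidence score and an opaque feature payload (for example RGB, HSV, and optical flow descriptors), and that the memory simply retains the highest-confidence frames. The class name `ConfidenceFrameMemory` and its methods are illustrative and do not come from the released code.

```python
import heapq
from itertools import count
from typing import Any, List, Tuple


class ConfidenceFrameMemory:
    """Bounded memory that keeps the `capacity` most confident frames (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Min-heap of (confidence, tie_breaker, frame_idx, features).
        self._heap: List[Tuple[float, int, int, Any]] = []
        # The tie-breaker keeps tuple comparison from ever reaching `features`.
        self._tie = count()

    def add(self, frame_idx: int, confidence: float, features: Any) -> None:
        entry = (confidence, next(self._tie), frame_idx, features)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif confidence > self._heap[0][0]:
            # Evict the least confident stored frame.
            heapq.heapreplace(self._heap, entry)

    def frames(self) -> List[int]:
        """Indices of the retained frames, in temporal order."""
        return sorted(entry[2] for entry in self._heap)


memory = ConfidenceFrameMemory(capacity=2)
for idx, conf in enumerate([0.2, 0.9, 0.5, 0.1]):
    memory.add(idx, conf, features={"rgb": None, "hsv": None, "flow": None})
retained = memory.frames()
```

In this toy run the two most confident frames (indices 1 and 2) are retained. How the actual module scores confidence and fuses the three domains is not specified here.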
For continuous tracking, we employ a pseudo-label enhanced strategy that incorporates feature matching, trajectory prediction, and Kalman filtering to generate dense supervision from sparse annotations, complemented by parameter-efficient LoRA fine-tuning.
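As an illustration of how Kalman filtering can turn sparse point annotations into dense pseudo-labels, here is a minimal sketch under stated assumptions. It covers only the filtering component (feature matching and trajectory prediction are omitted), uses a constant-velocity motion model per image axis, and picks noise values arbitrarily. The names `AxisKalman` and `densify` are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class AxisKalman:
    """1-D constant-velocity Kalman filter over (position, velocity)."""

    def __init__(self, pos: float, q: float = 0.01, r: float = 1.0) -> None:
        self.pos, self.vel = pos, 0.0
        # Covariance entries; the initial velocity is treated as highly uncertain.
        self.p00, self.p01, self.p10, self.p11 = 1.0, 0.0, 0.0, 10.0
        self.q, self.r = q, r  # process / measurement noise (illustrative values)

    def predict(self) -> float:
        """Advance one frame: x = F x, P = F P F^T + Q, with F = [[1, 1], [0, 1]]."""
        self.pos += self.vel
        p00 = self.p00 + self.p01 + self.p10 + self.p11 + self.q
        p01 = self.p01 + self.p11
        p10 = self.p10 + self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p10 / s
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00, p01 = self.p00, self.p01
        self.p00, self.p01 = (1.0 - k0) * p00, (1.0 - k0) * p01
        self.p10, self.p11 = self.p10 - k1 * p00, self.p11 - k1 * p01


def densify(sparse: Dict[int, Point], num_frames: int) -> List[Point]:
    """Return one point per frame, starting at the first annotated frame."""
    first = min(sparse)
    fx, fy = AxisKalman(sparse[first][0]), AxisKalman(sparse[first][1])
    track: List[Point] = [sparse[first]]
    for idx in range(first + 1, num_frames):
        x, y = fx.predict(), fy.predict()
        if idx in sparse:  # an expert annotation corrects the prediction
            fx.update(sparse[idx][0])
            fy.update(sparse[idx][1])
            x, y = fx.pos, fy.pos
        track.append((x, y))
    return track


sparse_labels = {0: (0.0, 5.0), 1: (1.0, 5.0), 2: (2.0, 5.0), 3: (3.0, 5.0)}
track = densify(sparse_labels, num_frames=6)
```

Frames 4 and 5 have no annotation, so their points are pure predictions that continue the learned rightward motion. In practice such pseudo-labels would presumably be checked against image evidence before being used for supervision.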
We compare our approach with widely used object detection models (YOLOv11/v12), multimodal large language models, and point tracking methods. Extensive evaluation demonstrates state-of-the-art performance.
Our work establishes a foundation for AI-assisted bleeding management by enabling prompt surgical intervention through real-time bleeding alerts and bleeding source localization, thereby reducing reliance on repeated water flushing and improving ESD procedural outcomes.
Our code and dataset will be available at: https://szupc.github.io/ESD_BleedOrigin/
Overview of the Motivation and Methodology. (A) Motivation: persistent bleeding obscures the surgical field, necessitating repeated flushing to achieve transient exposure of bleeding sources. This iterative process significantly reduces surgical efficiency, prolongs procedure time, and elevates patient risks, including perforation. A demonstration of this issue is available in our supplementary video. (B) The proposed method identifies the initial bleeding source and maintains robust real-time tracking under dynamic visual challenges, ensuring continuous localization until successful hemostasis.
We selected 485 bleeding video clips from 44 patients and processed them in two stages: (1) detection of the initial bleeding frame and bleeding source, and (2) subsequent bleeding source tracking. The resulting dataset comprises: (A) a detection set of 66,896 frames with 485 bleeding time points; (B) a tracking set of 1,771 frames with manually annotated bleeding sources, plus an additional 39,755 frames with bleeding source pseudo-labels; (C) anatomical diversity across 8 sites (gastric antrum, duodenum, etc.); and (D) 6 challenging clinical scenarios (obscured bleeding views, camera jitter, water flushing, etc.).
Visualization of our detection and tracking results on a complete ESD video (longer than 10 s), with bleeding point labels annotated by a doctor at 30 fps. Blue marks the ground truth (GT) and green marks our prediction. Our algorithm copes with a variety of challenges while remaining robust.
Panels: original results; restart every 30 frames; restart every 60 frames; restart every 390 frames.
Qualitative comparison of bleeding source tracking with different memory refresh intervals. Blue is GT, and green is our predicted result. We also show the error values in the corresponding frames at different update frequencies.
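One plausible reading of a memory refresh interval is that the tracker state is re-initialized from the current frame every N frames. The sketch below illustrates that control flow only. The callables `init_state` and `step` are hypothetical stand-ins for a real tracker's initialization and per-frame update, and `track_with_refresh` is not a function from the released code.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
State = TypeVar("State")
Point = Tuple[float, float]


def track_with_refresh(
    frames: Sequence[Frame],
    init_state: Callable[[Frame], State],
    step: Callable[[State, Frame], Tuple[State, Point]],
    refresh_interval: Optional[int] = None,
) -> List[Point]:
    """Track a point through `frames`, rebuilding the state every `refresh_interval` frames.

    With refresh_interval=None the state is initialized once and never refreshed.
    """
    points: List[Point] = []
    state: Optional[State] = None
    for idx, frame in enumerate(frames):
        due = refresh_interval is not None and idx % refresh_interval == 0
        if state is None or due:
            state = init_state(frame)  # discard accumulated memory
        state, point = step(state, frame)
        points.append(point)
    return points


# Toy tracker: the state remembers the frame index at which it was initialized.
toy_frames = list(range(7))
refreshed = track_with_refresh(
    toy_frames, lambda f: f, lambda s, f: (s, (float(s), 0.0)), refresh_interval=3
)
never_refreshed = track_with_refresh(
    toy_frames, lambda f: f, lambda s, f: (s, (float(s), 0.0))
)
```

In the toy run, the x coordinate of each returned point shows which frame the state was last initialized on, so the refresh points are visible at frames 0, 3, and 6.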
Videos: Low Difficulty; Moderate Difficulty; High Difficulty.
For the deployment experiment, we selected three complete videos, each spanning the transition from a clean field of view to active bleeding. During the non-bleeding phase, a "Non Bleeding" label is displayed in the upper-left corner. Once the bleeding phase begins, the label changes to "Bleeding", and the bleeding source is detected and highlighted by a pulsating red bounding box to draw the clinician's attention to it.
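The overlay behavior described above can be sketched as a small per-frame state function. This is an illustrative sketch only: the pulse period, the opacity range, and the names `pulse_alpha` and `overlay_state` are assumptions, and the actual rendering code is not shown in the source.

```python
import math
from typing import Dict, Optional, Tuple, Union

Color = Tuple[int, int, int]


def pulse_alpha(
    frame_idx: int,
    fps: float = 30.0,
    period_s: float = 1.0,
    lo: float = 0.3,
    hi: float = 1.0,
) -> float:
    """Opacity that oscillates smoothly between `lo` and `hi` once per period."""
    phase = 2.0 * math.pi * (frame_idx / fps) / period_s
    return lo + (hi - lo) * 0.5 * (1.0 - math.cos(phase))


def overlay_state(
    is_bleeding: bool, frame_idx: int
) -> Dict[str, Union[str, float, Optional[Color]]]:
    """Label text and bounding-box style for one frame."""
    if not is_bleeding:
        return {"label": "Non Bleeding", "box_color": None, "alpha": 0.0}
    return {
        "label": "Bleeding",
        "box_color": (255, 0, 0),  # red, in RGB order (assumed)
        "alpha": pulse_alpha(frame_idx),
    }


calm = overlay_state(False, frame_idx=3)
alert = overlay_state(True, frame_idx=15)
```

With the assumed 30 fps and one-second period, the box is dimmest at frame 0 and fully opaque at frame 15.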
(i) Low Difficulty: Even with subtle bleeding, our model achieves timely detection via flashing red alerts and maintains accurate tracking despite rapid visual field changes;
(ii) Moderate Difficulty: Despite visibility degradation from smoke, light reflection, and gentle flushing, the model maintains both timely bleeding onset detection and stable bleeding source tracking;
(iii) High Difficulty: Tracking points may drift during copious flushing, but our memory refresh strategy enables bleeding source reidentification when visual features reappear post-flushing.