BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking

Mengya Xu*, Rulin Zhou*, An Wang*, Chaoyang Lyu, Zhen Li, Ning Zhong, Hongliang Ren
Mengya Xu is the project leader.
The Chinese University of Hong Kong;
CUHK Shenzhen Research Institute;
Qilu Hospital of Shandong University;

*Indicates Equal Contribution
Corresponding Author

Project Introduction Video

Motivation and Method Overview
Overview of BleedOrigin.

Abstract

Intraoperative bleeding during Endoscopic Submucosal Dissection (ESD) poses significant risks, demanding precise, real-time localization and continuous monitoring of the bleeding source for effective hemostatic intervention. In particular, endoscopists have to repeatedly flush to clear blood, allowing only milliseconds to identify bleeding sources—an inefficient process that prolongs operations and elevates patient risks.

However, current Artificial Intelligence (AI) methods primarily focus on bleeding region segmentation, overlooking the critical need for accurate bleeding source detection and temporal tracking in the challenging ESD environment, which is marked by frequent visual obstructions and dynamic scene changes. This gap is further widened by the lack of specialized datasets, hindering the development of robust AI-assisted guidance systems.

To address these challenges, we introduce BleedOrigin-Bench, the first comprehensive ESD bleeding source dataset, featuring 1,771 expert-annotated bleeding sources across 106,222 frames from 44 procedures, supplemented with 39,755 pseudo-labeled frames. This benchmark covers 8 anatomical sites and 6 challenging clinical scenarios.

We also present BleedOrigin-Net, a novel dual-stage detection-tracking framework for bleeding source localization in ESD procedures, addressing the complete workflow from bleeding onset detection to continuous spatial tracking. For initial detection, our method integrates a Multi-Domain Confidence-based Frame Memory (MDCFM) module that leverages RGB, HSV, and optical flow features for robust temporal context, combined with Multi-Domain Gated Attention (MDG) for superior onset detection.
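
As a rough illustration of the multi-domain fusion idea (not the exact MDCFM/MDG implementation), the sketch below gates per-domain feature maps from RGB, HSV, and optical flow with learned attention weights before fusing them into a single representation; the module name, channel sizes, and gating formulation are our assumptions.

    # Hypothetical sketch of multi-domain gated fusion over RGB / HSV / optical-flow
    # features. Module name, channel sizes, and the gating formulation are
    # assumptions, not the MDCFM / MDG design itself.
    import torch
    import torch.nn as nn

    class MultiDomainGatedFusion(nn.Module):
        def __init__(self, channels: int = 256, num_domains: int = 3):
            super().__init__()
            # One 1x1 projection per domain into a shared embedding space.
            self.projections = nn.ModuleList(
                [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_domains)]
            )
            # Gate network predicts a per-domain weight from globally pooled features.
            self.gate = nn.Sequential(
                nn.Linear(channels * num_domains, num_domains),
                nn.Softmax(dim=-1),
            )

        def forward(self, domain_feats: list) -> torch.Tensor:
            # domain_feats: list of (B, C, H, W) feature maps, one per domain.
            projected = [proj(f) for proj, f in zip(self.projections, domain_feats)]
            pooled = torch.cat([f.mean(dim=(2, 3)) for f in projected], dim=1)  # (B, C*D)
            weights = self.gate(pooled)  # (B, D), sums to 1 across domains
            fused = sum(w.view(-1, 1, 1, 1) * f
                        for w, f in zip(weights.unbind(dim=1), projected))
            return fused  # (B, C, H, W) fused representation for the frame memory

    # Example usage with dummy RGB / HSV / optical-flow feature maps.
    feats = [torch.randn(2, 256, 32, 32) for _ in range(3)]
    print(MultiDomainGatedFusion()(feats).shape)  # torch.Size([2, 256, 32, 32])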

For continuous tracking, we employ a pseudo-label enhanced strategy that incorporates feature matching, trajectory prediction, and Kalman filtering to generate dense supervision from sparse annotations, complemented by parameter-efficient LoRA fine-tuning.
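
To give a flavor of how sparse expert annotations could be densified into per-frame pseudo-labels, the sketch below runs a constant-velocity Kalman filter between annotated frames. It is a minimal, assumption-laden illustration (noise scales, state layout, and initialization are ours) and omits the feature matching and trajectory prediction used alongside Kalman filtering in the full pipeline.

    # Minimal sketch: densify sparse bleeding-source annotations into per-frame
    # pseudo-labels with a constant-velocity Kalman filter. Noise scales and the
    # state layout are assumptions; the full pipeline also uses feature matching
    # and trajectory prediction.
    import numpy as np

    def kalman_densify(anchors):
        """anchors: {frame_index: (x, y)} sparse expert annotations."""
        frames = sorted(anchors)
        labels = {}

        # State: [x, y, vx, vy]; constant-velocity transition, position-only observations.
        F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        Q = np.eye(4) * 1e-2   # process noise (assumed)
        R = np.eye(2) * 4.0    # measurement noise in px^2 (assumed)

        # Initialize velocity from the first two annotations when available.
        x0, y0 = anchors[frames[0]]
        if len(frames) > 1:
            dt = frames[1] - frames[0]
            vx = (anchors[frames[1]][0] - x0) / dt
            vy = (anchors[frames[1]][1] - y0) / dt
        else:
            vx = vy = 0.0
        x, P = np.array([x0, y0, vx, vy]), np.eye(4) * 10.0

        for f in range(frames[0], frames[-1] + 1):
            x = F @ x                       # predict one frame ahead
            P = F @ P @ F.T + Q
            if f in anchors:                # correct whenever an expert click exists
                z = np.array(anchors[f], dtype=float)
                S = H @ P @ H.T + R
                K = P @ H.T @ np.linalg.inv(S)
                x = x + K @ (z - H @ x)
                P = (np.eye(4) - K @ H) @ P
            labels[f] = (float(x[0]), float(x[1]))
        return labels

    # Example: two sparse clicks 30 frames apart yield a label for every frame in between.
    print(kalman_densify({0: (120.0, 200.0), 30: (150.0, 230.0)})[15])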

We compare our approach with widely-used object detection models (YOLOv11/v12), multimodal large language models, and point tracking methods. Extensive evaluation demonstrates state-of-the-art performance, achieving:

  • 96.85% frame-level accuracy (± ≤ 8 frames) for bleeding onset detection
  • 70.24% pixel-level accuracy (≤ 100 px) for initial source detection
  • 96.11% pixel-level accuracy (≤ 100 px) for point tracking
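
For concreteness, the tolerance-based accuracies above can be computed along the following lines; the exact evaluation protocol (per-clip averaging, normalization, etc.) is an assumption on our part rather than the released evaluation code.

    # Hypothetical sketch of the tolerance-based metrics reported above.
    import numpy as np

    def frame_level_accuracy(pred_onsets, gt_onsets, tol_frames=8):
        """Fraction of clips whose predicted onset frame is within +/- tol_frames of GT."""
        pred, gt = np.asarray(pred_onsets, float), np.asarray(gt_onsets, float)
        return float(np.mean(np.abs(pred - gt) <= tol_frames))

    def pixel_level_accuracy(pred_points, gt_points, tol_px=100):
        """Fraction of frames whose predicted source point is within tol_px of GT."""
        pred, gt = np.asarray(pred_points, float), np.asarray(gt_points, float)
        return float(np.mean(np.linalg.norm(pred - gt, axis=1) <= tol_px))

    # Toy example: 3 of 4 onsets fall within 8 frames; 2 of 3 points fall within 100 px.
    print(frame_level_accuracy([10, 52, 99, 200], [12, 50, 110, 201]))   # 0.75
    print(pixel_level_accuracy([(100, 100), (300, 280), (50, 400)],
                               [(110, 120), (290, 300), (400, 400)]))    # ~0.667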

Our work establishes a foundation for AI-assisted bleeding management by enabling prompt surgical intervention through real-time bleeding alerts and bleeding source localization, thereby reducing reliance on repeated water flushing and improving ESD procedural outcomes.

Our code and dataset will be available at: https://szupc.github.io/ESD_BleedOrigin/

Motivation and Method Overview

Overview of the Motivation and Methodology. (A) Motivation: persistent bleeding obscures the surgical field, necessitating repeated flushing to achieve transient exposure of bleeding sources. This iterative process significantly reduces surgical efficiency, prolongs procedure time, and elevates patient risks, including perforation. A demonstration of this issue is available in our supplementary video. (B) The proposed method identifies the initial bleeding source and maintains robust real-time tracking under dynamic visual challenges, ensuring continuous localization until successful hemostasis.

BleedOrigin-Bench Dataset Overview

We selected 485 bleeding video clips from 44 patients for analysis, processed in two stages: (1) initial bleeding frame and bleeding source detection, and (2) subsequent bleeding source tracking. The resulting dataset comprises:

  A. Detection set: 66,896 frames with 485 annotated bleeding time points;
  B. Tracking set: 1,771 frames with manually annotated bleeding sources, plus 39,755 frames augmented with bleeding source pseudo-labels;
  C. Anatomical diversity: 8 sites (gastric antrum, duodenum, etc.);
  D. Clinical challenges: 6 scenarios (obscured bleeding views, camera jitter, water flushing, etc.).

Visualizing the Prediction Results in 6 Surgical Scenarios

Clear Bleeding View

Obscured Bleeding View

Light Reflection

Water Flushing

Camera Jitter

Instrument Interference

Visualizing the Prediction and GT of Bleeding Source Tracking in Long Videos

We visualize our detection and tracking results on a complete ESD video (length > 10 s), in which the endoscopist annotated bleeding source labels at 30 fps. Blue marks the ground truth (GT) and green marks our prediction. Our algorithm copes with a variety of challenges and maintains high robustness.

Visualizing the Prediction when Restarting the Memory Module at Different Intervals

Original Results

Restart every 30 frames

Restart every 60 frames

Restart every 390 frames

Qualitative comparison of bleeding source tracking with different memory refresh intervals. Blue marks the GT and green marks our prediction. We also report the error values for the corresponding frames at each refresh interval.
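
Conceptually, the refresh strategy compared above amounts to periodically re-seeding the tracker's memory with its most recent estimate; the sketch below captures only that restart-every-N-frames logic, and the tracker interface it calls is a placeholder rather than a real API.

    # Hypothetical sketch of periodic memory refresh during bleeding-source tracking.
    # `tracker_factory` and the `initialize` / `track` methods are placeholders.
    def track_with_memory_refresh(frames, init_point, tracker_factory, refresh_interval=60):
        tracker = tracker_factory()
        tracker.initialize(frames[0], init_point)
        last_point, predictions = init_point, [init_point]

        for i, frame in enumerate(frames[1:], start=1):
            if refresh_interval and i % refresh_interval == 0:
                # Restart the memory module, re-seeded at the latest estimate, so that
                # stale features (e.g. accumulated during heavy flushing) are discarded.
                tracker = tracker_factory()
                tracker.initialize(frame, last_point)
            last_point = tracker.track(frame)
            predictions.append(last_point)
        return predictions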

Visualizing the Prediction in Clinical Deployment

Low Difficulty

Moderate Difficulty

High Difficulty

For the deployment experiment, we selected three complete videos spanning from a clean field of view to active bleeding. During the transition from the non-bleeding phase to the bleeding phase, the non-bleeding status is indicated by a "Non Bleeding" label displayed in the upper-left corner. Once the bleeding phase begins, indicated by a "Bleeding" label, the detected bleeding source is highlighted with a pulsating red bounding box to draw the clinician's attention (an illustrative overlay sketch follows the examples below).

(i) Low Difficulty: Even with subtle bleeding, our model achieves timely detection via flashing red alerts and maintains accurate tracking despite rapid visual field changes;
(ii) Moderate Difficulty: Despite visibility degradation from smoke, light reflection, and gentle flushing, the model maintains both timely bleeding onset detection and stable bleeding source tracking;
(iii) High Difficulty: Tracking points may drift during copious flushing, but our memory refresh strategy enables bleeding source reidentification when visual features reappear post-flushing.
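
The on-screen cues described above (a phase label in the upper-left corner and a pulsating red box around the detected source) can be rendered with a simple overlay routine such as the OpenCV sketch below; the label text, colors, and pulsation period are our assumptions, not the deployed interface.

    # Illustrative OpenCV overlay: phase label plus a pulsating red bounding box.
    # Colors, font, and pulsation period are assumptions.
    import math
    import cv2

    def draw_overlay(frame, frame_idx, bleeding, source_box=None, pulse_period=15):
        """frame: BGR image; source_box: (x1, y1, x2, y2) once a bleeding source is detected."""
        label = "Bleeding" if bleeding else "Non Bleeding"
        color = (0, 0, 255) if bleeding else (0, 255, 0)
        cv2.putText(frame, label, (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, color, 2)

        if bleeding and source_box is not None:
            # Pulsate the box thickness with a sinusoid to draw the clinician's attention.
            phase = math.sin(2 * math.pi * frame_idx / pulse_period)
            thickness = 2 + int(round(2 * (phase + 1)))  # varies between 2 and 6 px
            x1, y1, x2, y2 = map(int, source_box)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), thickness)
        return frame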

BibTeX

BibTex Code Here