BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking


Abstract

Intraoperative bleeding during Endoscopic Submucosal Dissection (ESD) poses significant risks, demanding precise, real-time localization and continuous monitoring of the bleeding source for effective hemostatic intervention. In particular, endoscopists must repeatedly flush the field to clear blood, which leaves only milliseconds to identify the bleeding source. This inefficient process prolongs operations and elevates patient risk.

However, current Artificial Intelligence (AI) methods primarily focus on bleeding region segmentation, overlooking the critical need for accurate bleeding source detection and temporal tracking in the challenging ESD environment, which is marked by frequent visual obstructions and dynamic scene changes. This gap is further widened by the lack of specialized datasets, hindering the development of robust AI-assisted guidance systems.

To address these challenges, we introduce BleedOrigin-Bench, the first comprehensive ESD bleeding source dataset, featuring 1,771 expert-annotated bleeding sources across 106,222 frames from 44 procedures, supplemented with 39,755 pseudo-labeled frames. This benchmark covers 8 anatomical sites and 6 challenging clinical scenarios.

We also present BleedOrigin-Net, a novel dual-stage detection-tracking framework for bleeding source localization in ESD procedures, addressing the complete workflow from bleeding onset detection to continuous spatial tracking. For initial detection, our method integrates a Multi-Domain Confidence-based Frame Memory (MDCFM) module that leverages RGB, HSV, and optical flow features for robust temporal context, combined with Multi-Domain Gated Attention (MDG) for superior onset detection.

For continuous tracking, we employ a pseudo-label enhanced strategy that incorporates feature matching, trajectory prediction, and Kalman filtering to generate dense supervision from sparse annotations, complemented by parameter-efficient LoRA fine-tuning.
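To make the Kalman filtering component concrete, the following is a minimal sketch of how sparse point annotations could be propagated to intermediate frames with a constant-velocity Kalman filter. It illustrates the general technique only and is not the BleedOrigin-Net implementation: the feature matching and trajectory prediction steps are omitted, and the class `AxisKalman`, the function `densify`, and the noise values `q` and `r` are hypothetical names and settings chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (state: position, velocity)."""
    pos: float
    vel: float = 0.0
    # Symmetric 2x2 state covariance [[p00, p01], [p01, p11]].
    p00: float = 100.0
    p01: float = 0.0
    p11: float = 100.0
    q: float = 1.0    # process noise added per step (hypothetical value)
    r: float = 25.0   # measurement noise variance in px^2 (hypothetical value)

    def predict(self, dt: float = 1.0) -> float:
        # State transition F = [[1, dt], [0, 1]].
        self.pos += dt * self.vel
        # Covariance update P = F P F^T + q * I.
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z: float) -> float:
        # Measurement model H = [1, 0]: only the position is observed.
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        # Covariance update P = (I - K H) P.
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos


def densify(sparse: Dict[int, Tuple[float, float]],
            num_frames: int) -> Dict[int, Tuple[float, float]]:
    """Fill every frame from the first annotation onward with a filtered point."""
    first = min(sparse)
    x0, y0 = sparse[first]
    kx, ky = AxisKalman(x0), AxisKalman(y0)
    dense = {first: (x0, y0)}
    for t in range(first + 1, num_frames):
        px, py = kx.predict(), ky.predict()
        if t in sparse:  # correct the prediction where a manual label exists
            px, py = kx.update(sparse[t][0]), ky.update(sparse[t][1])
        dense[t] = (px, py)
    return dense


if __name__ == "__main__":
    labels = {0: (100, 200), 5: (110, 210), 10: (120, 220)}
    for frame, point in densify(labels, 15).items():
        print(frame, point)
```

In a setting like the one described, the filtered points at unlabeled frames would serve only as candidate pseudo-labels; a real pipeline would presumably validate them against image evidence before using them for supervision.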

We compare our approach with widely used object detection models (YOLOv11/v12), multimodal large language models, and point tracking methods. Extensive evaluation demonstrates state-of-the-art performance, achieving:

96.85% frame-level accuracy (within ±8 frames) for bleeding onset detection
70.24% pixel-level accuracy (≤ 100 px) for initial source detection
96.11% pixel-level accuracy (≤ 100 px) for point tracking
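
The tolerance-based metrics above can be sketched as follows. This assumes that a predicted onset counts as correct when it falls within 8 frames of the annotated onset, and that a predicted point counts as correct when its Euclidean distance to the annotated source is at most 100 px. The function names `onset_accuracy` and `point_accuracy` are hypothetical, and the exact evaluation protocol may differ.

```python
import math
from typing import Sequence, Tuple


def onset_accuracy(pred_frames: Sequence[int],
                   gt_frames: Sequence[int],
                   tol: int = 8) -> float:
    """Fraction of clips whose predicted onset is within `tol` frames of the label."""
    hits = sum(abs(p - g) <= tol for p, g in zip(pred_frames, gt_frames))
    return hits / len(gt_frames)


def point_accuracy(pred_points: Sequence[Tuple[float, float]],
                   gt_points: Sequence[Tuple[float, float]],
                   radius: float = 100.0) -> float:
    """Fraction of predicted points within `radius` pixels of the labeled source."""
    hits = sum(math.dist(p, g) <= radius for p, g in zip(pred_points, gt_points))
    return hits / len(gt_points)


if __name__ == "__main__":
    # Toy values for illustration only, not results from the benchmark.
    print(onset_accuracy([10, 30, 50], [12, 20, 58]))
    print(point_accuracy([(0, 0), (0, 0)], [(60, 80), (100, 100)]))
```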

Our work establishes a foundation for AI-assisted bleeding management by enabling prompt surgical intervention through real-time bleeding alerts and bleeding source localization, thereby reducing reliance on repeated water flushing and improving ESD procedural outcomes.

Our code and dataset will be available at: https://szupc.github.io/ESD_BleedOrigin/

Motivation and Method Overview

Overview of the Motivation and Methodology. (A) Motivation: persistent bleeding obscures the surgical field, necessitating repeated flushing to achieve transient exposure of bleeding sources. This iterative process significantly reduces surgical efficiency, prolongs procedure time, and elevates patient risks, including perforation. A demonstration of this issue is available in our supplementary video. (B) The proposed method identifies the initial bleeding source and maintains robust real-time tracking under dynamic visual challenges, ensuring continuous localization until successful hemostasis.

BleedOrigin-Bench Dataset Overview

We selected 485 bleeding video clips from 44 patients and processed them in two stages: (1) detection of the initial bleeding frame and bleeding source, and (2) subsequent bleeding source tracking. The resulting dataset comprises:

A. Detection set: 66,896 frames with 485 bleeding time points.
B. Tracking set: 1,771 frames with manually annotated bleeding sources, plus 39,755 frames augmented with bleeding source pseudo-labels.
C. Anatomical diversity: 8 sites (gastric antrum, duodenum, etc.).
D. Clinical challenges: 6 scenarios (obscured bleeding views, camera jitter, water flushing, etc.).