
MedSAM2: Segment Anything in 3D Medical Images and Videos

MedSAM2 is a foundation model for promptable segmentation in 3D medical images and temporal video data, built by fine-tuning SAM2.1 on a large-scale curated medical dataset.

Key Highlights:

  • 3D & Video Segmentation Foundation Model – Tailors SAM2.1-Tiny for medical domains, supporting volumetric scans (CT, MRI, PET) and sequential video modalities (ultrasound, endoscopy) with a unified architecture.
  • Memory-Aware Temporal Modeling – Employs a streaming memory attention module with cross-frame conditioning to maintain context across slices or frames, enhancing anatomical continuity and temporal coherence (a schematic sketch follows this list).
  • Promptable via Bounding Boxes – Uses 2D bounding box prompts on the central slice or frame, propagating masks bidirectionally for full 3D or temporal coverage with minimal supervision (see the propagation sketch below).
  • Hierarchical Vision Transformer Backbone – Integrates Hiera for efficient multiscale feature extraction, outperforming naive ViTs on medical data in speed and accuracy.
  • Full Fine-Tuning Strategy – Trains all model components end-to-end (image encoder, mask decoder, memory modules, and prompt encoder) to maximize medical domain adaptation (see the fine-tuning snippet below).
  • SOTA Performance Across Modalities – Outperforms EfficientMedSAM and SAM2.1 variants on organ and lesion segmentation across CT, MRI, and PET, including difficult structures such as the pancreas and epicardium.
  • Human-in-the-Loop Annotation Boost – Enables >85% annotation-time reduction on large-scale datasets (5,000 CT lesions, 3,984 liver MRIs, 251k echo frames) via an iterative refinement pipeline.
  • Deployment-Ready Ecosystem – Plug-and-play support for 3D Slicer, Gradio, Google Colab, JupyterLab, and terminal CLI for both local and cloud environments.
  • Open Source – Code, pretrained models, and plugins are publicly available.
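
The memory-aware temporal modeling above can be pictured as cross-attention from the current frame's tokens into a rolling bank of tokens from frames already segmented. The PyTorch sketch below is a schematic of that idea only, not MedSAM2's actual module; the class name, dimensions, and memory size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StreamingMemoryAttention(nn.Module):
    """Schematic of cross-frame memory conditioning (illustrative, not the
    real MedSAM2/SAM2 module): the current frame's tokens attend to a
    rolling bank of tokens from previously processed frames."""

    def __init__(self, dim: int = 256, heads: int = 8, max_mem: int = 7):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.max_mem = max_mem
        self.memory: list[torch.Tensor] = []  # one (B, N, dim) tensor per past frame

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (B, N, dim) features of the current slice/frame.
        if self.memory:
            mem = torch.cat(self.memory, dim=1)  # (B, M*N, dim) memory bank
            attended, _ = self.cross_attn(frame_tokens, mem, mem)
            frame_tokens = self.norm(frame_tokens + attended)
        # Store detached features to condition the frames that follow.
        self.memory.append(frame_tokens.detach())
        self.memory = self.memory[-self.max_mem:]
        return frame_tokens
```

In SAM 2 the memory bank additionally encodes past mask predictions and object pointer tokens; the sketch keeps only the feature-conditioning step to show how context carries across slices or frames.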

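The bounding-box prompting and bidirectional propagation workflow can be sketched with the upstream SAM 2 video predictor API, which MedSAM2 inherits as a SAM2.1 fine-tune. The sketch assumes the interface is unchanged; the config path, checkpoint name, slice directory, slice index, and box coordinates are all placeholders.

```python
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

# Placeholder config/checkpoint paths (not the official release names).
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",
    "checkpoints/medsam2.pt",
)

# SAM 2 treats a directory of numbered JPEG frames as a video; here the
# "frames" are the axial slices of a CT volume, exported one file per slice.
state = predictor.init_state(video_path="ct_volume_slices/")

center = 60  # index of the central slice (placeholder)
box = np.array([120, 95, 260, 230], dtype=np.float32)  # (x_min, y_min, x_max, y_max)
predictor.add_new_points_or_box(state, frame_idx=center, obj_id=1, box=box)

# Propagate the mask forward, then backward, from the prompted slice so a
# single 2D box prompt covers the whole volume.
masks = {}
for reverse in (False, True):
    for idx, obj_ids, mask_logits in predictor.propagate_in_video(
        state, start_frame_idx=center, reverse=reverse
    ):
        masks[idx] = (mask_logits[0, 0] > 0).cpu().numpy()  # binary (H, W) mask
```

The same loop applies unchanged to temporal data such as ultrasound or endoscopy clips, where the prompted frame is a real video frame rather than a slice.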
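
Finally, the full fine-tuning strategy amounts to the absence of parameter freezing: every component of the model receives gradients, rather than only the decoder or prompt encoder. A minimal sketch, with placeholder hyperparameters rather than the paper's values:

```python
import torch
from torch import nn

def make_fully_trainable(model: nn.Module) -> torch.optim.Optimizer:
    """Unfreeze every parameter (image encoder, memory modules, prompt
    encoder, mask decoder) and return an optimizer over all of them.
    Learning rate and weight decay are placeholders, not the paper's values."""
    for p in model.parameters():
        p.requires_grad = True  # full fine-tuning: nothing stays frozen
    return torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
```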
