Detailed Information


MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography

Authors
Kim, Sekeun; Jin, Pengfei; Chen, Cheng; Kim, Kyungsang; Lyu, Zhiliang; Ren, Hui; Kim, Sunghwan; Liu, Zhengliang; Zhong, Aoxiao; Liu, Tianming; Li, Xiang; Li, Quanzheng
Issue Date
2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Echocardiography; Parameter-efficient fine-tuning; Segment Anything Model; Segmentation; Vision Foundation model
Citation
IEEE Journal of Biomedical and Health Informatics
Indexed
SCIE
SCOPUS
Journal Title
IEEE Journal of Biomedical and Health Informatics
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/77217
DOI
10.1109/JBHI.2025.3540306
ISSN
2168-2194
2168-2208
Abstract
Despite achieving impressive results in general-purpose semantic segmentation, with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the SAM architecture is designed for 2D natural images and therefore does not support three-dimensional information, which is particularly important for medical imaging modalities that often produce volumetric or video data. In this paper, we introduce MediViSTA, a parameter-efficient fine-tuning method designed to adapt the vision foundation model to medical video, with a specific focus on echocardiography segmentation. To achieve spatial adaptation, we propose a frequency feature fusion technique that injects spatial frequency information from a CNN branch. For temporal adaptation, we integrate temporal adapters within the transformer blocks of the image encoder. With this fine-tuning strategy, only a small subset of pre-trained parameters is updated, allowing efficient adaptation to echocardiography data. The effectiveness of our method has been comprehensively evaluated on three datasets, comprising two public datasets and one multi-center in-house dataset. Our method consistently outperforms various state-of-the-art approaches without using any prompts. Furthermore, our model exhibits strong generalization capabilities on unseen datasets, surpassing the second-best approach by 2.15% in Dice and 0.09 in temporal consistency. The results demonstrate the potential of MediViSTA to significantly advance echocardiography video segmentation, offering improved accuracy and robustness in cardiac assessment applications. © 2013 IEEE.
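The abstract reports its generalization gain in Dice. As a reader aid, here is a minimal sketch of the standard Dice coefficient for two binary masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values (an assumed, illustrative format).
    """
    # Flatten both masks and coerce every entry to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels that are foreground in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x2 masks: one overlapping pixel, three foreground pixels in total.
prediction = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = dice_coefficient(prediction, reference)
```

For a video, such a score would typically be computed per frame and then averaged. The abstract does not specify the aggregation used in the paper.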
Files in This Item
There are no files associated with this item.
Appears in
Collections
College of Medicine > Department of Medicine > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
