Wi-Fi-enabled Vision via Spatially-variant Pose Estimation based on Convolutional Transformer Network

Abstract

Wi-Fi-enabled vision offers a transformative paradigm for non-optical pose estimation, particularly in occluded or privacy-sensitive environments where traditional visual systems falter. Despite its promise, extracting reliable pose information from Wi-Fi Channel State Information (CSI) remains a formidable challenge due to spatial variability in torso localization, cross-view discrepancies, and inherent signal perturbations caused by multipath propagation and environmental noise. To address these challenges, we propose a Convolutional Transformer Network, an architecture that integrates convolutional layers for localized spatial feature extraction and transformer layers for global temporal dependency modeling. This integrative design effectively captures the spatiotemporal dynamics of CSI signals, enabling robust pose estimation under cross-view and spatially-variant conditions. When evaluated on the benchmark WIDAR 3.0 datasets, the proposed model outperforms the structural and sequential learning baseline CNN-GRU by 1.72% in accuracy. It outperforms sequential models (RNN, GRU, LSTM) and image models (CNN, ViT) across all key metrics, demonstrating robust spatial-temporal modeling capabilities. These results highlight its advancement in non-optical pose estimation and practical applicability in real-world scenarios. © 2013 IEEE.
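The abstract describes the architecture only at a high level: convolutional layers extract local spatial features, and transformer layers model global temporal dependencies. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a CSI window shaped as time steps by subcarriers, applies one 1-D convolution along the subcarrier axis of each frame, runs one single-head self-attention pass across time, then mean-pools and classifies. All names and sizes (`forward`, `T`, `S`, `K`, `D`, `C`) are hypothetical, the weights are random rather than trained, and positional encoding, multiple heads, and residual connections are omitted for brevity.

```python
import math
import random


def conv1d_relu(frame, kernel):
    """Valid 1-D convolution along the subcarrier axis, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * frame[i + j] for j in range(k)))
            for i in range(len(frame) - k + 1)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention across time steps."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, V))
                    for d in range(len(V[0]))])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(csi, params):
    """csi: list of T frames, each a list of S subcarrier amplitudes."""
    # Local spatial features: convolve each frame over neighbouring subcarriers.
    tokens = [conv1d_relu(frame, params["kernel"]) for frame in csi]
    # Global temporal dependencies: every time step attends to all others.
    attended = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    # Mean-pool over time, then map to class probabilities.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return softmax(matvec(params["Wout"], pooled))


# Hypothetical sizes: time steps, subcarriers, kernel width, model dim, classes.
T, S, K, D, C = 8, 16, 3, 4, 6
F = S - K + 1  # feature length after the valid convolution
rng = random.Random(0)
params = {
    "kernel": [rng.uniform(-0.5, 0.5) for _ in range(K)],
    "Wq": rand_matrix(D, F, rng),
    "Wk": rand_matrix(D, F, rng),
    "Wv": rand_matrix(D, F, rng),
    "Wout": rand_matrix(C, D, rng),
}
csi = [[rng.gauss(0.0, 1.0) for _ in range(S)] for _ in range(T)]  # synthetic CSI
probs = forward(csi, params)
print(len(probs), round(sum(probs), 6))
```

The sketch shows only the division of labour between the two stages. A practical model would stack several such layers and be trained in a deep learning framework on real CSI data.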

Keywords

Convolutional transformer network; Non-optical human activity recognition; Pose estimation; Signal classification; Wi-Fi vision; HUMAN ACTIVITY RECOGNITION; CSI; ARCHITECTURE
Title
Wi-Fi-enabled Vision via Spatially-variant Pose Estimation based on Convolutional Transformer Network
Authors
Lee, Hyeon-Ju; Buu, Seok-Jun
DOI
10.1109/ACCESS.2025.3568505
Publication date
2025-05
Type
Article
Journal
IEEE Access
Volume
13
Pages
84855-84869