A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition

Abstract

Recent studies have shown that the spatial-temporal graph convolutional network (ST-GCN) is effective for skeleton-based action recognition. However, existing ST-GCN-based methods usually fix the temporal kernel size across all layers, which prevents them from fully exploiting temporal dependencies between discontinuous frames and from adapting to sequences of different lengths. In addition, most of these methods use average pooling to obtain a global graph feature from the vertex features, which discards much of the fine-grained information needed for action classification. To address these issues, we propose a novel spatial attentive and temporal dilated graph convolutional network (SATD-GCN). It contains two key components: a spatial attention pooling (SAP) module and a temporal dilated graph convolution (TDGC) module. Specifically, the SAP module selects, via a self-attention mechanism, the human-body joints that are most informative for action recognition, alleviating the influence of data redundancy and noise. The TDGC module effectively extracts temporal features at different time scales, which enlarges the temporal receptive field and enhances the robustness of the model to different motion speeds and sequence lengths. Importantly, both the SAP module and the TDGC module can be easily integrated into ST-GCN-based models and significantly improve their performance. Extensive experiments on two large-scale datasets, NTU RGB+D and Kinetics-Skeleton, demonstrate that our method achieves state-of-the-art performance for skeleton-based action recognition.
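The abstract describes the two modules only at a high level, so the PyTorch sketch below illustrates one plausible reading of each. The class names, the sigmoid attention scoring, the top-k joint selection, and the dilation rates (1, 2, 4) are all illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionPooling(nn.Module):
    """Sketch of an SAP-style module: scores each joint with a self-attention
    weight and keeps only the most informative joints, instead of averaging
    over all vertices (the top-k selection here is an assumption)."""

    def __init__(self, channels, num_keep):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-joint attention logits
        self.num_keep = num_keep  # number of joints retained after pooling

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        attn = torch.sigmoid(self.score(x))        # (N, 1, T, V) joint importance
        weighted = x * attn                        # re-weight vertex features
        joint_score = attn.mean(dim=2).squeeze(1)  # (N, V), averaged over time
        idx = joint_score.topk(self.num_keep, dim=-1).indices        # (N, k)
        idx = idx[:, None, None, :].expand(-1, x.size(1), x.size(2), -1)
        return weighted.gather(dim=3, index=idx)   # (N, C, T, k)


class TemporalDilatedGraphConv(nn.Module):
    """Sketch of a TDGC-style module: parallel temporal convolutions with
    different dilation rates enlarge the temporal receptive field so that
    dependencies between discontinuous frames can be captured."""

    def __init__(self, channels, kernel_size=9, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(
                channels,
                channels,
                kernel_size=(kernel_size, 1),  # convolve along time only
                padding=(dilation * (kernel_size - 1) // 2, 0),  # keep T fixed
                dilation=(dilation, 1),
            )
            for dilation in dilations
        )

    def forward(self, x):
        # x: (N, C, T, V); sum the multi-scale temporal branches with a residual
        return F.relu(sum(branch(x) for branch in self.branches) + x)


if __name__ == "__main__":
    x = torch.randn(2, 64, 300, 25)                  # NTU RGB+D-like: 25 joints
    x = TemporalDilatedGraphConv(64)(x)              # (2, 64, 300, 25)
    x = SpatialAttentionPooling(64, num_keep=15)(x)  # (2, 64, 300, 15)
    print(x.shape)
```

Because both sketches preserve the (N, C, T, V) feature layout that ST-GCN blocks operate on, they could in principle be dropped between existing spatial-graph and temporal-convolution layers, which is consistent with the abstract's claim that both modules integrate easily into ST-GCN-based models.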

Jinlu Zhang
Master's student in Computer Vision

My current research interests include 3D human pose estimation, human motion, and video understanding.