MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation

Date

2024

Journal Title

Journal ISSN

Volume Title

Publisher

Springer Heidelberg

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Temporal Convolutional Networks (TCNs) have received extensive attention in the field of speech synthesis. Many researchers use TCN-based models for action segmentation, since both tasks depend on contextual connections. However, TCN captures only the long-term dependencies of information and ignores the short-term dependencies, which can lead to over-segmentation, where a single action interval is split into multiple action categories. This paper proposes the Multi-Stage Linear-Index Dilated TCN (MSLID-TCN) model, in which each layer has an appropriate receptive field, allowing both the short-term and long-term dependencies of the video to be passed to the next layer and thereby alleviating the over-segmentation problem. MSLID-TCN has a four-stage structure. The first stage is a LID-TCN, while the remaining three stages are Single-Stage TCNs (SS-TCNs). The I3D features of the video are used as the input to MSLID-TCN. In the first stage, LID-TCN makes initial predictions on the frame features to obtain predicted probability values. In the last three stages, these probabilities are used as the network input, and each SS-TCN corrects the predicted probability values from the previous stage, ultimately yielding the action segmentation results. Comparative experiments show that our model performs excellently on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast.
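The abstract does not spell out the dilation schedule, so the following is only an illustrative sketch. It assumes that "linear-index" means the dilation of a layer equals its 1-based layer index, and compares that with the exponential doubling schedule commonly used in dilated TCNs. The function names, the kernel size of 3, and the layer counts are hypothetical choices made for illustration, not values taken from the paper.

```python
def receptive_field(dilations, kernel_size=3):
    """Input frames visible to one output frame of a stack of dilated 1D convolutions."""
    # Each layer widens the receptive field by (kernel_size - 1) * dilation frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def exponential_dilations(num_layers):
    # Doubling schedule: 1, 2, 4, 8, ...
    return [2 ** i for i in range(num_layers)]


def linear_dilations(num_layers):
    # ASSUMPTION: "linear-index" is read here as dilation == 1-based layer index.
    return [i + 1 for i in range(num_layers)]


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        exp_rf = receptive_field(exponential_dilations(n))
        lin_rf = receptive_field(linear_dilations(n))
        print(f"layers={n:2d}  exponential RF={exp_rf:5d}  linear RF={lin_rf:4d}")
```

Under this assumed reading, the linear schedule grows the receptive field quadratically in depth rather than exponentially, so more layers operate at a short temporal range. That is one way short-term context could be retained, which matches the motivation stated in the abstract.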

Description

Keywords

Temporal action segmentation, Temporal convolutional network, Multi-stage temporal convolutional, Deep learning

Source

International Journal of Machine Learning and Cybernetics

WoS Q Value

N/A

Scopus Q Value

Q1

Volume

Issue

Citation

Gao, S., Wu, R., Liu, S., Erkan, U., Toktas, A., Liu, J., & Tang, X. (2024). MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation. International Journal of Machine Learning and Cybernetics, 16(1), 567–581. https://doi.org/10.1007/s13042-024-02251-y