MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation

dc.authoridToktas, Abdurrahim/0000-0002-7687-9061
dc.contributor.authorGao, Suo
dc.contributor.authorWu, Rui
dc.contributor.authorLiu, Songbo
dc.contributor.authorErkan, Uğur
dc.contributor.authorToktaş, Abdurrahim
dc.contributor.authorLiu, Jiafeng
dc.date.accessioned2025-01-12T17:19:40Z
dc.date.available2025-01-12T17:19:40Z
dc.date.issued2024
dc.departmentKaramanoğlu Mehmetbey Üniversitesi
dc.description.abstractTemporal Convolutional Network (TCN) has received extensive attention in the field of speech synthesis. Many researchers use TCN-based models for action segmentation since both tasks focus on contextual connections. However, TCN can only capture the long-term dependencies of information and ignores the short-term dependencies, which can lead to over-segmentation by dividing a single action interval into multiple action categories. This paper proposes the Multi-Stage Linear-Index Dilated TCN (MSLID-TCN) model, in which each layer has an appropriate receptive field, allowing the video's short-term and long-term dependencies to be passed to the next layer and thereby mitigating the over-segmentation problem. MSLID-TCN has a four-stage structure. The first stage is an LID-TCN, while the remaining stages are Single-Stage TCNs (SS-TCNs). The I3D feature of the video is used as the input for MSLID-TCN. In the first stage, the LID-TCN makes initial predictions on frame features to obtain predicted probability values. In the last three stages, these probability features are used as input to the network, where each SS-TCN corrects the predicted probability values from the previous stage, ultimately yielding the action segmentation results. Comparative experiments show that our model performs excellently on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast.
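The abstract contrasts linearly indexed dilations with the exponentially growing dilations of a standard SS-TCN. A minimal sketch of how the receptive field grows under each scheme (the layer count and kernel size below are illustrative assumptions, not the paper's exact hyperparameters):

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of dilated 1-D convolutions.

    Each layer with dilation d and kernel size k widens the
    receptive field by (k - 1) * d frames.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

layers = 10
linear = [l + 1 for l in range(layers)]        # linear-index dilations: 1, 2, ..., 10
exponential = [2 ** l for l in range(layers)]  # standard TCN dilations: 1, 2, ..., 512

print(receptive_field(linear))       # 111  (grows quadratically with depth)
print(receptive_field(exponential))  # 2047 (grows exponentially with depth)
```

The small early dilations of the linear scheme keep fine-grained, short-range context available in intermediate layers, while the deeper layers still accumulate a long-range receptive field, which is the intuition behind passing both short-term and long-term dependencies to the next layer.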
dc.description.sponsorshipNational Natural Science Foundation of China [61672190]; China Scholarship Council (CSC) [CSC202306120290]; Sub-project of National Key Research and Development Program of China [2023YFC3305003]
dc.description.sponsorshipThis research is supported by the National Natural Science Foundation of China, No. 61672190; China Scholarship Council (CSC), No. CSC202306120290; the Sub-project of National Key Research and Development Program of China (No. 2023YFC3305003).
dc.identifier.citationGao, S., Wu, R., Liu, S., Erkan, U., Toktas, A., Liu, J., & Tang, X. (2024). MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation. International Journal of Machine Learning and Cybernetics, 16(1), 567–581. https://doi.org/10.1007/s13042-024-02251-y
dc.identifier.doi10.1007/s13042-024-02251-y
dc.identifier.issn1868-8071
dc.identifier.issn1868-808X
dc.identifier.scopus2-s2.0-85196320787
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1007/s13042-024-02251-y
dc.identifier.urihttps://hdl.handle.net/11492/10143
dc.identifier.wosWOS:001249630500002
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.institutionauthorErkan, Uğur
dc.institutionauthoridErkan, Uğur/0000-0002-2481-0230
dc.language.isoen
dc.publisherSpringer Heidelberg
dc.relation.ispartofInternational Journal of Machine Learning and Cybernetics
dc.relation.publicationcategoryMakale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_20250111
dc.subjectTemporal action segmentation
dc.subjectTemporal convolutional network
dc.subjectMulti-stage temporal convolutional
dc.subjectDeep learning
dc.titleMSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation
dc.typeArticle

Files

Original bundle
Name:
Tam Metin / Full Text.jpg
Size:
1.3 MB
Format:
Adobe Portable Document Format