Title: MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation
Authors: Gao, Suo; Wu, Rui; Liu, Songbo; Erkan, Uğur; Toktaş, Abdurrahim; Liu, Jiafeng
Date Accessioned: 2025-01-12
Date Available: 2025-01-12
Date Issued: 2024
Citation: Gao, S., Wu, R., Liu, S., Erkan, U., Toktas, A., Liu, J., & Tang, X. (2024). MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation. International Journal of Machine Learning and Cybernetics, 16(1), 567–581. https://doi.org/10.1007/s13042-024-02251-y
ISSN: 1868-8071 (print); 1868-808X (electronic)
DOI: https://doi.org/10.1007/s13042-024-02251-y
URI: https://hdl.handle.net/11492/10143

Abstract: The Temporal Convolutional Network (TCN) has received extensive attention in the field of speech synthesis, and many researchers use TCN-based models for action segmentation, since both tasks focus on contextual connections. However, a TCN captures only the long-term dependencies of information and ignores the short-term dependencies, which can lead to over-segmentation: a single action interval is divided into multiple action categories. This paper proposes the Multi-Stage Linear-Index Dilated TCN (MSLID-TCN) model, in which each layer has an appropriate receptive field, allowing the video's short-term and long-term dependencies to be passed to the next layer and thereby mitigating the over-segmentation problem. MSLID-TCN has a four-stage structure: the first stage is a LID-TCN, while the remaining stages are Single-Stage TCNs (SS-TCNs). The I3D features of the video are used as the input to MSLID-TCN. In the first stage, the LID-TCN makes initial predictions on the frame features to obtain predicted probability values. In the last three stages, these probabilities serve as the input, and each SS-TCN corrects the predicted probability values from the previous stage, ultimately yielding the action segmentation results. Comparative experiments show that our model performs excellently on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast.

Language: English (en)
Keywords: Temporal action segmentation; Temporal convolutional network; Multi-stage temporal convolutional; Deep learning
Type: Article
Access Rights: info:eu-repo/semantics/closedAccess
Scopus ID: 2-s2.0-85196320787
WoS ID: WOS:001249630500002
WoS Quartile: Q1
Scopus Quartile: N/A
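
Note: to make the multi-stage design described in the abstract concrete, the following is a minimal PyTorch sketch. It assumes that "linear-index" dilation means the dilation rate of layer i is simply i (1, 2, 3, ...), in contrast to the exponential schedule (1, 2, 4, ...) commonly used in SS-TCN refinement stages; the layer count, channel width, 2048-dimensional I3D input, and the class names and shapes are illustrative assumptions in the style of MS-TCN, not details confirmed by the paper.

import torch
import torch.nn as nn

class DilatedResidualLayer(nn.Module):
    # 3x1 dilated convolution with a residual connection;
    # padding equal to the dilation keeps the temporal length T unchanged.
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv_dilated = nn.Conv1d(channels, channels, kernel_size=3,
                                      padding=dilation, dilation=dilation)
        self.conv_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv_1x1(self.relu(self.conv_dilated(x)))
        return x + out

class Stage(nn.Module):
    # One TCN stage; the dilation schedule is what distinguishes the
    # assumed LID-TCN (linear) from an SS-TCN (exponential) here.
    def __init__(self, in_dim, channels, num_classes, dilations):
        super().__init__()
        self.conv_in = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            DilatedResidualLayer(channels, d) for d in dilations)
        self.conv_out = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        out = self.conv_in(x)
        for layer in self.layers:
            out = layer(out)
        return self.conv_out(out)

class MSLIDTCNSketch(nn.Module):
    # Hypothetical four-stage model: one linear-index dilated stage on
    # I3D features, then three refinement stages on class probabilities.
    def __init__(self, in_dim=2048, channels=64, num_classes=19,
                 num_layers=10):
        super().__init__()
        lid = [i + 1 for i in range(num_layers)]   # assumed linear-index: 1, 2, 3, ...
        exp = [2 ** i for i in range(num_layers)]  # SS-TCN style: 1, 2, 4, ...
        self.stage1 = Stage(in_dim, channels, num_classes, lid)
        self.refine = nn.ModuleList(
            Stage(num_classes, channels, num_classes, exp) for _ in range(3))

    def forward(self, x):                # x: (batch, in_dim, T) I3D features
        out = self.stage1(x)             # initial per-frame class logits
        outputs = [out]
        for stage in self.refine:
            out = stage(torch.softmax(out, dim=1))  # refine probabilities
            outputs.append(out)
        return outputs                   # last entry is the final prediction

# Usage: feats = torch.randn(1, 2048, 300) gives 300 frames of I3D
# features; MSLIDTCNSketch()(feats)[-1] is a (1, 19, 300) logit tensor.

During training, multi-stage models of this kind typically apply the frame-wise loss to every stage's output, which is why the sketch returns all four logit tensors rather than only the final one.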