Multimodal transformer augmented fusion for speech emotion recognition
Speech emotion recognition is challenging due to the subjectivity and ambiguity of emotion. In recent years, multimodal methods for speech emotion recognition have achieved promising results. However, due to the heterogeneity of data from different modalities, effectively integrating different modal information remains a difficulty an