Implementation of Speaker Identification and Speaker Emotion Recognition System

  • Ravi Shankar D., Manjula R. B.
Keywords: Convolutional Neural Network (CNN), Equal Error Rate (EER), Speaker Authentication, MFCC, LSTM.

Abstract

Audio classification poses distinct challenges in speaker recognition and human emotion detection, both of which have direct real-world applications. This paper introduces a multimodal solution to the twin challenges of speaker verification and emotion recognition in a customer-service call centre setting. For speaker recognition, features are extracted as Mel-frequency cepstral coefficients (MFCCs) from a small subset of the LibriSpeech corpus. A three-layer Long Short-Term Memory (LSTM) architecture trained with triplet loss achieves an Equal Error Rate (EER) of 6.89%, demonstrating both efficacy and precision. In parallel, we perform emotion recognition on the RAVDESS dataset using a CNN to classify eight emotions (the six proposed by Ekman, plus neutral and calm), achieving an F1 score of 0.85. These results demonstrate that such deep learning approaches are viable in practice for telephone speaker authentication and call centres, where speaker verification and emotion recognition add context to what is being conveyed.
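
The paper's full method is not reproduced here, but a minimal sketch of the speaker-verification pipeline the abstract describes (MFCC features, a three-layer LSTM embedder trained with triplet loss, and EER evaluation) might look like the following, assuming librosa for feature extraction and PyTorch for the model. The names (`SpeakerEmbedder`, `extract_mfcc`) and all hyperparameters (13 MFCCs, hidden size 128, embedding dimension 64, margin 1.0) are illustrative assumptions, not the authors' settings.

```python
# Sketch of the speaker-verification pipeline from the abstract:
# MFCC features -> 3-layer LSTM embedding -> triplet loss -> EER evaluation.
# Hyperparameters are illustrative assumptions, not the paper's values.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.metrics import roc_curve

def extract_mfcc(path, n_mfcc=13, sr=16000):
    """Load an utterance and return a (frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.T  # time-major, ready for the LSTM

class SpeakerEmbedder(nn.Module):
    """Three-layer LSTM mapping an MFCC sequence to a unit-norm embedding."""
    def __init__(self, n_mfcc=13, hidden=128, emb_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, x):                 # x: (batch, frames, n_mfcc)
        _, (h, _) = self.lstm(x)          # h: (num_layers, batch, hidden)
        emb = self.proj(h[-1])            # last layer's final hidden state
        return nn.functional.normalize(emb, dim=1)

model = SpeakerEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """One triplet update: anchor/positive share a speaker, negative does not."""
    opt.zero_grad()
    loss = triplet_loss(model(anchor), model(positive), model(negative))
    loss.backward()
    opt.step()
    return loss.item()

def equal_error_rate(labels, scores):
    """EER: the operating point where false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)   # labels: 1 = same-speaker trial
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2
```

At evaluation time, trial pairs would be scored by cosine similarity of their embeddings (which reduces to a dot product here, since the embeddings are unit-normalized) and passed to `equal_error_rate` along with same/different-speaker labels.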
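Similarly, a minimal sketch of an eight-way CNN emotion classifier for RAVDESS-style audio is shown below. The abstract does not specify the network's input representation or architecture, so the log-mel spectrogram input shape (1 × 64 × 128) and the three-block convolutional design are assumptions for illustration only; the emotion label list follows RAVDESS.

```python
# Sketch of an 8-class emotion CNN for RAVDESS-style spectrogram patches.
# Architecture and input shape (1 x 64 x 128 log-mel) are assumptions.
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]  # RAVDESS labels

class EmotionCNN(nn.Module):
    def __init__(self, n_classes=len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 64x128 -> 32x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x64 -> 16x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global average pooling
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                     # x: (batch, 1, 64, 128)
        return self.classifier(self.features(x).flatten(1))

model = EmotionCNN()
logits = model(torch.randn(4, 1, 64, 128))    # dummy batch of 4 spectrograms
preds = [EMOTIONS[i] for i in logits.argmax(dim=1).tolist()]
```

Training such a classifier with cross-entropy loss and reporting a macro-averaged F1 over the eight classes would match the evaluation the abstract reports.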

Published
2024-02-04
Section
Regular Issue