Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification.

In the area of speech processing, human speaker identification under naturalistic environments is a challenging task, especially for hearing-impaired individuals with cochlear implants (CIs) or hearing aids (HAs). Motivated by the fact that electrodograms reflect direct CI stimulation of input audio, this study proposes a speaker identification (ID) investigation using two-dimensional electrodograms constructed from the responses of a CI auditory system to emulate CI speaker ID capabilities. Features are extracted from electrodograms through an identity vector (i-vector) framework to train and generate identity models for each speaker using a Gaussian mixture model-universal background model followed by probabilistic linear discriminant analysis. To validate the proposed system, perceptual speaker ID for 20 normal hearing (NH) and seven CI listeners was evaluated with a total of 41 different speakers and compared with the scores from the proposed system. A one-way analysis of variance showed that the proposed system can reliably predict the speaker ID capability of CI (F = 0.18, p = 0.68) and NH (F = 0, p = 0.98) listeners in naturalistic environments. The impact of speaker familiarity is also addressed, and the results show reduced speaker-recognition performance by CI subjects using their CI processors, highlighting limitations of current speech processing strategies used in CIs/HAs.

Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction and communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct so that the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed over many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms that perform speaker recognition with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out the strengths and weaknesses of each.
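The GMM-UBM speaker-modeling stage described above can be sketched in a few lines. The following is a minimal, simplified illustration using synthetic feature frames in place of electrodogram-derived features; it trains a universal background model (UBM) on pooled data, builds a per-speaker model (a crude stand-in for the MAP adaptation a real system would use), and identifies a test utterance by UBM-normalised log-likelihood. The i-vector extraction and PLDA scoring used in the proposed system are omitted, and all data, dimensions, and names here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of GMM-UBM speaker identification on synthetic data.
# Real systems extract features from audio (or, as in this study,
# from electrodograms) and add i-vector + PLDA stages on top.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 20-dim "feature frames" for three speakers (hypothetical
# stand-ins for electrodogram-derived features).
speakers = {s: rng.normal(loc=s, scale=1.0, size=(500, 20)) for s in range(3)}

# 1) Train a universal background model (UBM) on pooled data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(list(speakers.values())))

# 2) Build a per-speaker model, initialised from the UBM means
#    (a simplified substitute for proper MAP adaptation).
models = {}
for s, feats in speakers.items():
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          means_init=ubm.means_, random_state=0)
    gmm.fit(feats)
    models[s] = gmm

# 3) Identify a test utterance by its average log-likelihood under each
#    speaker model, normalised by the UBM score.
test = rng.normal(loc=1, scale=1.0, size=(200, 20))  # frames from speaker 1
scores = {s: m.score(test) - ubm.score(test) for s, m in models.items()}
predicted = max(scores, key=scores.get)
print(predicted)  # expected: 1
```

In a full system, the per-speaker sufficient statistics collected against the UBM would instead feed an i-vector extractor, and PLDA would supply the final verification score rather than raw likelihood ratios.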