Type of Document Dissertation Author Yoo, Sungyub Author's Email Address sungyoo@pitt.edu URN etd-07012005-135056 Title Speech Decomposition and Enhancement Degree Doctor of Philosophy Program Electrical Engineering School School of Engineering Advisory Committee
Advisor Name Title J. Robert Boston Committee Chair Amro A. El-Jaroudi Committee Member Ching-Chung Li Committee Member Heung-no Lee Committee Member John D. Durrant Committee Member Keywords
- non-tonal
- tonal
- transition
- speech
- speech decomposition
- speech enhancement
- format
Date of Defense 2005-06-29 Availability unrestricted Abstract The goal of this study is to investigate the roles of steady-state speech sounds and transitions between these sounds in the intelligibility of speech. The motivation for this approach is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The possibility that selectively amplifying these edges may enhance speech intelligibility is examined.Computer algorithms to decompose speech into two different components were developed. One component, which is defined as a tonal component, was intended to predominately include formant activity. The second component, which is defined as a non-tonal component, was intended to predominately include transitions between and within formants.
The approach to the decomposition is to use a set of time-varying filters whose center frequencies and bandwidths are controlled to identify the strongest formant components in speech. Each center frequency and bandwidth is estimated based on FM and AM information of each formant component. The tonal component is composed of the sum of the filter outputs. The non-tonal component is defined as the difference between the original speech signal and the tonal component.
The relative energy and intelligibility of the tonal and non-tonal components were compared to the original speech. Psychoacoustic growth functions were used to assess the intelligibility. Most of the speech energy was in the tonal component, but this component had a significantly lower maximum word recognition than the original and non-tonal component had. The non-tonal component averaged 2% of the original speech energy, but this component had almost equal maximum word recognition as the original speech.
The non-tonal component was amplified and recombined with the original speech to generate enhanced speech. The energy of the enhanced speech was adjusted to be equal to the original speech, and the intelligibility of the enhanced speech was compared to the original speech in background noise. The enhanced speech showed higher recognition scores at lower SNRs, and the differences were significant. The original and enhanced speech showed similar recognition scores at higher SNRs. These results suggest that amplification of transient information can enhance the speech in noise and this enhancement method is more effective at severe noise conditions.
Files
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access dissertation_Yoo.pdf 3.45 Mb 00:15:58 00:08:13 00:07:11 00:03:35 00:00:18 If you have questions or comments please send mail to ETD-Feedback or view
the University of Pittsburgh Electronic Theses and Dissertations (ETD) Project page.