Dynamic Speech Models (Synthesis Lectures on Speech and Audio Processing)

Dynamic Speech Models (Synthesis Lectures on Speech and Audio Processing), 9781598290646 (1598290649), Morgan and Claypool Publishers, 2006

In a broad sense, speech dynamics are the time-varying or temporal characteristics in all stages of the human speech communication process. This process, sometimes referred to as the speech chain [1], starts with the formation of a linguistic message in the speaker’s brain and ends with the arrival of the message in the listener’s brain. In parallel with this direct information transfer, there is also a feedback link from the acoustic signal of speech to the speaker’s ear and brain. In the conversational mode of speech communication, the style of the speaker’s speech can be further influenced by an assessment of the extent to which the linguistic message is successfully transferred to or understood by the listener. These types of feedback make the speech chain a closed-loop process.

What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific study of the speech chain. Such studies help us understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, the advancement of human language technology, especially in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, for a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state of the art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters.
Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem. After the introductory chapter, the main body of this monograph consists of four chapters. They cover various aspects of the theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material on speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, and for seasoned researchers and engineers specialized in speech processing.
