
When: Thursday, June 05, 11:00 a.m.

Where: 1305 Newell-Simon Hall

Konstantin Markov, ATR, Japan

Faculty Candidate Talk: Konstantin Markov

Abstract:
This talk consists of two parts, presenting ATR's research activities in the following two areas:

1) Design and implementation of structured statistical ASR models. The current statistical approach to model building is often called "ignorance-based modeling", in the sense that any unwanted variability is assumed to be residual and is supposed to be accommodated within the variances of the probability density functions (pdfs). This allows only a limited degree of prior model structure to be imposed. However, structure that explains systematic variations reduces uncertainty, which in turn increases predictability and, therefore, the model's performance. Bayesian Networks (BNs) are an excellent tool that can efficiently and flexibly encode any structure through their topology, and Dynamic BNs (DBNs) consequently became a popular model in speech recognition. It soon turned out, however, that large systems are difficult to build because of the poor scalability of DBNs. In our method, we tried to achieve the best trade-off between the superior expressive power of BNs and the practical efficiency of HMMs. The approach is to keep the hierarchical structure of traditional HMM-based systems and use different, small BNs to model the pdfs at different hierarchical levels independently. For example, at the lowest level, we use a BN to represent the HMM state pdf. At the next (phonetic) model level, we use a BN to factor the underlying pdf in a way that allows complex models to be composed of several simpler ones. Word pronunciation variability can also be modeled by a BN. We will describe several examples of ASR models built using this approach and show that consistent performance improvements can be achieved across various tasks and settings.
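To illustrate what "using a small BN to represent the HMM state pdf" can mean, here is a minimal sketch. It assumes a scalar acoustic feature x and a hypothetical two-valued hidden auxiliary variable a (for example, a speaker group). The class and function names are illustrative only, and this is not ATR's actual model topology. The BN has the arcs q -> a, a -> x and q -> x, so the joint factors as P(a | q) p(x | a, q), and the state output pdf is obtained by summing a out.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class StateBN:
    """Toy BN for a single HMM state q (q is fixed per object).

    Hypothetical topology: q -> a, a -> x, q -> x, where a is a hidden
    discrete auxiliary variable and x is a scalar feature.
    """

    def __init__(self, prior_a, means, variances):
        # prior_a[k] = P(a = k | q); means[k], variances[k] parameterise p(x | a = k, q)
        if len(prior_a) != len(means) or len(means) != len(variances):
            raise ValueError("parameter lists must have equal length")
        if abs(sum(prior_a) - 1.0) > 1e-9:
            raise ValueError("P(a | q) must sum to 1")
        self.prior_a = list(prior_a)
        self.means = list(means)
        self.variances = list(variances)

    def joint(self, x: float, a: int) -> float:
        """p(x, a | q) = P(a | q) * p(x | a, q), i.e. the BN factorisation."""
        return self.prior_a[a] * gaussian_pdf(x, self.means[a], self.variances[a])

    def likelihood(self, x: float) -> float:
        """State output pdf b_q(x) = sum over a of p(x, a | q)."""
        return sum(self.joint(x, a) for a in range(len(self.prior_a)))

    def posterior_a(self, x: float):
        """P(a | x, q): how strongly each auxiliary value explains x."""
        total = self.likelihood(x)
        return [self.joint(x, a) / total for a in range(len(self.prior_a))]


if __name__ == "__main__":
    state = StateBN(prior_a=[0.5, 0.5], means=[-1.0, 1.0], variances=[0.25, 0.25])
    print("b_q(1.0) =", state.likelihood(1.0))
    # Nearly all posterior mass falls on a = 1, whose mean is at x = 1.0
    print("P(a | x=1.0, q) =", state.posterior_a(1.0))
```

With this particular three-node topology, the BN reduces to a Gaussian mixture whose weights P(a | q) depend on the state. Adding further nodes or arcs (for example, several auxiliary variables) is where the extra structure described above would come from.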

2) Towards cognitive speech processing and recognition. Despite the substantial progress of ASR technology and the emergence of speech-enabled products and services, it is generally acknowledged that we have a long way to go before developing an ASR system whose performance approaches or exceeds that of humans. Some researchers believe that simply extending our current theories and practical solutions may never lead to that goal. In the search for new research directions, it is natural to turn our attention to human speech recognition. Although we do not know much about it, there is an apparent discrepancy between the way humans acquire their (language) knowledge and the way we train our ASR models: humans are "learning machines", while current ASR systems are in fact "learned machines". Developing systems that are able to learn and reason in a continuing loop is one of the key goals of the emerging field of cognitive information processing. We will present our research activities in developing a cognitive speech processing system (unified for both recognition and synthesis) based on the bio-inspired Never-Ending learning principle. This system has a hierarchical structure in which each layer works according to the same algorithm but represents a different space or level of abstraction. The lowest layer corresponds to the acoustic space, and the other layers correspond to the phonetic, word and phrase spaces, respectively. Information flows between layers in both directions: bottom-up during recognition and top-down during generation, i.e., synthesis. So far, only the bottom layer has been implemented and experimented with. Development of the full system poses multiple research challenges, and we will discuss some ideas for their possible solutions.
