Thursday, May 26, 2011

Paper Summary - A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

This summary also functions as a very rough and basic introduction to HMMs for myself. This accounts for the different format and subsequent length of this entry.

In this paper, Rabiner provides a tutorial on hidden Markov models and how they may be applied to the area of speech recognition. A hidden Markov model (HMM) is a stochastic signal model, meaning that the signals can be well characterized as parametric random processes. Rabiner provides a formal look at discrete Markov processes with the example of predicting the weather for a week. The example is roughly as follows:

Taken from author's paper.
Any day it may be raining, cloudy, or sunny (so 3 possible states). For a given day t, the matrix of state transition probabilities is as shown to the right. Given that it is sunny (state 3) on the first day, we define an observation sequence O as O ={S3, S3, S3, S1, S1, S3, S2, S3}, and wish to determine its probability. As Rabiner states in his paper, the following expression and evaluation shows the probability for this observation sequence:
Taken from author's paper. Determining the probability for the weather.

In this example the states are observable, physical events. Hidden processes, however, often exist in problems that by definition lack observable processes. In such instances the result of a process can be known but not the process itself. Therefore the problem becomes finding an accurate model for the observed sequence of results. This is where building an HMM to explain the observed sequence comes into play. See Rabiner's paper for two initial examples that can be modeled with an HMM.

Three Basic Problems for HMMs
There are three basic problems that must be solved for an HMM to be of use.

Problem 1 - Given a sequence of observations and a model, how do you compute the probability that the model produced the observed sequence? An efficient procedure for solving this problem is known as the forward-backward procedure. This procedure is based on the lattice structure of states. Given N states, each change from one state to another will again result in one of these N states. Therefore, calculating the probability of a partial observation sequence up to a given time can be somewhat reduced to calculating the probabilities along the connections between states for a given time. (Note: You are much better off reading up on the procedure than trusting my summary).

Problem 2 - Given a sequence of observations and a model, how do you get a state sequence which is optimal in a meaningful way? In most instances, you may be interested in finding the single best state sequence. The Viterbi Algorithm uses dynamic programming to do so. Again, a lattice structure effectively illustrates the process. (Note: Read up on the process because I'm not rehashing it here).

Problem 3 - How do you optimize model parameters to best describe a given sequence of observations? Rabiner states that this is the most difficult problem of HMMs. He focuses on the Baum-Welch method for finding a locally maximized probability for an observation sequence with a chosen model. In essence, it can be used to find unknown parameters. (Note: You should read up on that as well).

Types of HMMs
Taken from author's paper. Also, not Figure 7.
The basic problem explanations and walkthroughs that Rabiner provides are based on a fully connected HMM. Also known as ergodic, such HMMs allow for any state to reach any other in finite number of steps. There are, however, different types of HMMs that may be encountered. Rabiner discusses the three HMMs as shown to the right.

With Rabiner's focus on speech recognition, he now brings up the issue of monitoring continuous signals as opposed to discrete symbols that are measured at set intervals or times. With continuous observations, the probability density function must be restricted to insure that parameters can be reestimated consistently. (Note: This process is best left for your own reading).

Rabiner's look at continuous signals in speech recognition is relevant for BCI research. When sensor data is being generated continuously (and from 14 points if you're using the EPOC), something has to be done to make sense of the information in polynomial time. In particular, a researcher may want to know what signal sequence led up to an observed state, or how best to classify a set of signals from various sensors. HMM-based classification has previously been researched and you can find different publications on the subject by performing a simple online search (a few examples being found here and here). As for Rabiner's paper itself, I stopped summarizing around page 12 of 30 because I needed to read it through a couple times before I understood anything beyond that point. A mathematically rigorous paper is the foundation of solid research projects in many fields, but give me an applications paper any day. As is evident from my summary, I identified more with Rabiner's examples than with his equations, and as such I basically avoided their reproduction here in this entry. Also, I found that the Wikipedia entry on HMMs was beneficial for getting more examples and quickly examining the key problems as stated by Rabiner.

Full Reference:
Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE (1989).

No comments:

Post a Comment