This summary also functions as a very rough and basic introduction to HMMs for myself. This accounts for the different format and subsequent length of this entry.
Overview
In this paper, Rabiner provides a tutorial on hidden Markov models and how they may be applied to the area of speech recognition. A hidden Markov model (HMM) is a stochastic signal model, meaning that the signals can be well characterized as parametric random processes. Rabiner provides a formal look at discrete Markov processes with the example of predicting the weather for a week. The example is roughly as follows:
Taken from author's paper. |
Taken from author's paper. Determining the probability for the weather. |
In this example the states are observable, physical events. Hidden processes, however, often exist in problems that by definition lack observable processes. In such instances the result of a process can be known but not the process itself. Therefore the problem becomes finding an accurate model for the observed sequence of results. This is where building an HMM to explain the observed sequence comes into play. See Rabiner's paper for two initial examples that can be modeled with an HMM.
Three Basic Problems for HMMs
There are three basic problems that must be solved for an HMM to be of use.
Problem 1 - Given a sequence of observations and a model, how do you compute the probability that the model produced the observed sequence? An efficient procedure for solving this problem is known as the forward-backward procedure. This procedure is based on the lattice structure of states. Given N states, each change from one state to another will again result in one of these N states. Therefore, calculating the probability of a partial observation sequence up to a given time can be somewhat reduced to calculating the probabilities along the connections between states for a given time. (Note: You are much better off reading up on the procedure than trusting my summary).
Problem 2 - Given a sequence of observations and a model, how do you get a state sequence which is optimal in a meaningful way? In most instances, you may be interested in finding the single best state sequence. The Viterbi Algorithm uses dynamic programming to do so. Again, a lattice structure effectively illustrates the process. (Note: Read up on the process because I'm not rehashing it here).
Problem 3 - How do you optimize model parameters to best describe a given sequence of observations? Rabiner states that this is the most difficult problem of HMMs. He focuses on the Baum-Welch method for finding a locally maximized probability for an observation sequence with a chosen model. In essence, it can be used to find unknown parameters. (Note: You should read up on that as well).
Types of HMMs
Taken from author's paper. Also, not Figure 7. |
With Rabiner's focus on speech recognition, he now brings up the issue of monitoring continuous signals as opposed to discrete symbols that are measured at set intervals or times. With continuous observations, the probability density function must be restricted to insure that parameters can be reestimated consistently. (Note: This process is best left for your own reading).
Discussion:
Rabiner's look at continuous signals in speech recognition is relevant for BCI research. When sensor data is being generated continuously (and from 14 points if you're using the EPOC), something has to be done to make sense of the information in polynomial time. In particular, a researcher may want to know what signal sequence led up to an observed state, or how best to classify a set of signals from various sensors. HMM-based classification has previously been researched and you can find different publications on the subject by performing a simple online search (a few examples being found here and here). As for Rabiner's paper itself, I stopped summarizing around page 12 of 30 because I needed to read it through a couple times before I understood anything beyond that point. A mathematically rigorous paper is the foundation of solid research projects in many fields, but give me an applications paper any day. As is evident from my summary, I identified more with Rabiner's examples than with his equations, and as such I basically avoided their reproduction here in this entry. Also, I found that the Wikipedia entry on HMMs was beneficial for getting more examples and quickly examining the key problems as stated by Rabiner.
Full Reference:
Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE (1989).