Thursday, May 26, 2011

Paper Summary - A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Summary:
This summary also functions as a very rough and basic introduction to HMMs for myself, which accounts for the different format and greater length of this entry.


Overview
In this paper, Rabiner provides a tutorial on hidden Markov models and how they may be applied to the area of speech recognition. A hidden Markov model (HMM) is a stochastic signal model, meaning that the signals can be well characterized as parametric random processes. Rabiner provides a formal look at discrete Markov processes with the example of predicting the weather for a week. The example is roughly as follows:

Taken from the author's paper.
Any day it may be raining, cloudy, or sunny (so 3 possible states). For a given day t, the matrix of state transition probabilities is as shown to the right. Given that it is sunny (state 3) on the first day, we define an observation sequence O = {S3, S3, S3, S1, S1, S3, S2, S3} and wish to determine its probability. As Rabiner shows in his paper, the following expression and evaluation give the probability for this observation sequence:
Taken from the author's paper. Determining the probability for the weather.
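For concreteness, here is that computation as a minimal Python sketch, using the state transition matrix from Rabiner's weather example (state 1 = rain, state 2 = cloudy, state 3 = sunny):

    import numpy as np

    # State transition matrix A = {a_ij} from Rabiner's weather example.
    A = np.array([[0.4, 0.3, 0.3],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.1, 0.8]])

    # O = {S3, S3, S3, S1, S1, S3, S2, S3}, zero-indexed as 2, 2, 2, 0, 0, 2, 1, 2.
    O = [2, 2, 2, 0, 0, 2, 1, 2]

    # Day 1 is given as sunny, so P(O | model) is just the product of the
    # transition probabilities along the observed state sequence.
    p = 1.0
    for prev, curr in zip(O[:-1], O[1:]):
        p *= A[prev, curr]

    print(p)  # ~1.536e-04, matching the paper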

In this example the states are observable, physical events. Many problems, however, involve hidden processes: the result of a process can be known, but not the process itself. The problem therefore becomes finding an accurate model for the observed sequence of results, and this is where building an HMM to explain the observed sequence comes into play. See Rabiner's paper for two introductory examples (coin tossing and the urn-and-ball model) that can be modeled with an HMM.

Three Basic Problems for HMMs
There are three basic problems that must be solved for an HMM to be of use.

Problem 1 - Given a sequence of observations and a model, how do you compute the probability that the model produced the observed sequence? An efficient procedure for solving this problem is known as the forward-backward procedure. This procedure is based on the lattice structure of states. Given N states, each change from one state to another will again result in one of these N states. Therefore, calculating the probability of a partial observation sequence up to a given time can be somewhat reduced to calculating the probabilities along the connections between states for a given time. (Note: You are much better off reading up on the procedure than trusting my summary).
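To make the lattice idea concrete, here is a minimal sketch of the forward pass, assuming a model given as a transition matrix A, an emission matrix B, and an initial state distribution pi (the toy values at the bottom are made up for illustration):

    import numpy as np

    def forward(A, B, pi, obs):
        """alpha[t, i] = P(o_1..o_t, state i at time t | model)."""
        T, N = len(obs), A.shape[0]
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]                 # initialization
        for t in range(1, T):
            # Every state at time t is reachable from all N states at t-1,
            # which is the lattice structure the procedure exploits.
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha[-1].sum()                       # P(O | model), in O(N^2 T)

    # Toy two-state, two-symbol model (illustrative values only).
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    print(forward(A, B, pi, [0, 1, 0]))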

Problem 2 - Given a sequence of observations and a model, how do you get a state sequence which is optimal in a meaningful way? In most instances, you may be interested in finding the single best state sequence. The Viterbi Algorithm uses dynamic programming to do so. Again, a lattice structure effectively illustrates the process. (Note: Read up on the process because I'm not rehashing it here).
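A comparable sketch of the Viterbi recursion, reusing the same hypothetical A, B, and pi conventions as above; delta tracks the best path probability through the lattice and psi the backpointers:

    import numpy as np

    def viterbi(A, B, pi, obs):
        """Return the single most likely state sequence for obs."""
        T, N = len(obs), A.shape[0]
        delta = np.zeros((T, N))                # best path probability so far
        psi = np.zeros((T, N), dtype=int)       # backpointers
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            trans = delta[t - 1][:, None] * A   # (from state, to state)
            psi[t] = trans.argmax(axis=0)
            delta[t] = trans.max(axis=0) * B[:, obs[t]]
        # Backtrack from the most likely final state.
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t, path[-1]]))
        return path[::-1]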

Problem 3 - How do you optimize model parameters to best describe a given sequence of observations? Rabiner states that this is the most difficult problem of HMMs. He focuses on the Baum-Welch method for finding a locally maximized probability for an observation sequence with a chosen model. In essence, it can be used to find unknown parameters. (Note: You should read up on that as well).
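For reference, a condensed sketch of one Baum-Welch re-estimation step built on the forward and backward variables; real implementations add scaling to avoid numerical underflow on long sequences, which is omitted here:

    import numpy as np

    def baum_welch_step(A, B, pi, obs):
        """One re-estimation step; returns updated (A, B, pi)."""
        obs = np.asarray(obs)
        T, N = len(obs), A.shape[0]
        alpha = np.zeros((T, N))
        beta = np.ones((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):                   # forward pass
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):          # backward pass
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        p_obs = alpha[-1].sum()                 # P(O | current model)
        # gamma[t, i] = P(state i at t | O); xi[t, i, j] = P(i at t, j at t+1 | O)
        gamma = alpha * beta / p_obs
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
        # Re-estimate parameters from expected counts.
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
        return new_A, new_B, new_pi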

Types of HMMs
Taken from the author's paper.
The basic problem explanations and walkthroughs that Rabiner provides are based on a fully connected HMM. Also known as ergodic, such HMMs allow any state to be reached from any other in a finite number of steps. There are, however, other types of HMMs that may be encountered. Rabiner discusses the three types shown to the right: the ergodic model, the left-right (Bakis) model, and a parallel-path left-right model.

Given his focus on speech recognition, Rabiner next brings up the issue of monitoring continuous signals, as opposed to discrete symbols measured at set intervals or times. With continuous observations, the probability density function must be restricted (typically to a finite mixture of Gaussian or similarly well-behaved densities) to ensure that the parameters can be re-estimated consistently. (Note: This process is best left for your own reading).
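As a flavor of what that restricted form looks like, the sketch below evaluates an emission density b_j(o) for one state as a finite Gaussian mixture (all numbers are toy values, not from the paper):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_emission(o, weights, means, covs):
        """b_j(o) as a finite mixture of Gaussians (weights sum to 1)."""
        return sum(c * multivariate_normal.pdf(o, mean=m, cov=S)
                   for c, m, S in zip(weights, means, covs))

    # Toy single-state example: two mixture components, 2-D observations.
    weights = [0.6, 0.4]
    means = [np.zeros(2), np.ones(2)]
    covs = [np.eye(2), 0.5 * np.eye(2)]
    print(gmm_emission(np.array([0.2, 0.8]), weights, means, covs))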


Discussion:
Rabiner's look at continuous signals in speech recognition is relevant for BCI research. When sensor data is being generated continuously (and from 14 points if you're using the EPOC), something has to be done to make sense of the information in polynomial time. In particular, a researcher may want to know what signal sequence led up to an observed state, or how best to classify a set of signals from various sensors. HMM-based classification has previously been researched, and a simple online search turns up several publications on the subject.

As for Rabiner's paper itself, I stopped summarizing around page 12 of 30 because I needed to read it through a couple of times before I understood anything beyond that point. A mathematically rigorous paper is the foundation of solid research projects in many fields, but give me an applications paper any day. As is evident from my summary, I identified more with Rabiner's examples than with his equations, and as such I basically avoided reproducing them here. I also found that the Wikipedia entry on HMMs was helpful for getting more examples and quickly reviewing the key problems as stated by Rabiner.

Full Reference:
Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257-286 (1989).

Monday, May 23, 2011

Paper Summary - The Emotional Economy for the Augmented Human

Notable Quote:
...new Commercial Off-The-Shelf (COTS) Brain-Computer Interfaces (BCI) can be used to provide real-time happiness feedback as people live their life.
Summary:
This paper by Jean-Marc Seigneur investigates how real-time happiness measures can be obtained and used. Existing methods for evaluating happiness work a posteriori, which leaves room for subjects to change their opinions or be influenced by their current emotions while being expected to relive the past events in question. Using a BCI, the author proposes detecting happiness in the moment and at a more basic level. Seigneur describes this as the Emotional Economy (EE) Model. Given a user and a service, the model looks at which emotions reach a threshold to be measured (the example here being happiness). This information can then be used to make decisions or catalog emotions at desired checkpoints. The proof-of-concept scenarios afford measures of engagement, frustration, meditation, instantaneous excitement, long-term excitement, and happiness. Note that these measures are readily provided by the Emotiv EPOC headset used.

The Facebook scenario. Taken from the author's paper.
As an example, the author created two use-scenarios. In the first, a user watches a video via Facebook, and if happiness is detected, the video is automatically liked. The second scenario incorporates location-based computing. The wearer is given a backpack containing a portable computer and a GPS unit. While moving along an outdoor tour, the GPS and emotion readings can be synced via their timestamps to determine at which geographic points the wearer felt different emotions, and thus whether and when they enjoyed themselves. As stated by the author, this scenario could allow for automatic tourist reviews and, when combined with written testimonials, lend credence to positive and negative reviews overall.
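To illustrate the shape of the idea (this is my own hypothetical sketch, not the author's code), the threshold logic in the Facebook scenario reduces to something like:

    HAPPINESS_THRESHOLD = 0.7  # illustrative value, not from the paper

    def on_affect_sample(happiness, video_id, like_video):
        """Trigger a service action when the measure crosses the threshold.

        like_video is a hypothetical stand-in for whatever Facebook call
        the author's prototype used.
        """
        if happiness >= HAPPINESS_THRESHOLD:
            like_video(video_id)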

Discussion:
I highly recommend this short paper to those interested in ubiquitous computing and/or emotional measures. The thing that really sticks out is that as BCIs become more functional and effective at a higher level, they can become just another input device or sensor, albeit one with a wider gamut of potential uses. In essence, the author demonstrates this in his Facebook scenario: the code needed was for interacting with Facebook, not for manipulating the headset sensors themselves. This work is also open to quick extensions. For instance, measuring frustration could be an indicator of not liking the video in the Facebook scenario, which could cause it to be removed from the viewer's news feed. The second scenario opens up a wider range of applications via the incorporation of positioning and timestamps. Such a scenario could be extended for testing amusement parks and rides, or movies and commercials without the need for the GPS data. In fact, a similar article appeared on Engadget.

Outlook:
Very engaging. For some reason the extension I thought of was relationship evaluation. Are you really happy with the person you're with? Who makes you happier? That's all an online dating site needs: brain signals to further prove connections between paying customers' romantic matches. Or therapists, for that matter! Pop on a headset and really see what you feel given a stimulus. With the potential perfection of emotional classification in the future, who knows what could be learned about others. This raises an ethical concern as well. If I CAN see how everything makes you feel, SHOULD I? Could BCIs be used to weed out unfit soldiers or track down criminals? Could they be used to detect biases, fallacies, and overall ethnocentric beliefs? To me that is very heavy stuff worth much further consideration.

Full Reference:
Jean-Marc Seigneur. 2011. The emotional economy for the augmented human. In Proceedings of the 2nd Augmented Human International Conference (AH '11). ACM, New York, NY, USA, Article 24, 4 pages. DOI=10.1145/1959826.1959850 http://doi.acm.org/10.1145/1959826.1959850

Paper Summary - NeuroPhone: brain-mobile phone interface using a wireless EEG headset

Notable Quote:
...users on-the-go can simply “think” their way through all of their mobile applications.

Summary:
NeuroPhone is a system that essentially allows people to dial contacts by thinking about them. Campbell, et al. use the Emotiv EPOC to read EEG signals and pass them to an iPhone. The iPhone then runs a lightweight classifier that can distinguish the desired signals from noise. More specifically, the authors use P300 signals (a positive peak evoked when a desired target image appears within a set) and physical winks. The process is as follows (a toy sketch of the selection loop appears after the figure):
  1. A set of contact photos appear on the screen
  2. Each photo is highlighted in turn 
  3. The user concentrates on the photo for the contact they wish to dial
  4. When said photo is highlighted, a P300 signal is generated (or the user winks)
  5. The iPhone gets the positive acknowledgement from the wearer and dials that contact
The contact selection process. Graciously taken from the authors' paper.
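To make the loop concrete, here is a toy sketch of how such a selection could work; it is not the authors' classifier, and score_epoch is a crude hypothetical stand-in for a real P300 detector:

    import numpy as np

    def score_epoch(epoch, window=slice(60, 100)):
        """Crude stand-in: mean amplitude in a post-stimulus window where a
        P300 deflection would appear (the indices are illustrative)."""
        return epoch[window].mean()

    def select_contact(epochs_per_contact):
        """epochs_per_contact maps each contact to a list of 1-D EEG epochs
        recorded while that contact's photo was highlighted. Averaging the
        repeated trials boosts SNR before scoring."""
        scores = {c: score_epoch(np.mean(eps, axis=0))
                  for c, eps in epochs_per_contact.items()}
        return max(scores, key=scores.get)   # photo that evoked the P300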
The authors make note of design considerations based on the implementation of NeuroPhone. First, noise is an issue with both the EPOC and EEG signals in general. To reduce noise, they propose averaging data over many trials, increasing the signal-to-noise ratio at the expense of increased delay. They also use a filter to remove any noise outside of the desired P300 frequency range. Finally, designing a mobile-based classifier requires efficient design choices. In NeuroPhone's case, only a subset of the EEG channels is passed to the iPhone for classification, and there is no continuous streaming of data to the device (think battery drain).
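A minimal sketch of those two noise-reduction steps, assuming the EPOC's 128 Hz sampling rate and an illustrative 1-12 Hz passband (the paper is summarized here from memory, so the exact filter band is my assumption, not the authors'):

    import numpy as np
    from scipy.signal import butter, filtfilt

    FS = 128  # Hz, the EPOC's per-channel sampling rate

    def bandpass(x, lo=1.0, hi=12.0, fs=FS, order=4):
        """Zero-phase band-pass filter; 1-12 Hz is an illustrative band
        that brackets typical P300 energy, not the authors' exact choice."""
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x, axis=-1)

    def average_trials(trials):
        """Averaging n aligned trials attenuates uncorrelated noise roughly
        by sqrt(n), at the cost of waiting for more repetitions (delay)."""
        return np.mean(np.stack(trials), axis=0)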

The authors conducted an initial user study of both the 'wink' and 'think' selection modes with 3 subjects. They found that their wink classifier worked best on relaxed, seated subjects; actions that caused muscle contractions, and distractions (music, in their test), led to significantly lower accuracy. The authors also showed that accuracy increases as the data accumulation time increases.

A video overview of NeuroPhone is also available.

Discussion:
When reading the evaluation I couldn't help but wonder if the users liked the application itself. Besides its unquestionable novelty, does it serve a function? Neither I nor the authors claim that NeuroPhone was designed to solve the specific problem of calling someone, but I believe the paper could have done a better job stressing the function of NeuroPhone in the greater realm of mobile applications. This project serves as an excellent example of how signal processing and BCI research can be meshed into HCI and mobile and ubiquitous computing. I have read previous papers on P300-spellers (typing letters by selecting them from a visible grid) and was glad to see the concept extended in a way that now seems completely obvious. I also enjoy the fact that they used the Emotiv EPOC because it is obviously my chosen headset for my fledgling research. The authors did a great job of explaining the benefits of using cheap(er) headsets like the EPOC and framing their use within the context of mobile computing. Overall the focus of the paper seemed to be torn between the success of the mobile classifier and the contact dialer itself. Both points came across, but I read sections out of order to better follow each thread of their contribution. Great stuff.

Outlook:
Beyond the project itself, I really connected with section 2 of the paper. The future outlook posed by the authors literally made me stop and consider the implications of BCI research. How long until we have to worry about people intercepting our 'thoughts' or emotional maps, or forging them in order to interface with technologies? I feel like Tom Clancy wrote something about this already... Regardless, the authors pose an excellent point. Emotion-driven interfaces are on their way, and we have much to consider.

Full Reference:
Andrew Campbell, Tanzeem Choudhury, Shaohan Hu, Hong Lu, Matthew K. Mukerjee, Mashfiqui Rabbi, and Rajeev D.S. Raizada. 2010. NeuroPhone: brain-mobile phone interface using a wireless EEG headset. In Proceedings of the second ACM SIGCOMM workshop on Networking, systems, and applications on mobile handhelds (MobiHeld '10). ACM, New York, NY, USA, 3-8. DOI=10.1145/1851322.1851326 http://doi.acm.org/10.1145/1851322.1851326

Tuesday, May 17, 2011

The Emotiv EPOC - Introduction

Overview:
I am using the Emotiv EPOC neuroheadset with the Education Edition SDK. The headset and this SDK together run $2500. I recommend this version if you are conducting research that is a) non-commercial and (optionally) b) open to multiple collaborators within your department; you will also need to be affiliated with an academic or educational institution. Emotiv offers 6 different SDKs to choose from, including the FREE SDKLite. Read up and choose the one that is right for you or your organization.

Why the Emotiv EPOC?
I chose to use the EPOC for my research for a few different reasons. First, it has 14 electrodes and can also measure head rotation. The more you can measure, the more you can do with the signals (well, hopefully). The EPOC can be trained to detect conscious thoughts, emotions, and facial expressions. The provided Control Panel is great for viewing sensor contacts, doing some basic training, and just getting acquainted with the headset. The first time you put on the headset and hook it up to the computer, you will feel awesome. Or at least I felt awesome.
The white outer box is essentially a layer of paper.
Electroencephalographic (EEG) signals are fluctuations of electrical potential along the scalp created by neurons in the brain. EEG signals can thus be measured outside the brain itself in a non-invasive way via wearable headsets. The EPOC is one such headset. A very brief comparison of current consumer headsets is available on Wikipedia. More information on BCIs and how EEG fits within the field can also be found on Wikipedia.

Contents unpacked.

Out of the box
The EPOC comes in a flimsy box that in no way made me feel comfortable with my choice. Upon opening, however, you will find that everything is securely packed. My box contained a headset, a small bottle of saline solution, a case containing the 16 sensors, a charger, and the wireless USB receiver. Additional sensor packs and headsets can be purchased from the Emotiv Store.

Obtaining the SDK
The software that is available through the Education Edition DOES NOT come in the box. It is available for download, linked to the purchaser's email address, from the Emotiv website. This can be a slight pain in the ass to obtain, but I can say from experience that the Customer Care personnel are both patient and timely in their responses. In my case it took 5 days of correspondence (and a lot of confusion) before I was given access to the Education Edition SDK. If you have a departmental purchasing officer, please do the following to avoid any issues:
  • Don't Panic.
  • Have the purchaser forward you all emails received from Emotiv about the purchase. This includes the order number (which is actually labeled 'order ID') and any confirmations.
  • When you register on the Emotiv site, use your departmental email address. They will check that the purchaser's and researcher's email domains match (e.g., department.school.edu). Again, the SDK is licensed to your department and not your entire university.
  • Do not take everything they say literally. For example, if they ask for the order number, it might not be the thing labeled 'order number'. Or if they ask for your school's email ID, they really want your department's email ID that will match the purchaser's.
The person who helped me was awesome, and I naively assume that everyone in Emotiv's employ is equally awesome.

First Things First!
Make sure that your headset turns on before you do anything. If it is not holding a slight charge, then simply plug it in and give it a few minutes. Red light is charging, green light is charged. If you get a blue light when you flip the switch (located on the bottom rear of the headband), then you are set. Once you know the headset turns on, read the digital manual and begin finding that perfectly frustrating level of solution needed for proper conductivity.