Finding the probability of an observed sequence
1. Exhaustive search for solution
We want to find the probability of an observed sequence given an HMM - that is, the parameters (
,A,B) are known. Consider the weather example; we have a HMM describing the weather and its relation to the state of the seaweed, and we also have a sequence of seaweed observations. Suppose the observations for 3 consecutive days are (dry,damp,soggy) - on each of these days, the weather may have been sunny, cloudy or rainy. We can picture the observations and the possible hidden states as a trellis.

It can be seen that one method of calculating the probability of the observed sequence would be to find each possible sequence of the hidden states, and sum these probabilities. For the above example, there would be 3^3=27 possible different weather sequences, and so the probability is 数据挖掘实验室
Pr(dry,damp,soggy | HMM) = Pr(dry,damp,soggy | sunny,sunny,sunny) + Pr(dry,damp,soggy | sunny,sunny ,cloudy) + Pr(dry,damp,soggy | sunny,sunny ,rainy) + . . . . Pr(dry,damp,soggy | rainy,rainy ,rainy)
Calculating the probability in this manner is computationally expensive, particularly with large models or long sequences, and we find that we can use the time invariance of the probabilities to reduce the complexity of the problem. 数据挖掘实验室
2. Reduction of complexity using recursion
We will consider calculating the probability of observing a sequence recursively given a HMM. We will first define a partial probability, which is the probability of reaching an intermediate state in the trellis. We then show how these partial probabilities are calculated at times t=1 and t=n (> 1). 数据挖掘研究院
Suppose throughout that the T-long observed sequence is 数据挖掘研究院

2a. Partial probabilities, (
′s)
Consider the trellis below showing the states and first-order transitions for the observation sequence dry,damp,soggy;

We can calculate the probability of reaching an intermediate state in the trellis as the sum of all possible paths to that state.For example, the probability of it being cloudy at t = 2 is calculated from the paths; 数据挖掘研究院
We denote the partial probability of state j at time t ast ( j ) - this partial probability is calculated as;
t ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)
The partial probabilities for the final observation hold the probability of reaching those states going through all possible paths - e.g., for the above trellis, the final partial probabilities are calculated from the paths :
It follows that the sum of these final partial probabilities is the sum of all possible paths through the trellis, and hence is the probability of observing the sequence given the HMM.Section 3 introduces an animated example of the calculation of the probabilities.
2b. Calculating
′s at time t = 1
We calculate partial probabilities as :
t ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t) 数据挖掘研究院
In the special case where t = 1, there are no paths to the state. The probability of being in a state at t = 1 is therefore the initial probability, i.e. Pr( state | t = 1) =
(state), and we therefore calculate partial probabilities at t = 1 as this probability multiplied by the associated observation probability; 数据挖掘研究院

Thus the probability of being in state j at intialisation is dependent on that state′s probability together with the probability of observing what we see at that time.
2c. Calculating
′s at time, t (> 1)
We recall that a partial probability is calculated as :
t ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)
We can assume (recursively) that the first term of the product is available, and now consider the term Pr(all paths to state j at time t).
To calculate the probability of getting to a state through all paths, we can calculate the probability of each path to that state and sum them - for example,

The number of paths needed to calculateincreases exponentially as the length of the observation sequence increases but the
′s at time t-1 give the probability of reaching that state through all previous paths, and we can therefore define
′s at time t in terms of those at time t-1 -i.e.,
Thus we calculate the probabilities as the product of the appropriate observation probability (that is, that state j provoked what is actually seen at time t+1) with the sum of probabilities of reaching that state at that time - this latter comes from the transition probabilities together with a from the preceding stage. 数据挖掘实验室
Notice that we have an expression to calculateat time t+1 using only the partial probabilities at time t.
We can now calculate the probability of an observation sequence given a HMM recursively - i.e. we use
′s at t=1 to calculate
′s at t=2;
′s at t=2 to calculate
′s at t=3; and so on until t = T. The probability of the sequence given the HMM is then the sum of the partial probabilities at time t = T 数据挖掘实验室
2d. Reduction of computational complexity
We can compare the computational complexity of calculating the probability of an observation sequence by exhaustive evaluation and by the recursive forward algorithm.We have a sequence of T observations, O. We also have a Hidden Markov Model, l=(
,A,B), with n hidden states. 数据挖掘实验室
An exhaustive evaluation would involve computing for all possible execution sequences
the quantity
which sums the probability of observing what we do - note that the load here is exponential in T. Conversely, using the forward algorithm we can exploit knowledge of the previous time step to compute information about a new one - accordingly, the load will only be linear in T.
3. Summary
Our aim is to find the probability of a sequence of observations given a HMM - (Pr (observations |
).
We reduce the complexity of calculating this probability by first calculating partial probabilities (
′s). These represent the probability of getting to a particular state, s, at time t. 数据挖掘研究院
We then see that at time t = 1, the partial probabilities are calculated using the initial probabilities (from the
vector) and Pr(observation | state) (from the confusion matrix); also, the partial probabilities at time t (> 1) can be calculated using the partial probabilities at time t-1.
This definition of the problem is recursive, and the probability of the observation sequence is found by calculating the partial probabilities at time t = 1, 2, ..., T, and adding all
′s at t = T.
Notice that computing the probability in this way is far less expensive than calculating the probabilities for all sequences and adding them. 数据挖掘研究院




![[Formula]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/graphics/5.1.2.3_1.gif)
![[Formula]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/graphics/5.1.2.4_1.gif)
![[Formula]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/graphics/5.1.2.4_2.gif)