Waiting Till a Pattern Appears

Waiting Till a Pattern Appears¶

You already know how to do this for short patterns, but Markov Chain methods will help you organize and extend your calculations. Here we will set up a way of using Markov Chains to find the expected waiting time till a particular pattern appears in a sequence of i.i.d. trials. The method is based on conditioning on the first move of the chain, so we have been calling it "conditioning on the first move." In Markov Chain terminology, the method is called "first step analysis."

Waiting Till HTH¶

Suppose you toss a coin that lands heads with probability $p$. Lazily, we will call it a $p$-coin. What is the expected number of tosses till you see the pattern HTH? You know how to do this by conditioning on the first few moves. Let's do that calculation in the language of Markov Chains, and you will see how to generalize it to any pattern.

Imagine the sequence of heads and tails, and at time $n \ge 1$, let $X_n$ keep track of where in the pattern the chain is at time $n$. That means:

• Define $\mathcal{S}$ to be the "starting state" where the chain is not in the pattern at all. So $X_0 = \mathcal{S}$ with probability 1. For $n \ge 1$, if the $n$th toss is T, look back at the $(n-1)$th toss. If that was also T, or if there wasn't an $(n-1)$th toss because $n=1$, then at time $n$ the chain is not in the pattern at all, so $X_n = \mathcal{S}$.
• If the $n$th toss is $H$, then check the previous two tosses; if those were HT then the wait is over, which we will define to be a different state below. If the previous two tosses weren't HT then at time $n$ the chain is one character into the pattern. We will say $X_n = H$.
• If the $n$th toss is T and the previous toss was H, the chain is two characters into the pattern and we define $X_n = HT$.
• If the $n$th toss is H and the previous two tosses were HT then we define $X_n = HTH$ and the wait is over.

Then $X_0, X_1, X_2, \ldots$ is a Markov Chain (you should check this) with the following transition matrix.

$\mathbf{\mathcal{S}}$ H HT HTH
$\mathbf{\mathcal{S}}$ $q$ $p$ 0 0
H 0 $p$ $q$ 0
HT $q$ 0 0 $p$
HTH 0 0 0 1

Our goal is to find $$\mu(\mathcal{S}) = E(W_{HTH} \mid X_0 = \mathcal{S})$$

We will do this by a first step analysis, for which we will need some notation. Define:

• $\mu(H) = E(W_{HTH} \mid X_1 = H)$
• $\mu(HT) = E(W_{HTH} \mid X_1 = H, X_2 = T)$

Then we get a system of equations.

\begin{align*} \mu(\mathcal{S}) &= q(1 + \mu(\mathcal{S})) + p(1 + \mu(H))\\ \mu(H) &= p(1 + \mu(H)) + q(1 + \mu(HT)) \\ \mu(HT) &= p + q(1 + \mu(\mathcal{S})) \end{align*}

That's three equations in three unknowns. Let's simplify each of them. Notice that each of the simplified equations says that the chain has to go 1 more step, and then there is some additional amount of time to account for. The accounting is done by weighting those additional expected times according to which state the chain entered at the first step.

\begin{align*} \mu(\mathcal{S}) &= 1 + q\mu(S) + p\mu(H)\\ \mu(H) &= 1 + p\mu(H) + q\mu(HT) \\ \mu(HT) &= 1 + q\mu(\mathcal{S}) \end{align*}

To solve such systems of equations it is usually simplest to write each of the unknowns in terms of $\mu(\mathcal{S})$ which is the one we want to find. One standard way is to start with the first equation and write $\mu(H)$ in terms of $\mu(\mathcal{S})$, then plug that into the next equation, and so on:

$$\mu(H) = \mu(\mathcal{S}) - \frac{1}{p}$$

and so $$\mu(HT) = \mu(H) - \frac{1}{q} = \mu(\mathcal{S}) - \frac{1}{p} - \frac{1}{q}$$

The third equation gives another expression for $\mu(HT)$, so equate the two:

$$\mu(\mathcal{S}) - \frac{1}{p} - \frac{1}{q} = 1 + q\mu(\mathcal{S})$$

So $$p\mu(\mathcal{S}) = 1 + \frac{1}{p} + \frac{1}{q}$$ and finally $$\mu(\mathcal{S}) = \frac{1}{p} + \frac{1}{p^2} + \frac{1}{pq}$$

For $p = 1/2$, this works out to 10. The expected number of tosses of a fair coin till you get HTH is 10.

It makes sense that the answer is bigger than $1/p$, the expected waiting time till the first H. If you are so inclined, you can double check the answer by conditioning on $W_H$. You will find yourself redoing much of the work we have done here.

Notice that the answer can be written as $$\mu(\mathcal{S}) = \frac{1}{p} + \frac{1}{p^2} + \frac{1}{pq} = \frac{1}{p} + \frac{1}{pqp}$$

That's tantalizing, as you can recognize both the pieces and will be wondering if there is a quick way to see how they fit together. There is, by a gambling interpretation involving the use of a process called a martingale. It's beautiful stuff but a bit of detour for us here. We'll talk a bit more about the answer at the end of this section and hope that it will inspire you to learn more about stochastic processes.

You can see that in principle this method can be extended to a pattern of any length from any alphabet from which you are making i.i.d. draws. Here is another example.

Professor Typing GAGA¶

Probability text books contain many examples in which monkeys are described as doing mindless repetitive tasks without learning from past results. No self-respecting monkey would do that. Even machines learn these days. In a different take on this setting, let's just pretend that a professor is having a really bad time and is spending it hitting keys of a typewriter independently at random. Assume that the keys only include the 26 upper case letters. How many times do you expect the professor to hit keys until the sequence GAGA appears?

We will follow the process we developed in the previous example and use a Markov Chain whose states look back at the past few hits and keep track of the maximal progress made towards the pattern GAGA. So the states are $\mathcal{S}$, G (this means it was not preceded by GA), GA, GAG, and GAGA. The transition probabilities are:

$\mathbf{\mathcal{S}}$ G GA GAG GAGA
$\mathcal{S}$ 25/26 1/26 0 0 0
G 24/26 1/26 1/26 0 0
GA 25/26 0 0 1/26 0
GAG 24/26 1/26 0 0 1/26
GAGA 0 0 0 0 1

The equations are:

\begin{align*} \mu(\mathcal{S}) &= 1 + \frac{25}{26}\mu(\mathcal{S}) + \frac{1}{26}\mu(G) \\ \mu(G) &= 1 + \frac{24}{26}\mu(\mathcal{S}) + \frac{1}{26}\mu(G) + \frac{1}{26}\mu(GA) \\ \mu(GA) &= 1 + \frac{25}{26}\mu(\mathcal{S}) + \frac{1}{26}\mu(GAG) \\ \mu(GAG) &= 1 + \frac{24}{26}\mu(\mathcal{S}) + \frac{1}{26}\mu(G) \end{align*}

Multiply all the equations by 26 and start writing all the unknowns in terms of $\mu(\mathcal{S})$.

\begin{align*} \mu(G) &= \mu(\mathcal{S}) - 26 \\ \mu(GA) &= 25\mu(G) - 24\mu(\mathcal{S}) - 26 = \mu(\mathcal{S}) - (25\times26) - 26 \\ \mu(GAG) &= 26\mu(GA) - 25\mu(\mathcal{S}) - 26 = \mu(\mathcal{S}) - (25\times26\times26) - (26\times26) - 26\\ 26\mu(GAG) &= 26 + 24\mu(\mathcal{S}) + \mu(G) = 25\mu(\mathcal{S}) \end{align*}

Finally, multiply the third equation above by 26 and subtract the fourth. You will get

\begin{align*} \mu(\mathcal{S}) &= (25\times26\times26\times26) + (26\times26\times26) + (26\times26) \\ &= (26\times26\times26\times26) + (26\times26) \\ &= 26^4 + 26^2 \\ &= 457652 \end{align*}
26**4 + 26**2

457652

To get a sense of the amount of time involved, suppose the professor hits keys at the rate of 180 per minute. Then, assuming no rest breaks or slowing down, you expect the professor to keep going for almost two days before the pattern GAGA appears.

(26**4 + 26**2)/(24*60*180)

1.7656327160493828

In the GAGA example as in the HTH example, you can see that the answer is a sum of terms which are the inverses of:

• the chance of the whole pattern
• the chance of the segment which occurs both at the start and the end of the pattern.

Of course there might be more than one segment that you can see at the start as well as at the end. For example, if the pattern is GAGAGA, then GAGA is the longest segment that repeats in this way, but GA repeats as well. The expected time till GAGAGA turns out to be $26^6 + 26^4 + 26^2$. That's the sum of the inverses of the probabilities of the whole pattern and of all the segments that appear both at the start and at the end. If you are intrepid you can check this by setting up seven-state Markov Chain following the method developed in this section.

But the Markov Chain method doesn't provide insight into why the answers are the way they are. Maybe by the end of the term we will have included a section on the elegant martingale methods that explain what's going on.

For now, go back through your exercises and do the algebra to check that in tosses of a $p$-coin, $E(W_{HH}) = \frac{1}{p^2} + \frac{1}{p}$ whereas $E(W_{HT}) = \frac{1}{pq}$. The pattern HT doesn't have the feature that the start of the pattern appears again at the end.

After this, you might be willing to accept that the expected number of times the professor has to hit the typewriter keys till the pattern ABRACADABRA appears is $26^{11} + 26^4 + 26$, by noticing that both ABRA and A reappear at the end of ABRACADABRA. That's 3,670,344,487,444,778 times.

26**11 + 26**4 + 26

3670344487444778

At the professor's rate of 180 keys per minute, that's almost 39 million years.

(26**11 + 26**4 + 26)/(365*24*60*180)

38795286.73521032

We end this section with a few observations about patterns in coin tossing. If you toss a fair coin six times, then the patterns that you expect to take the longest time to appear are HHHHHH and TTTTTT. Both have expected waiting times of $2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2 = 126$.

2**6 + 2**5 + 2**4 + 2**3 + 2**2 + 2

126

Even though all patterns of length six have the same chance if the coin is fair, the expected time till each pattern depends on how the pattern wraps around itself. The expected time till HTTTTH is $2^6 + 2 = 66$, the expected time till HHTTHH is $2^6 + 2^2 + 2 = 70$, and so on.

If the coin is unfair with $p > 1/2$, then the expected time till HHHHHH is

$$\frac{1}{p^6} + \frac{1}{p^5} + \frac{1}{p^4} + \frac{1}{p^3} + \frac{1}{p^2} + \frac{1}{p}$$

That's less than the expected time till TTTTTT, which has the same formula as above but with $q$ replacing $p$. Here are some more examples.

$$E(W_{HTTTTH}) = \frac{1}{pq^4p} + \frac{1}{p} ~~~~~~~~~~ E(W_{HTHTTT}) = \frac{1}{pqpq^3} ~~~~~~~~~~ E(W_{HHTTHH}) = \frac{1}{p^2q^2p^2} + \frac{1}{p^2} + \frac{1}{p}$$

Summary¶

For a sequence of i.i.d. trials we have developed a Markov Chain method to find the expected number of trials till any fixed finite pattern appears. The method involves solving a system of equations. This is straightforward analytically for short patterns but can be tedious otherwise. We have observed in examples that the answers have a form that depends on how the pattern wraps around itself, and that therefore the answers can simply be read off the pattern. The proof that the answers will always have this form depends on martingale methods that we may include in an optional section.