String Probabilities
- How do you determine if a pattern found in a text is significant?
- One method is to compute the probability that the pattern occurs in a random text of the same length.
- However, these probabilities are not always obvious.
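- Consider the probability that the patterns AA and AT occur (at least once) in uniform i.i.d. random strings of various lengths over the alphabet {A, T}:
- In strings of length 2, AA and AT each occur with probability 1/4, since each is exactly one of the four equally likely strings AA, AT, TA, TT.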
- In strings of length 3, however, AA occurs with probability 3/8 and AT occurs with probability 4/8:
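| String | Contains AA | Contains AT |
|---|---|---|
| AAA | yes | no |
| AAT | yes | yes |
| ATA | no | yes |
| ATT | no | yes |
| TAA | yes | no |
| TAT | no | yes |
| TTA | no | no |
| TTT | no | no |
| Total | 3 of 8 | 4 of 8 |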
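A minimal sketch (not part of the original slides) that checks these counts by brute force: it enumerates every string of length n over the alphabet and counts those containing the pattern. The function name `occurrence_probability` is hypothetical, and the approach assumes n is small, since there are 2^n strings to check.

```python
from fractions import Fraction
from itertools import product


def occurrence_probability(pattern: str, n: int, alphabet: str = "AT") -> Fraction:
    """Exact probability that `pattern` occurs at least once in a uniform
    i.i.d. random string of length n over `alphabet`, by enumeration."""
    strings = ("".join(chars) for chars in product(alphabet, repeat=n))
    hits = sum(1 for s in strings if pattern in s)
    # Each of the len(alphabet) ** n strings is equally likely.
    return Fraction(hits, len(alphabet) ** n)


for n in (2, 3):
    print(n, occurrence_probability("AA", n), occurrence_probability("AT", n))
# prints:
# 2 1/4 1/4
# 3 3/8 1/2
```

`Fraction` keeps the result exact and reduces it automatically, so 4/8 is printed as 1/2.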
- In strings of length 4, the gap widens: AA occurs with probability 8/16 and AT occurs with probability 11/16.
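| String | Contains AA | Contains AT |
|---|---|---|
| AAAA | yes | no |
| AAAT | yes | yes |
| AATA | yes | yes |
| AATT | yes | yes |
| ATAA | yes | yes |
| ATAT | no | yes |
| ATTA | no | yes |
| ATTT | no | yes |
| TAAA | yes | no |
| TAAT | yes | yes |
| TATA | no | yes |
| TATT | no | yes |
| TTAA | yes | no |
| TTAT | no | yes |
| TTTA | no | no |
| TTTT | no | no |
| Total | 8 of 16 | 11 of 16 |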
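Enumeration doubles in cost with every extra character. A minimal sketch of a faster alternative, assuming the same uniform i.i.d. model: count the strings that *avoid* the pattern with a dynamic program over the states of a pattern-matching automaton (the idea behind Knuth-Morris-Pratt, built naively here). State k means "the longest suffix of the string so far that is a prefix of the pattern has length k". The names `next_state` and `occurrence_probability_dp` are hypothetical.

```python
from fractions import Fraction


def next_state(pattern: str, k: int, c: str) -> int:
    """Length of the longest suffix of pattern[:k] + c that is a prefix of pattern."""
    s = pattern[:k] + c
    for length in range(min(len(s), len(pattern)), 0, -1):
        if s.endswith(pattern[:length]):
            return length
    return 0


def occurrence_probability_dp(pattern: str, n: int, alphabet: str = "AT") -> Fraction:
    m = len(pattern)
    # counts[k] = number of strings built so far that avoid the pattern
    # and currently sit in automaton state k.
    counts = [0] * m
    counts[0] = 1
    for _ in range(n):
        new_counts = [0] * m
        for k, ways in enumerate(counts):
            if ways == 0:
                continue
            for c in alphabet:
                j = next_state(pattern, k, c)
                if j < m:  # j == m means the pattern just occurred; drop these strings
                    new_counts[j] += ways
        counts = new_counts
    avoiding = sum(counts)
    return 1 - Fraction(avoiding, len(alphabet) ** n)


print(occurrence_probability_dp("AA", 4), occurrence_probability_dp("AT", 4))
# prints: 1/2 11/16
```

The counts of avoiding strings also explain the asymmetry: a string avoids AT only if it is a run of T's followed by a run of A's, so exactly n + 1 strings of length n avoid AT, while the strings avoiding AA are counted by Fibonacci numbers (3, 5, 8, ... for n = 2, 3, 4), which grow much faster. So AT becomes more likely than AA as n grows.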