## Booles Inequality

### Boole's Inequality¶

Boole's Inequality provides an upper bound on the chance of a union. Let $A_1, A_2, \ldots , A_n$ be events. Then Boole's Inequality says that $$P(\bigcup_{i=1}^n A_i) \le \sum_{i=1}^n P(A_i)$$

That is, the chance that at least one of the events occurs can be no larger than the sum of the chances.

That the inequality is true for $n=2$ is visible in the figure below and can easily be proved using the inclusion-exclusion formula for $n=2$.

show_intersection()


For general $n$ the inequality can be proved by induction and is left for you as an exercise.

Because $P(\cup_{i=1}^n A_i) \ge P(A_k)$ for each $k$, we have a lower bound too:

$$P(\bigcup_{i=1}^n A_i) \ge \max\{P(A_i): 1 \le i \le n \}$$

So $$\max\{P(A_i): 1 \le i \le n \} ~ \le ~ P(\bigcup_{i=1}^n A_i) ~ \le ~ \sum_{i=1}^n P(A_i)$$

For example, if the weather forecast says that the chance of rain on Saturday is 40% and the chance of rain on Sunday is 10%, then the chance that it rains at some point during those two days is at least 40% and at most 50%.

To find the chance exactly, you would need the chance that it rains on both days, which you don't have. Assuming independence doesn't seem like a good idea in this setting. So you cannot compute an exact answer, and must be satisfied with bounds.

Though bounds aren't exact answers or even approximations, they can be very useful. Here is an example of a common use of Boole's Inequality in data science. It has Bonferroni's name attached to it, because Boole and Bonferroni both have related bounds on probabilities of unions.

### Bonferroni Method¶

Suppose you are estimating five parameters based on a random sample, and that for each parameter you have a method that produces a good estimate with any pre-specified chance. For example, if the estimate has to be good with chance 99%, you have a way of doing that.

Now suppose you want your estimates to be such that all five are good with chance 95%. What should you do?

It is not enough to make each estimate good with chance 95%. If you do that, the chance that they are all good will be less than 95%, because the event "all are good" is a subset of each event "Estimate $i$ is good."

Let $A_i$ be the event that Estimate $i$ is good. You want $P(A_1A_2A_3A_4A_5) = 0.95$. But now you are stuck, because all you can control are $P(A_1)$, $P(A_2)$, $P(A_3)$, $P(A_4)$, and $P(A_5)$ individually. You can make each of them as large as you want, but you don't know how they relate to the chance of the intersection, because the five estimates are dependent. They are all based on the same sample.

You can get yourself out of this problem by looking at the complement of the event "all five are good". The complement is "at least one is bad", which is the union of the events "Estimate $i$ is bad". Your condition is

\begin{align*} 0.05 &= P(\text{at least one estimate is bad}) \\ &= P(A_1^c \cup A_2^c \cup A_3^c \cup A_4^c \cup A_5^c ) \\ &\le P(A_1^c) + P(A_2^c) + P(A_3^c) + P(A_4^c) + P(A_5^c) ~~~ \text{by Boole's Inequality} \end{align*}

Each term in the sum is the chance that the corresponding estimate is bad. You want those chances to be small. But you also want them to be large enough so that their sum is at least 0.05, because of the calculation above.

One way is to make each of them equal to $0.05/5 = 0.01$, that is 1%. In other words, you have to construct each estimate so that it is good with chance 99% (for example, by constructing a 99% confidence interval). This is called the Bonferroni Method of simultaneous inference.

The Bonferroni method shows that if you construct each of five estimates so that it is good with chance 99%, then the chance that all five estimates are good will be at least 95%.

You can replace 95% by any other threshold and carry out the calculation again to see how good the individual estimates have to be so that they are simultaneously good with a chance that exceeds the threshold.