## Dependence and Independence

Conditional distributions help us formalize our intuitive ideas about whether two random variables are independent of each other. Let $X$ and $Y$ be two random variables, and suppose we are given the value of $X$. Does that change our opinion about $Y$? If the answer is yes, then we will say that $X$ and $Y$ are dependent. If the answer is no, no matter what the given value of $X$ is, then we will say that $X$ and $Y$ are independent.

Let's start with some examples and then move to precise definitions and results.

### Dependence

Here is the joint distribution of two random variables $X$ and $Y$. From this, what can we say about whether $X$ and $Y$ are dependent or independent?

dist1

| | X=0 | X=1 | X=2 | X=3 |
|:--|--:|--:|--:|--:|
| Y=3 | 0.037037 | 0.000000 | 0.000000 | 0.00000 |
| Y=2 | 0.166667 | 0.055556 | 0.000000 | 0.00000 |
| Y=1 | 0.250000 | 0.166667 | 0.027778 | 0.00000 |
| Y=0 | 0.125000 | 0.125000 | 0.041667 | 0.00463 |

You can see at once that if $X = 3$ then $Y$ can only be 0, whereas if $X = 2$ then $Y$ can be either 0 or 1. Knowing the value of $X$ changes the distribution of $Y$. That's dependence.

Here are the conditional distributions of $Y$ given each of the different values of $X$. You can see that all the distributions are different, and different also from the marginal of $Y$.

dist1.conditional_dist('Y', 'X')

| | Dist. of Y \| X=0 | Dist. of Y \| X=1 | Dist. of Y \| X=2 | Dist. of Y \| X=3 | Marginal of Y |
|:--|--:|--:|--:|--:|--:|
| Y=3 | 0.064 | 0.00 | 0.0 | 0.0 | 0.037037 |
| Y=2 | 0.288 | 0.16 | 0.0 | 0.0 | 0.222222 |
| Y=1 | 0.432 | 0.48 | 0.4 | 0.0 | 0.444444 |
| Y=0 | 0.216 | 0.36 | 0.6 | 1.0 | 0.296296 |
| Sum | 1.000 | 1.00 | 1.0 | 1.0 | 1.000000 |

Here is an example in which you can't quickly determine dependence or independence by just looking at the possible values.

dist2

| | X=3 | X=4 |
|:--|--:|--:|
| Y=7 | 0.3 | 0.1 |
| Y=6 | 0.2 | 0.2 |
| Y=5 | 0.1 | 0.1 |

But you can tell by looking at the conditional distributions of $Y$ given $X$. They are different.

dist2.conditional_dist('Y', 'X')

| | Dist. of Y \| X=3 | Dist. of Y \| X=4 | Marginal of Y |
|:--|--:|--:|--:|
| Y=7 | 0.500000 | 0.25 | 0.4 |
| Y=6 | 0.333333 | 0.50 | 0.4 |
| Y=5 | 0.166667 | 0.25 | 0.2 |
| Sum | 1.000000 | 1.00 | 1.0 |

It follows (and you should try to prove this) that at least some of the conditional distributions of $X$ given the different values of $Y$ will also be different from each other and from the marginal of $X$.

Notice that not all the conditional distributions are different. The conditional distribution of $X$ given $Y=5$ is the same as the conditional distribution of $X$ given $Y=6$. But given $Y=7$, the conditional distribution changes. $X$ and $Y$ are dependent.

dist2.conditional_dist('X', 'Y')

| | X=3 | X=4 | Sum |
|:--|--:|--:|--:|
| Dist. of X \| Y=7 | 0.75 | 0.25 | 1.0 |
| Dist. of X \| Y=6 | 0.50 | 0.50 | 1.0 |
| Dist. of X \| Y=5 | 0.50 | 0.50 | 1.0 |
| Marginal of X | 0.60 | 0.40 | 1.0 |
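The conditional tables above can be reproduced without any library support. The following is a minimal sketch in plain Python, assuming the joint distribution of `dist2` is stored as a dictionary keyed by `(x, y)`; the helper names `marginal` and `conditional` are illustrative and are not part of the library used in this text.

```python
from collections import defaultdict

# Joint distribution of dist2 from the text, keyed by (x, y).
joint = {
    (3, 7): 0.3, (4, 7): 0.1,
    (3, 6): 0.2, (4, 6): 0.2,
    (3, 5): 0.1, (4, 5): 0.1,
}

def marginal(joint, index):
    """Marginal of one coordinate: index 0 for X, index 1 for Y."""
    totals = defaultdict(float)
    for pair, p in joint.items():
        totals[pair[index]] += p
    return dict(totals)

def conditional(joint, given_index, given_value):
    """Distribution of the other coordinate, given that the coordinate
    at `given_index` equals `given_value`."""
    other = 1 - given_index
    total = marginal(joint, given_index)[given_value]
    return {
        pair[other]: p / total
        for pair, p in joint.items()
        if pair[given_index] == given_value
    }

x_given_y7 = conditional(joint, 1, 7)   # distribution of X given Y = 7
x_given_y5 = conditional(joint, 1, 5)   # distribution of X given Y = 5
```

Each conditional distribution is just one row or column of the joint table divided by its total, which is why the entries of `x_given_y7` and `x_given_y5` differ even though both come from the same joint distribution.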

### Conditionals, Not Marginals

Often, the nature of the randomization in an experiment helps us identify dependence without calculation. For example, suppose you make two draws at random without replacement from the set $\{1, 2, 3, 4\}$. Let $X_1$ be the number on the first draw and $X_2$ the number on the second. Then $X_1$ and $X_2$ are dependent: for example, knowing that $X_1=2$ prevents $X_2$ from being equal to 2.

Here is the joint distribution of $X_1$ and $X_2$ along with the marginals. Each of the 12 possible pairs is equally likely.

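# Assumes the usual setup for this text:
# from datascience import * and from prob140 import *
val_X1 = [1, 2, 3, 4]
val_X2 = [1, 2, 3, 4]

def prob_wor(i, j):
    """Joint probability of X1 = i and X2 = j, drawing without replacement."""
    if i != j:
        return 1/12   # each of the 12 ordered pairs of distinct values
    else:
        return 0      # the same number cannot be drawn twice

t_wor = Table().values('X1', val_X1, 'X2', val_X2).probability_function(prob_wor)
dist3 = t_wor.toJoint()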

dist3.both_marginals()

| | X1=1 | X1=2 | X1=3 | X1=4 | Sum: Marginal of X2 |
|:--|--:|--:|--:|--:|--:|
| X2=4 | 0.083333 | 0.083333 | 0.083333 | 0.000000 | 0.25 |
| X2=3 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 0.25 |
| X2=2 | 0.083333 | 0.000000 | 0.083333 | 0.083333 | 0.25 |
| X2=1 | 0.000000 | 0.083333 | 0.083333 | 0.083333 | 0.25 |
| Sum: Marginal of X1 | 0.250000 | 0.250000 | 0.250000 | 0.250000 | 1.00 |

It is important to notice that both the marginals are the same. $X_1$ has the uniform distribution on $\{1, 2, 3, 4\}$, and so does $X_2$.

In a later chapter we will study these symmetries in sampling without replacement. For now, note that you can't spot dependence by just looking at the marginals. Dependence is visible in the conditionals. Given $X_1$, you can see that $X_2$ is equally likely to be any of the other three values. In the other direction, given $X_2$, you can see that $X_1$ is equally likely to be any of the other three values.

dist3.conditional_dist('X2', 'X1')

| | Dist. of X2 \| X1=1 | Dist. of X2 \| X1=2 | Dist. of X2 \| X1=3 | Dist. of X2 \| X1=4 | Marginal of X2 |
|:--|--:|--:|--:|--:|--:|
| X2=4 | 0.333333 | 0.333333 | 0.333333 | 0.000000 | 0.25 |
| X2=3 | 0.333333 | 0.333333 | 0.000000 | 0.333333 | 0.25 |
| X2=2 | 0.333333 | 0.000000 | 0.333333 | 0.333333 | 0.25 |
| X2=1 | 0.000000 | 0.333333 | 0.333333 | 0.333333 | 0.25 |
| Sum | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.00 |

dist3.conditional_dist('X1', 'X2')

| | X1=1 | X1=2 | X1=3 | X1=4 | Sum |
|:--|--:|--:|--:|--:|--:|
| Dist. of X1 \| X2=4 | 0.333333 | 0.333333 | 0.333333 | 0.000000 | 1.0 |
| Dist. of X1 \| X2=3 | 0.333333 | 0.333333 | 0.000000 | 0.333333 | 1.0 |
| Dist. of X1 \| X2=2 | 0.333333 | 0.000000 | 0.333333 | 0.333333 | 1.0 |
| Dist. of X1 \| X2=1 | 0.000000 | 0.333333 | 0.333333 | 0.333333 | 1.0 |
| Marginal of X1 | 0.250000 | 0.250000 | 0.250000 | 0.250000 | 1.0 |
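The same numbers can be obtained by brute-force enumeration. This sketch uses only the Python standard library and exact fractions, so that $1/4$ and $1/3$ appear without rounding; the names `joint_wor`, `marginal_x2`, and `cond_x2_given_x1` are made up for this illustration.

```python
from fractions import Fraction
from itertools import permutations

values = [1, 2, 3, 4]

# All ordered pairs of distinct values: the 12 equally likely outcomes.
pairs = list(permutations(values, 2))
joint_wor = {pair: Fraction(1, len(pairs)) for pair in pairs}

def marginal_x2(j):
    """P(X2 = j), found by summing the joint over all values of X1."""
    return sum(p for (x1, x2), p in joint_wor.items() if x2 == j)

def cond_x2_given_x1(j, i):
    """P(X2 = j | X1 = i), found by the division rule."""
    p_x1 = sum(p for (x1, x2), p in joint_wor.items() if x1 == i)
    return joint_wor.get((i, j), Fraction(0)) / p_x1

print(marginal_x2(2))           # marginal chance that the second draw is 2
print(cond_x2_given_x1(2, 2))   # chance of a second 2 after drawing a 2
print(cond_x2_given_x1(3, 2))   # chance of a 3 after drawing a 2
```

The marginal probability is the same for every value, while the conditional probability depends on which value was drawn first. That contrast is what the marginals alone cannot show.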

### Independence

If we had instead drawn the two numbers with replacement, then all 16 pairs would have been equally likely, and knowing the value of one of the variables would not affect your opinion about the other.

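def prob_wr(i, j):
    """Joint probability of X1 = i and X2 = j, drawing with replacement."""
    return 1/16   # all 16 ordered pairs are equally likely

t_wr = Table().values('X1', val_X1, 'X2', val_X2).probability_function(prob_wr)
dist4 = t_wr.toJoint()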

dist4

| | X1=1 | X1=2 | X1=3 | X1=4 |
|:--|--:|--:|--:|--:|
| X2=4 | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
| X2=3 | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
| X2=2 | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
| X2=1 | 0.0625 | 0.0625 | 0.0625 | 0.0625 |

dist4.conditional_dist('X2', 'X1')

| | Dist. of X2 \| X1=1 | Dist. of X2 \| X1=2 | Dist. of X2 \| X1=3 | Dist. of X2 \| X1=4 | Marginal of X2 |
|:--|--:|--:|--:|--:|--:|
| X2=4 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| X2=3 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| X2=2 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| X2=1 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| Sum | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

dist4.conditional_dist('X1', 'X2')

| | X1=1 | X1=2 | X1=3 | X1=4 | Sum |
|:--|--:|--:|--:|--:|--:|
| Dist. of X1 \| X2=4 | 0.25 | 0.25 | 0.25 | 0.25 | 1.0 |
| Dist. of X1 \| X2=3 | 0.25 | 0.25 | 0.25 | 0.25 | 1.0 |
| Dist. of X1 \| X2=2 | 0.25 | 0.25 | 0.25 | 0.25 | 1.0 |
| Dist. of X1 \| X2=1 | 0.25 | 0.25 | 0.25 | 0.25 | 1.0 |
| Marginal of X1 | 0.25 | 0.25 | 0.25 | 0.25 | 1.0 |

What you are seeing in these rather monotonous tables is independence. It doesn't matter what the value of $X_1$ is; conditional on that value, $X_2$ is still uniform on all four values. And no matter what $X_2$ is, conditional on that value $X_1$ is still uniform on all four values.

Here is a joint distribution table in which independence becomes apparent once you condition.

dist5

| | X=0 | X=1 | X=2 | X=3 |
|:--|--:|--:|--:|--:|
| Y=4 | 0.000096 | 0.000289 | 0.000289 | 0.000096 |
| Y=3 | 0.001929 | 0.005787 | 0.005787 | 0.001929 |
| Y=2 | 0.014468 | 0.043403 | 0.043403 | 0.014468 |
| Y=1 | 0.048225 | 0.144676 | 0.144676 | 0.048225 |
| Y=0 | 0.060282 | 0.180845 | 0.180845 | 0.060282 |

dist5.conditional_dist('Y', 'X')

| | Dist. of Y \| X=0 | Dist. of Y \| X=1 | Dist. of Y \| X=2 | Dist. of Y \| X=3 | Marginal of Y |
|:--|--:|--:|--:|--:|--:|
| Y=4 | 0.000772 | 0.000772 | 0.000772 | 0.000772 | 0.000772 |
| Y=3 | 0.015432 | 0.015432 | 0.015432 | 0.015432 | 0.015432 |
| Y=2 | 0.115741 | 0.115741 | 0.115741 | 0.115741 | 0.115741 |
| Y=1 | 0.385802 | 0.385802 | 0.385802 | 0.385802 | 0.385802 |
| Y=0 | 0.482253 | 0.482253 | 0.482253 | 0.482253 | 0.482253 |
| Sum | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |

All the conditional distributions of $Y$ given different values of $X$ are the same, and hence are the same as the marginal of $Y$ too. That's independence.

You could have drawn the same conclusion by conditioning $X$ on $Y$:

dist5.conditional_dist('X', 'Y')

| | X=0 | X=1 | X=2 | X=3 | Sum |
|:--|--:|--:|--:|--:|--:|
| Dist. of X \| Y=4 | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |
| Dist. of X \| Y=3 | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |
| Dist. of X \| Y=2 | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |
| Dist. of X \| Y=1 | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |
| Dist. of X \| Y=0 | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |
| Marginal of X | 0.125 | 0.375 | 0.375 | 0.125 | 1.0 |

### Independence of Two Random Variables

What we have observed in examples can be turned into a formal definition of independence.

Two random variables $X$ and $Y$ are independent if for every value $x$ of $X$ and $y$ of $Y$, $$P(Y = y \mid X = x) = P(Y = y)$$ That is, no matter what the given $x$ is, the conditional probability of any value of $Y$ is the same as if we didn't know that $X=x$.

By the multiplication rule, the definition of independence can be written in a different way. For any values $x$ of $X$ and $y$ of $Y$,

\begin{align*} P(X = x, Y = y) &= P(X = x)P(Y = y \mid X = x) ~~~ \text{(multiplication rule)} \\ &= P(X=x)P(Y=y) ~~~~~~~~~~~~~~~~~ \text{(independence)} \end{align*}

Independence simplifies the conditional probabilities in the multiplication rule.
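The product form gives a mechanical test for independence: compute both marginals from the joint distribution and compare every cell with the product of its marginals. Here is a minimal sketch in plain Python, applied to the two-draw examples from earlier in the section. The function name `is_independent` is hypothetical, and exact fractions are used so that equality comparisons are safe.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """True if P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair of values.

    `joint` maps (x, y) to an exact probability; missing pairs count as 0.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y]
        for x, y in product(xs, ys)
    )

values = [1, 2, 3, 4]

# Two draws with replacement: all 16 pairs equally likely.
with_replacement = {
    (i, j): Fraction(1, 16) for i, j in product(values, values)
}

# Two draws without replacement: the 12 pairs of distinct values.
without_replacement = {
    (i, j): Fraction(1, 12) for i, j in product(values, values) if i != j
}

print(is_independent(with_replacement))      # True
print(is_independent(without_replacement))   # False
```

With floating-point probabilities, such as the rounded entries printed in the tables above, an exact `==` comparison would be too strict; a tolerance-based comparison such as `math.isclose` would be the natural substitute.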

### Independence of Two Events

Correspondingly, there are two equivalent definitions of the independence of two events. The first encapsulates the main idea of independence, and the second is useful for calculation.

Two events $A$ and $B$ are independent if $P(B \mid A) = P(B)$. Equivalently, $A$ and $B$ are independent if $P(AB) = P(A)P(B)$.

It is a fact that if $X$ and $Y$ are independent random variables, then any event determined by $X$ is independent of any event determined by $Y$. For example, if $X$ and $Y$ are independent and $x$ is a number, then $\{X=x\}$ is independent of $\{Y>x\}$. Also, any function of $X$ is independent of any function of $Y$.

You can prove these facts by partitioning and then using the definition of independence. The proofs are routine but somewhat labor intensive. You are welcome to just accept the facts if you don't want to prove them.
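For readers who prefer a numerical check to a proof, here is a small sketch for one particular pair of events in the two-draws-with-replacement setting. The events $A = \{X_1 \text{ is even}\}$ and $B = \{X_2 > 2\}$ are chosen only for illustration, and the check verifies $P(AB) = P(A)P(B)$ for this single case; it is not a proof of the general fact.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4]

# Two draws with replacement: all 16 pairs equally likely.
joint = {(i, j): Fraction(1, 16) for i, j in product(values, values)}

def prob(event):
    """Probability of an event, given as a predicate on (x1, x2)."""
    return sum(p for (x1, x2), p in joint.items() if event(x1, x2))

def a(x1, x2):
    """Event determined by X1 alone: the first draw is even."""
    return x1 % 2 == 0

def b(x1, x2):
    """Event determined by X2 alone: the second draw exceeds 2."""
    return x2 > 2

p_a = prob(a)
p_b = prob(b)
p_ab = prob(lambda x1, x2: a(x1, x2) and b(x1, x2))

print(p_a, p_b, p_ab)   # 1/2 1/2 1/4
```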

### "i.i.d." Random Variables

If two random variables $X$ and $Y$ are independent and identically distributed, they are called "i.i.d." That's one of the most famous acronyms in probability theory. You can think of i.i.d. random variables as draws with replacement from a population, or as the results of independent replications of the same experiment.

Suppose the distribution of $X$ is given by $$P(X = i) = p_i, ~~~ i = 1, 2, \ldots, n$$ where $\sum_{i=1}^n p_i = 1$. Now let $X$ and $Y$ be i.i.d. What is $P(X = Y)$? We'll answer this question by using the fundamental method, now in random variable notation.

\begin{align*} P(X = Y) &= \sum_{i=1}^n P(X = i, Y = i) ~~~ \text{(partitioning)} \\ &= \sum_{i=1}^n P(X = i)P(Y = i) ~~~ \text{(independence)} \\ &= \sum_{i=1}^n p_i \cdot p_i ~~~ \text{(identical distributions)} \\ &= \sum_{i=1}^n p_i^2 \end{align*}

The last expression is easy to calculate if you know the numerical values of all the $p_i$.
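As a sketch of that calculation, the sum $\sum_i p_i^2$ is a one-liner once the $p_i$ are in a list. The function name `prob_equal` and the example distributions are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def prob_equal(p):
    """P(X = Y) for i.i.d. X and Y with common distribution p = [p_1, ..., p_n]."""
    return sum(p_i * p_i for p_i in p)

uniform4 = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

print(prob_equal(uniform4))   # 1/4
print(prob_equal(skewed))     # 7/18
```

The uniform case agrees with the draws-with-replacement example: 4 of the 16 equally likely pairs have both coordinates equal.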

In the same way,

\begin{align*} P(Y > X) &= \sum_{i=1}^n P(X = i, Y > i) \\ &= \sum_{i=1}^n P(X = i)P(Y > i) \\ &= \sum_{i=1}^{n-1} P(X = i)P(Y > i) \end{align*}

because $P(Y > n) = 0$. Now for each $i < n$, $P(Y > i) = P(X > i) = \sum_{j=i+1}^n p_j$. Call this sum $b_i$ for "bigger than $i$". Then

\begin{align*} P(Y > X) &= \sum_{i=1}^{n-1} P(X = i)P(Y > i) \\ &= \sum_{i=1}^{n-1} p_ib_i \end{align*}

This is also a straightforward calculation if you know all the $p_i$. For $n=4$ it boils down to $$p_1 \cdot(p_2 + p_3 + p_4) ~~ + ~~ p_2 \cdot (p_3 + p_4) ~~ + ~~ p_3 \cdot p_4$$
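The sum $\sum_{i=1}^{n-1} p_i b_i$ can be computed in a single pass by building the tail sums $b_i$ from the right. The sketch below assumes the $p_i$ are given as a list in the order $p_1, \ldots, p_n$; the function name `prob_greater` is hypothetical.

```python
from fractions import Fraction

def prob_greater(p):
    """P(Y > X) = sum over i < n of p_i * b_i, where b_i = p_{i+1} + ... + p_n."""
    total = 0
    b = 0                      # running tail sum, built from the right
    for p_i in reversed(p):
        total += p_i * b       # here b holds p_{i+1} + ... + p_n
        b += p_i
    return total

uniform4 = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

print(prob_greater(uniform4))   # 3/8
print(prob_greater(skewed))     # 11/36
```

One sanity check: because $X$ and $Y$ are i.i.d., $P(Y > X) = P(X > Y)$, so $P(X = Y) + 2P(Y > X)$ must equal 1. For the uniform case, $1/4 + 2 \cdot 3/8 = 1$.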