## Functions on an Outcome Space

### Functions on an Outcome Space¶

Many data sets consist of results of repeated random trials, and therefore many outcome spaces consist of sequences. An outcome space representing two tosses of a coin is

$$ \Omega = \{ \text{HH, HT, TH, TT} \} $$If you were tossing 10 times, the outcome space would consist of the $2^{10}$ sequences of 10 elements where each element is H or T. The outcomes are a pain to list by hand, but computers are good at saving us that kind of pain.

### Product Spaces¶

The *product* of two sets $A$ and $B$ is the set of all pairs $(a, b)$ where $a \in A$ and $b \in B$. This concept is exactly what we need to describe spaces representing multiple trials.

For example, the space representing the outcome of one toss of a coin is $ \Omega_1 = \{ \text{H, T}\}$. The *product* of $\Omega_1$ with itself is the set of pairs (H, H), (H, T), (T, H) and (T, T), which you will recognize as the outcomes of two tosses of a coin. The product of this new space and $\Omega_1$ is the space representing three tosses. And so on.

The Python module `itertools`

contains a function `product`

that constructs product spaces. Let's import it.

```
from itertools import product
```

To see how `product`

works, we will start with the outcomes of one toss of a coin. We are creating an array using `make_array`

but you could use any other way of creating an array or list.

```
one_toss = make_array('H', 'T')
```

To use `product`

, we have to specify the base space, the number of repetitions, and then covert the result to a list.

```
two_tosses = list(product(one_toss, repeat=2))
two_tosses
```

For three tosses, just change the number of repetitions:

```
three_tosses = list(product(one_toss, repeat=3))
three_tosses
```

A *probability space* is an outcome space accompanied by the probabilities of all the outcomes. If you assume all eight outcomes of three tosses are equally likely, the probabilities are all 1/8:

```
three_toss_probs = (1/8)*np.ones(8)
```

The corresponding probability space:

```
three_toss_space = Table().with_columns(
'omega', three_tosses,
'P(omega)', three_toss_probs
)
three_toss_space
```

Product spaces get large very quickly. If you roll a die 5 times, there are almost 8,000 possible outcomes:

```
6**5
```

But we have `product`

so we can still list them all! Here is a probability space representing 5 rolls of a die.

```
die = np.arange(1, 7, 1)
five_rolls = list(product(die, repeat=5))
five_roll_probs = (1/6**5)**np.ones(6**5)
five_roll_space = Table().with_columns(
'omega', five_rolls,
'P(omega)', five_roll_probs
)
five_roll_space
```

### A Function on the Outcome Space¶

Suppose you roll a die five times and add up the number of spots you see. If that seems artificial, be patient for a moment and you'll soon see why it's interesting.

The sum of the rolls is a numerical function on the outcome space $\Omega$ of five rolls. The sum is thus a *random variable*. Let's call it $S$. Then, formally,
$$
S: \Omega \rightarrow \{ 5, 6, \ldots, 30 \}
$$
The range of $S$ is the integers 5 through 30, because each die shows at least one and at most six spots. We can also use the equivalent notation

From a computational perspective, the elements of $\Omega$ are in the column `omega`

of `five_roll_space`

. Let's apply this function and create a larger table.

```
five_rolls_sum = Table().with_columns(
'omega', five_rolls,
'S(omega)', five_roll_space.apply(sum, 'omega'),
'P(omega)', five_roll_probs
)
five_rolls_sum
```

We now have every possible outcome of five rolls of a die, along with its total number of spots. You can see that the first row of the table shows the smallest possible number of spots, corresponding to all the rolls showing 1 spot. The 7776th row shows the largest:

```
five_rolls_sum.take(7775)
```

All the other values of $S$ are between these two extremes.

#### Functions of Random Variables¶

A random variable is a numerical function on $\Omega$. Therefore by composition, a numerical function of a random variable is also a random variable.

For example, $S^2$ is a random variable, calculated as follows:

$$ S^2(\omega) = \big{(} S(\omega)\big{)}^2 $$Thus for example $S^2(\text{[6 6 6 6 6]}) = 30^2 = 900$.

### Events Determined by $S$¶

From the table `five_rolls_sum`

it is hard to tell how many rows show a sum of 6, or 10, or any other value. To better understand the properties of $S$, we have to organize the information in `five_rolls_sum`

.

For any subset $A$ of the range of $S$, define the event $\{S \in A\}$ as

$$ \{S \in A \} = \{\omega: S(\omega) \in A \} $$That is, $\{ S \in A\}$ is the collection of all $\omega$ for which $S(\omega)$ is in $A$.

If that definition looks unfriendly, try it out in a special case. Take $A = \{5, 30\}$. Then $\{S \in A\}$ if and only if either all the rolls show 1 spot or all the rolls show 6 spots. So $$ \{S \in A\} = \{\text{[1 1 1 1 1], [6 6 6 6 6]}\} $$

It is natural to ask about the chance the sum is a particular value, say 10. That's not easy to read off the table, but we can access the corresponding rows:

```
five_rolls_sum.where('S(omega)', are.equal_to(10))
```

There are 126 values of $\omega$ for which $S(\omega) = 10$. Since all the $\omega$ are equally likely, the chance that $S$ has the value 10 is 126/7776.

We are informal with notation and write $\{ S = 10 \}$ instead of $\{ S \in \{10\} \}$: $$ P(S = 10) = \frac{126}{7776} = 1.62\% $$

```
```