## Calculus in SymPy¶

Working with densities involves calculus which can sometimes be time-consuming. This course gives you two ways of reducing the amount of calculus involved.

• Probabilistic methods can help reduce algebra and calculus. You've seen this with algebra in the discrete case. You'll see it with calculus as we learn more about densities.
• Python has a symbolic math module called SymPy that does algebra, calculus, and much other symbolic math. In this section we will show you how to do calculus using SymPy.

We will demonstrate the methods in the context of an example. Suppose $X$ has density given by

$$f(x) = \begin{cases} 105x^2(1-x)^4 ~~~ \text{if } 0 \le x \le 1 \\ 0 ~~~~~~~~~ \text{otherwise} \end{cases}$$

As you can see from its graph below, $f$ could be used to model the distribution of a random proportion that you think is likely to be somewhere between 0.2 and 0.4. The density $f$ is a polynomial on the unit interval, and in principle the algebra and calculus involved in integrating it are straightforward. But they are tedious. So let's get SymPy to do the work.

First, we will import all the functions in SymPy and set up some printing methods that make the output look nicer than the retro typewritten pgf output you saw in a previous section. In future sections of this text, you can assume that this importing and initialization will have been done at the start.

from sympy import *
init_printing()


Next, we have to tell Python which variables are symbolic and what their possible values are. The function declare lets us do this. It takes as its arguments the string representing the variable, and an option interval specifying the interval of possible values of the variable. In our example, the variable x takes values in the unit interval. In later examples we will show you how to declare infinite intervals of possible values.

declare('x', interval=(0, 1))


Now we will assign the name density to the expression that defines $f$. The expression looks just like a numerical calculation, but the output is algebraic!

density = 105 * x**2 * (1-x)**4
density

$$105 x^{2} \left(- x + 1\right)^{4}$$

That's the density $f$ defined by the equation at the start of the section. Notice that what we naturally think of as $1 - x$ is expressed as $-x + 1$. That's because SymPy is writing the polynomial leading with the term of highest degree.

Let's not simply accept that this function is a density. Let's check that it is a density by integrating it from 0 to 1. To do this, we use the method Integral that takes the name of a function and a tuple (a sequence in parentheses) consisting of the variable of integration and the lower and upper limits of integration. We have assigned this integral to the name total_area

total_area = Integral(density, (x, 0, 1))
total_area

$$\int_{0}^{1} 105 x^{2} \left(- x + 1\right)^{4}\, dx$$

The output of displays the integral, which is nice, but what we really want is its numerical value. In SymPy, this is achieved by rather rudely instructing the method to doit().

total_area.doit()

$$1$$

This confirms that the function $f$ is a density.

We can use Integral again to find the chance of any interval. Here is $P(0.2 < X < 0.4)$.

p_02_04 = Integral(density, (x, 0.2, 0.4)).doit()
p_02_04

$$0.432064000000001$$

For $x$ in the unit interval, the cdf of $X$ is $$F(x) ~ = ~ P(X \le x) ~ = ~ \int_0^x f(s)ds ~ = ~ I(s)~ \Big{\rvert}_0^x ~ = ~ I(x) - I(0)$$

where $I$ is the indefinite integral of $f$.

To get the indefinite integral, simply ask SymPy to integrate the density; there are no limits of integration.

indefinite = Integral(density).doit()
indefinite

$$15 x^{7} - 70 x^{6} + 126 x^{5} - 105 x^{4} + 35 x^{3}$$

Now $F(x) = I(x) - I(0)$. You can see at a glance that $I(0) = 0$ but here is how SymPy would figure that out.

To evaluate $I(0)$, SymPy must substitute $x$ with 0 in the expression for $I$. This is achieved by the method subs that takes the variable as its first argument and the specified value as the second.

I_0 = indefinite.subs(x, 0)
I_0

$$0$$
cdf = indefinite - I_0
cdf

$$15 x^{7} - 70 x^{6} + 126 x^{5} - 105 x^{4} + 35 x^{3}$$

To find the value of the cdf at a specified point, say 0.4, we have to substitute $x$ with 0.4 in the formula for the cdf.

cdf_04 = cdf.subs(x, 0.4)
cdf_04

$$0.580096000000001$$

Thus $P(X \le 0.4)$ is roughly 58%. Earlier we calulated $P(0.2 < X < 0.4) = 43.2\%$, which we can confirm by using the cdf:

cdf_02 = cdf.subs(x, 0.2)
cdf_04 - cdf_02

$$0.432064000000001$$

The expectation $E(X)$ is a definite integral from 0 to 1:

expectation = Integral(x*density, (x, 0, 1)).doit()
expectation

$$\frac{3}{8}$$

Notice how simple the answer is. Later in the course, you will see why.

Here is $E(X^2)$, which turns out to be another simple fraction. Clearly, the density $f$ has interesting properties. We will study them later. For now, let's just get the numerical answers.

expected_square = Integral((x**2)*density, (x, 0, 1)).doit()
expected_square

$$\frac{1}{6}$$

Now you can find $SD(X)$.

sd = (expected_square - expectation**2)**0.5
sd

$$0.161374306091976$$

### SymPy and the Exponential Density¶

One of the primary distributions in probability theory, the exponential distribution has a positive parameter $\lambda$ known as the "rate", and density given by

$$f(t) ~ = \lambda e^{-\lambda t}, ~~~ t \ge 0$$

The density is 0 on the negative numbers. Here is its graph when $\lambda = 3$. To check that $f$ is a density, we have to confirm that its integral is 1. So we will declare two positive symbolic variables t and lamda. Notice the incorrectly spelled lamda. That is because lambda has another meaning in Python, as some of you might know.

In fact lamda is a constant, not a variable. But SymPy needs to know that it's an algebraic object, so we have to declare it as such.

Note the use of positive=True to declare positive variables.

declare('lamda', positive=True)
declare('t', positive=True)


Now we will define the density function. Notice the use of exp for the exponential function. Notice also that the form of the answer looks different from the way we have written it above, though it's algebraically the same.

expon_density = lamda * exp(-lamda * t)
expon_density

$$\frac{\lambda}{e^{\lambda t}}$$

This is an unavoidable aspect of computer aided algebra, and it is the reason we will use SymPy purely for computation, not for display.

To see that the function is a density, we can check that its integral from 0 to $\infty$ is 1. The symbol that SymPy uses for $\infty$ is oo, a double lower case o. It looks very much like $\infty$.

Integral(expon_density, (t, 0, oo)).doit()

$$1$$

Suppose $T$ has the exponential $(\lambda)$ density. Then for $t \ge 0$ the cdf of $T$ is

$$F_T(t) ~ = ~ P(T \le t) ~ = ~ \int_0^t \lambda e^{-\lambda s}ds$$

This is a straightforward integral that you can probably do in your head. However, let's get some more practice using SymPy to find cdf's. We will use the same method that we used to find the cdf in the previous example.

$$\int_0^t \lambda e^{-\lambda s}ds ~ = ~ I(t) - I(0)$$

where $I$ is the indefinite integral of the density. To get this indefinite integral we will use Integral as before, except that this time we must specify t as the variable of integration. That is because SymPy sees two algebraic quantities t and lamda in the density, and doesn't know which one is the variable unless we tell it.

indefinite = Integral(expon_density, t).doit()
indefinite

$$- \frac{1}{e^{\lambda t}}$$

Now use $F_T(t) = I(t) - I(0)$:

I_0 = indefinite.subs(t, 0)
I_0

$$-1$$
cdf = indefinite - I_0
cdf

$$1 - e^{- \lambda t}$$

Thus the cdf of the exponential $(\lambda)$ density is $$F_T(t) ~ = ~ 1 - e^{-\lambda t}$$ Also, $$E(T) ~ = ~ \int_0^\infty t \lambda e^{-\lambda t} dt ~ = ~ \frac{1}{\lambda}$$

which you can check by integration by parts. But SymPy is faster:

expectation = Integral(t*expon_density, (t, 0, oo)).doit()
expectation

$$\frac{1}{\lambda}$$

And $$E(T^2) = \frac{2}{\lambda^2}$$

expected_square = Integral(t**2 * expon_density, (t, 0, oo)).doit()
expected_square

$$\frac{2}{\lambda^{2}}$$

So $$Var(T) ~ = ~ \frac{2}{\lambda^2} - \frac{1}{\lambda^2} ~ = ~ \frac{1}{\lambda^2}$$ and hence $$SD(T) ~ = ~ \frac{1}{\lambda}$$

The purpose of this section has been to give you a workout in SymPy. We will take a closer look at the exponential distribution in the next section.