# Inequalities

Sometimes it is impossible or not reasonable to calculate the precise value of a probability. In such a case the following inequalities will come in handy.

## 1. Markov's inequality

Markov's inequality gives an upper bound for the probability that a non-negative random variable is greater then or equal to some positive constant.

**Markov's inequality.** Let $X$ be a random variable with non-negative values, then
$$\text{Pr}[X\geq \alpha]\leq \frac{E(X)}{\alpha}$$
for a positive constant $\alpha$.

Let us illustrate the inequality by the following theoretical example.

Let random variable $X$ be given with the following distribution:

$x$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ |
---|---|---|---|---|---|---|---|

$\text{Pr}[X=x]$ | $p_1$ | $p_2$ | $p_3$ | $p_4$ | $p_5$ | $p_6$ | $p_7$ |

were $x_i>0$.

Let us draw again line segments with lengths $p_1,\dots, p_7$ one after another

and place on the line segments boxes with heights $x_1,\dots,x_7$ correspondingly:

We know (see previous lesson) that the area of the shape is $E(X)$

Now, let us fix a positive constant $\alpha$

Clearly the product $\alpha\cdot(p_4+p_5+p_6+p_7)$ is precisely the red area on the picture below

thus, $\alpha\cdot(p_4+p_5+p_6+p_7)\leq E(X)$. Since

$(p_4+p_5+p_6+p_7)=\text{Pr}[X=x_4]+\text{Pr}[X=x_5]+\text{Pr}[X=x_6]+\text{Pr}[X=x_7]=\text{Pr}[X\geq x_4]$

and $x_4\geq \alpha$, we have $(p_4+p_5+p_6+p_7)=\text{Pr}[X\geq \alpha]$. And we have obtained
$$\alpha\cdot\text{Pr}[X\geq \alpha]\leq E(X).$$

A twosides coin, which lands heads with probability $\frac{1}{5}$ each time it is flipped, is flipped 100 times consecutively. Give an upper bound on the probability that it lands heads at least 70 times.

Let us have a randomized algorithm with varing runing-time. Say, that we have somehow measured that the avearge runningtime of the algorith is half an hour. How long do we have to wait for the algorithm to finish on half of the cased, i.e. how long do we have to wait so that we could say that with probability $\frac{1}{2}$ the algorithm has finished.

## 2. Chebyshev's inequality

Chebyshev's inequality is actually a corollary of Markov's inequality.

In order to introduce the inequality, we first need the concept of variance.

Variance $D(X)$ of a random variable $X$ characterizes how scattered are possible values of the random variable. The definition goes as follows $$D(X)=E[(X-E(X))^2].$$

Chebyshev's inequality uses the variance for estimating the probability that the random variable deviates far from its expected value.

**Chebyshev's inequality.** For any random variable $X$ and positive constant $\alpha$
$$\text{Pr}[|X-E(X)|\geq\alpha]\leq\frac{D(X)}{\alpha^2}.$$

Chebyshev's inequality follows from Markov's inequality by considering the random variable $$(X-E(X))^2$$ i.e. if we apply Markov's inequality to the random variable $(X-E(X))^2$ we obtain $$\text{Pr}[(X-E(X))^2\geq\alpha^2]\leq\frac{E[(X-E(X))^2]}{\alpha^2}=\frac{D(X)}{\alpha^2}.$$

Suppose we extract an individual at random from a population whose members have an average income of $1000$ € per month and with the variance of $200000$ €. What is the probability of extracting an individual whose income is either less than $500$ € or greater than $1500$ €?

## 3. Jensen's inequality

Let $X$ be a random variable. Then for any convex-cup function $g$ $$E(g(X))\geq g(E(X))$$ and for any convex-cap function $f$ $$E(f(X))\leq f(E(X)).$$

The second of the inequalities is illustrated by the following picture.

Here the random variable $X$ can have two values:
Let $f$ be a convex-cap function. We can calculate $f(x_1)$ and $f(x_2)$. Clearly the blue line, whose every point represents a possible expected value of $f(x_1)$ and $f(x_2)$, is always below the curved line, which represents the values of the convex-cap function, i.e. indeed $$f(E(X))\geq E(f(X)).$$ |

Jensen's inequalities are often used to get lower and/or upper bounds:

- for cases when there are not enough data for exact calculations, or
- for complex expressions involving probabilities.

For example, let there be $n$ boxes filled with black and white balls. It is known (form experiments) that if a box is chosen at random and a ball is taken from it blindly, then the probability of obtaining a white ball is $\frac{1}{2}$. What is the probability of obtaining two white balls, when after the first ball is taken out, it is placed back in the same box it was taken from. The balls in the box are mixed and then the second ball in taken blindly (from the same box).

Clearly, here are too few data for exact calculations: we only know that there are $n$ boxes and $\text{Pr}[\text{"a white ball is obtained"}]=\frac{1}{2}$.

Note that the event “a white ball is obtained” means that either “the $1^{\text{st}}$ box is chosen” and “a white ball obtained form it”, or “the $2^{\text{nd}}$ box is chosen” and “a white ball obtained form it”, or “the $3^{\text{rd}}$ box is chosen” and “a white ball obtained form it” and so on.

Let us assume that

$\text{Pr}[\text{"obraining a white ball from}\ 1^{\text{st}}\ \text{box"}]=p_1$,

$\text{Pr}[\text{"obraining a white ball from}\ 2^{\text{nd}}\ \text{box"}]=p_2$,

$\dots$

$\text{Pr}[\text{"obraining a white ball from}\ n^{\text{th}}\ \text{box"}]=p_n$.

According to the formula of conditional probability

$\text{Pr}[\text{"the}\ i^{\text{th}}\ \text{box is chosen" and "a white ball obtained form it"}]$

$=\text{Pr}[\text{"the}\ i^{\text{th}}\ \text{box is chosen"}]\cdot\text{Pr}[\text{"a white ball obtained"}\ \mid\ \text{"the}\ i^{\text{th}}\ \text{box is chosen"}]=\frac{1}{n}\cdot p_i$

(as far as we know the boxes are equally probable). Since choosing a $i^{\text{th}}$ box excludes all the other boxes, the events $\text{"the}\ i^{\text{th}}\ \text{box is chosen and a white ball obtained form it"}$ and $\text{"the}\ j^{\text{th}}\ \text{box is chosen and a white ball obtained form it"}$ ($i\neq j$) are mutually exclusive event. Thus,

\begin{align*} \text{Pr}[\text{"a white ball is obtained"}]&=\frac{1}{n}\cdot p_1+\frac{1}{n}\cdot p_2+\dots +\frac{1}{n}\cdot p_n, \end{align*}

i.e. $$\frac{1}{2}=\frac{p_1+p_2+\dots +p_n}{n}.$$

Now, if a box is fixed then choosing the first and the second ball are independent events:

$$\text{Pr}[\text{"the first ball is white" and "the second ball is white"} \mid \text{"the}\ i^{\text{th}}\ \text{box is chosen"} ] = p_i\cdot p_i.$$

By using the formula of conditional probability $$ \text{Pr}[\text{"the}\ i^{\text{th}}\ \text{box is chosen" and "the first ball is white" and "the second ball is white"}]=\frac{1}{n}\cdot p_i\cdot p_i$$

which gives

$$\text{Pr}[\text{"two white balls are obtained"}]=\frac{1}{n}\cdot p_1\cdot p_1+\frac{1}{n}\cdot p_2\cdot p_2+\dots +\frac{1}{n}\cdot p_n\cdot p_n=\frac{p_1^2+p_2^2+\dots +p_n^2}{n}.$$

The event “two white balls are obtained” means that either “the $1^{\text{st}}$ box is chosen” and “the first ball is white” and “the second ball is white”, or “the $2^{\text{nd}}$ box is chosen” and “the first ball is white” and “the second ball is white”, or “the $3^{\text{rd}}$ box is chosen” and “the first ball is white” and “the second ball is white”, and so on.

We know that $g(x)=x^2$ is a convex-cup function. Due to Jensen's inequality
$$\sum_{i=1}^n \frac{1}{n}p_i^2=\sum_{i=1}^n \frac{1}{n}g(p_i)=\color{blue}{E(g(X))\geq g(E(X))}=(E(X))^2 =\Bigl(\frac{1}{n}\sum_{i=1}^n p_i\Bigr)^2,$$
where

$p$ | $p_1$ | $p_2$ | $\dots$ | $p_n$ |
---|---|---|---|---|

$\text{Pr}[X=p]$ | $\frac{1}{n}$ | $\frac{1}{n}$ | $\dots$ | $\frac{1}{n}$ |

Thus,

$$\text{Pr}[\text{"two white balls are obtained"}]=\sum_{i=1}^n \frac{1}{n}p_i^2\color{blue}\geq \Bigl(\frac{1}{n}\sum_{i=1}^n p_i\Bigr)^2=\Bigl(\frac{1}{2}\Bigr)^2=\frac{1}{4}.$$

The example looks quite theoretical, but actually it can model several real life activities. For example, imagine that you are looking for real estate, e.g. an apartment to buy. There are several real estate sites for it (in our example boxes). You know that if you choose randomly a real estate site and an apartment advertisement from there, it is over 99% chance that it is not suitable. What is the probability that out of 10 randomly chosen apartments (these are the balls from our example: non suitable white and suitable black) non is suitable? (We assume that the number of advertisements is large enough so that random with replacement equals random without replacement, i.e. if we have looked at one of the advertisements, we will not put it back into selection.) As complement of the event “non of the 10 suite” is the event “at least one is suitable”, we can calculate the probability that we will find our future home out of the 10.

Solve the real estate example.

Let $X$ be a positive random variable such that $E(X)=\frac{1}{3}$. What can you say about the following expected value: $E(\ln(3X))$?