probability-discrete.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Tue, 05 Apr 2016 17:22:50 +0300


=============
 Probability
=============
.. contents::
   :local:

.. role:: def
   :class: def

PMF
===

:def:`PMF` or :def:`probability mass function` or :def:`probability law` or
:def:`probability distribution` of a discrete random variable is a function
that, for a given value, gives the probability of that value.

The following notations are used to denote a PMF:

.. math::

   PMF(X = x) = P(X = x) = p_X(x) = P({ω ∈ Ω: X(ω) = x})

   PMF(a ≤ X ≤ b) = P(a ≤ X ≤ b) = ∑_{a ≤ x ≤ b}\ P(X = x)

   p_X(x) ≥ 0

   ∑_x\ p_X(x) = 1

where :math:`X` is a random variable on the space :math:`Ω` of outcomes, which
are mapped to real numbers via :math:`X(ω)`.
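
As a quick sanity check of these properties, here is a hypothetical example:
the PMF of a fair six-sided die, computed with exact fractions:

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, p_X(x) = 1/6 for x in 1..6
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# p_X(x) >= 0 for every x
assert all(p >= 0 for p in p_X.values())
# sum over all x of p_X(x) equals 1
assert sum(p_X.values()) == 1

# PMF(2 <= X <= 4) is the sum of the point probabilities in that range
p_2_to_4 = sum(p for x, p in p_X.items() if 2 <= x <= 4)
```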

Expected value
==============

:def:`Expected value` of a discrete random variable is:

.. math::

  E[X] = Σ_{ω∈Ω} X(ω) * p(ω) = Σ_{x} x * p_X(x)

We write :math:`a ≤ X ≤ b` for :math:`∀ ω∈Ω: a ≤ X(ω) ≤ b`.

If :math:`X ≥ 0` then :math:`E[X] ≥ 0`.

If :math:`a ≤ X ≤ b` then :math:`a ≤ E[X] ≤ b`.

If :math:`Y = g(X)` (:math:`∀ ω∈Ω Y(ω) = g(X(ω))`) then:

.. math::

  E[Y] = Σ_{x} g(x) * p_X(x)

**Proof**:

.. math::

  E[Y] = Σ_{y} y * p_Y(y)

  = Σ_{y∈ℝ} y * Σ_{ω∈Ω: Y(ω)=y} p(ω)

  = Σ_{y∈ℝ} y * Σ_{ω∈Ω: g(X(ω))=y} p(ω)

  = Σ_{y∈ℝ} y * Σ_{x∈ℝ: g(x)=y} Σ_{ω∈Ω: X(ω) = x} p(ω)

  = Σ_{y∈ℝ} y * Σ_{x∈ℝ: g(x)=y} p_X(x)

  = Σ_{y∈ℝ} Σ_{x∈ℝ: g(x)=y} y * p_X(x)

  = Σ_{x∈ℝ} Σ_{y∈ℝ: g(x)=y} y * p_X(x)

  = Σ_{x} g(x) * p_X(x)

.. math::

  E[a*X + b] = a*E[X] + b
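
A numeric sketch of these identities, again for a hypothetical fair-die
example (the function :math:`g` and constants :math:`a, b` are arbitrary
choices for illustration):

```python
from fractions import Fraction

# Hypothetical example: fair six-sided die
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum_x x * p_X(x)
E_X = sum(x * p for x, p in p_X.items())

# E[g(X)] computed directly from p_X, without building p_Y for Y = g(X)
g = lambda x: x * x
E_gX = sum(g(x) * p for x, p in p_X.items())

# Linearity: E[a*X + b] = a*E[X] + b
a, b = 3, 5
E_aXb = sum((a * x + b) * p for x, p in p_X.items())
assert E_aXb == a * E_X + b
```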

Variance
========

:def:`Variance` is:

.. math::

  var[X] = E[(X - E[X])^2] = E[X^2] - E^2[X]

:def:`Standard deviation` is:

.. math::

  σ_X = √(var[X])

Property:

.. math::

  var(a*X + b) = a² · var[X]
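
Both forms of the variance and the :math:`var(a·X + b)` property can be
checked numerically on a hypothetical fair-die PMF:

```python
from fractions import Fraction

# Hypothetical example: fair six-sided die
p_X = {x: Fraction(1, 6) for x in range(1, 7)}
E = lambda f: sum(f(x) * p for x, p in p_X.items())

E_X = E(lambda x: x)
# var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2
var_X = E(lambda x: (x - E_X) ** 2)
assert var_X == E(lambda x: x * x) - E_X ** 2

# var(a*X + b) = a^2 * var[X]  (the shift b does not affect spread)
a, b = 3, 5
E_Y = a * E_X + b
var_Y = E(lambda x: (a * x + b - E_Y) ** 2)
assert var_Y == a ** 2 * var_X
```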


Total probability theorem
=========================

Let :math:`A_i ∩ A_j = ∅` for :math:`i ≠ j` and :math:`∪_i\ A_i = Ω`:

.. math::

  p_X(x) = Σ_i P(A_i)·p_{X|A_i}(x)

Conditional PMF on event
========================

:def:`Conditional PMF on event` is:

.. math::

  p_{X|A}(x) = P(X=x | A)

  E[X|A] = ∑_x\ x·p_{X|A}(x)

Total expectation theorem
=========================

.. math::

  E[X] = Σ_i\ P(A_i)·E[X|A_i]

To prove the theorem, multiply the total probability theorem by :math:`x` and
sum over :math:`x`.
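
A numeric check of the total expectation theorem for a hypothetical example:
a fair die with the partition :math:`A_1` = even outcomes, :math:`A_2` = odd
outcomes:

```python
from fractions import Fraction

# Hypothetical example: fair six-sided die
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# Partition of the sample space: A_1 = even outcomes, A_2 = odd outcomes
partition = [{2, 4, 6}, {1, 3, 5}]

E_X = sum(x * p for x, p in p_X.items())

# E[X] = sum_i P(A_i) * E[X|A_i]
total = Fraction(0)
for A in partition:
    P_A = sum(p_X[x] for x in A)
    # conditional PMF: p_{X|A}(x) = P(X=x) / P(A) for x in A
    E_X_given_A = sum(x * p_X[x] / P_A for x in A)
    total += P_A * E_X_given_A
assert total == E_X
```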

Joint PMF
=========

:def:`Joint PMF` of random variables :math:`X_1,...,X_n` is:

.. math::

   p_{X_1,...,X_n}(x_1,...,x_n) = P(X_1 = x_1 \& ... \& X_n = x_n)

Properties:

.. math::

  E[X+Y] = E[X] + E[Y]

Conditional joint PMF
=====================

:def:`Conditional joint PMF` is:

.. math::

  p_{X|Y}(x|y) = P(X=x | Y=y) = P(X=x \& Y=y) / P(Y=y)

So:

.. math::

  p_{X,Y}(x,y) = p_Y(y)·p_{X|Y}(x|y) = p_X(x)·p_{Y|X}(y|x)

  p_{X,Y,Z}(x,y,z) = p_Y(y)·p_{Z|Y}(z|y)·p_{X|Y,Z}(x|y,z)

  ∑_{x,y}\ p_{X,Y|Z}(x,y|z) = 1
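
The multiplication rule can be verified on a small joint PMF (the table below
is a hypothetical example):

```python
from fractions import Fraction

# Hypothetical example: a small joint PMF on {0,1} x {0,1}
p_XY = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 8),
        (1, 0): Fraction(1, 4), (1, 1): Fraction(3, 8)}
assert sum(p_XY.values()) == 1

# Marginal PMFs
p_Y = {y: sum(q for (x, yy), q in p_XY.items() if yy == y) for y in (0, 1)}
p_X = {x: sum(q for (xx, y), q in p_XY.items() if xx == x) for x in (0, 1)}

# Conditional PMFs: p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y)
p_X_given_Y = {(x, y): p_XY[(x, y)] / p_Y[y] for (x, y) in p_XY}
p_Y_given_X = {(x, y): p_XY[(x, y)] / p_X[x] for (x, y) in p_XY}

# Multiplication rule: p_{X,Y}(x,y) = p_Y(y)*p_{X|Y}(x|y) = p_X(x)*p_{Y|X}(y|x)
for (x, y), q in p_XY.items():
    assert q == p_Y[y] * p_X_given_Y[(x, y)]
    assert q == p_X[x] * p_Y_given_X[(x, y)]
```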

Conditional expectation of joint PMF
====================================

:def:`Conditional expectation of joint PMF` is:

.. math::

  E[X|Y=y] = ∑_x\ x·p_{X|Y}(x|y)

  E[g(X)|Y=y] = ∑_x\ g(x)·p_{X|Y}(x|y)

Total probability theorem for joint PMF
=======================================
.. math::

  p_X(x) = ∑_y\ p_Y(y)·p_{X|Y}(x|y)

Total expectation theorem for joint PMF
=======================================
.. math::

  E[X] = ∑_y\ p_Y(y)·E[X|Y=y]

Proof:

.. math::

   ∑_y\ p_Y(y)·E[X|Y=y] = ∑_y\ p_Y(y)·∑_x\ x·p_{X|Y}(x|y)

   = ∑_y\ ∑_x\ p_Y(y)·x·p_{X|Y}(x|y) = ∑_x\ ∑_y\ x·p_Y(y)·p_{X|Y}(x|y)

   = ∑_x\ x·∑_y\ p_Y(y)·p_{X|Y}(x|y) = ∑_x\ x·p_X(x) = E[X]

Conditional expectation as a random variable
============================================

:def:`Conditional expectation` :math:`E[X|Y]` is the random variable
defined as:

.. math:: E[X|Y](y) = E[X|Y=y]

Property:

.. math:: E[g(Y)·X|Y] = g(Y)·E[X|Y]

For an invertible function :math:`h`:

.. math:: E[X|h(Y)] = E[X|Y]

Proof:

.. math::

   E[X|Y=y] = E[X|h(Y)=h(y)]

Law of Iterated Expectations
============================

.. math:: E[E[X|Y]] = E[X]

Proof (using total expectation theorem):

.. math::

   E[E[X|Y]] = ∑_y\ p_Y(y)·E[X|Y](y) = ∑_y\ p_Y(y)·E[X|Y=y] = E[X]

Generalisation of Law of Iterated Expectations:

.. math:: E[E[X|Y,Z]|Y] = E[X|Y]

Proof: for each value :math:`y` of :math:`Y`:

.. math::

   E[X|Y=y] = ∑_x\ x·p_{X|Y}(x|Y=y) = ∑_x\ x·p_{X,Y}(x,y)/p_Y(y)

   = ∑_x\ x·∑_z\ p_{X,Y,Z}(x,y,z)/p_Y(y)

   = ∑_x\ x·∑_z\ p_{X|Y,Z}(x|Y=y,Z=z)·p_{Y,Z}(y,z)/p_Y(y)

   = ∑_x\ x·∑_z\ p_{X|Y,Z}(x|Y=y,Z=z)·p_{Z|Y}(z|Y=y)

   = ∑_x\ ∑_z\ x·p_{X|Y,Z}(x|Y=y,Z=z)·p_{Z|Y}(z|Y=y)

   = ∑_z\ ∑_x\ x·p_{X|Y,Z}(x|Y=y,Z=z)·p_{Z|Y}(z|Y=y)

   = ∑_z\ p_{Z|Y}(z|Y=y)·∑_x\ x·p_{X|Y,Z}(x|Y=y,Z=z)

   = ∑_z\ p_{Z|Y}(z|Y=y)·E[X|Y=y,Z=z] = E[E[X|Y,Z]|Y=y]

Conditional variance
====================

:def:`Conditional variance` of :math:`X` on :math:`Y` is the r.v.:

.. math:: var(X|Y)(y) = var(X|Y=y) = E[(X - E[X|Y=y])²|Y=y]

or in another notation:

.. math:: var(X|Y) = E[X²|Y] - (E[X|Y])²

Taking the expectation over :math:`Y` on both sides:

.. math:: E[var(X|Y)] = E[E[X²|Y]] - E[(E[X|Y])²] = E[X²] - E[(E[X|Y])²]

On the other hand:

.. math:: var(E[X|Y]) = E[(E[X|Y])²] - (E[E[X|Y]])² = E[(E[X|Y])²] - (E[X])²

Adding the last two expressions:

.. math:: E[var(X|Y)] + var(E[X|Y]) = E[X²] - (E[X])² = var(X)

So:

.. math:: var(X) = E[var(X|Y)] + var(E[X|Y])
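
The law of total variance holds for any joint PMF; here is a numeric check on
a small hypothetical joint distribution, computed with exact fractions:

```python
from fractions import Fraction

# Hypothetical example: joint PMF of (X, Y) on a small non-uniform grid
p_XY = {(x, y): Fraction(1, 6) for x in (0, 1, 2) for y in (0, 1)}
p_XY[(0, 0)] = Fraction(2, 6)   # make it non-uniform
p_XY[(2, 1)] = Fraction(0)
assert sum(p_XY.values()) == 1

xs = {x for x, _ in p_XY}
ys = {y for _, y in p_XY}
p_Y = {y: sum(p_XY[(x, y)] for x in xs) for y in ys}

def E_given(y, f):
    """E[f(X) | Y=y] via the conditional PMF p_{X|Y}(x|y)."""
    return sum(f(x) * p_XY[(x, y)] / p_Y[y] for x in xs)

E_X = sum(x * p for (x, _), p in p_XY.items())
var_X = sum((x - E_X) ** 2 * p for (x, _), p in p_XY.items())

# E[var(X|Y)], using var(X|Y=y) = E[X^2|Y=y] - (E[X|Y=y])^2
E_var = sum(p_Y[y] * (E_given(y, lambda x: x * x) - E_given(y, lambda x: x) ** 2)
            for y in ys)
# var(E[X|Y])
var_E = sum(p_Y[y] * (E_given(y, lambda x: x) - E_X) ** 2 for y in ys)

# var(X) = E[var(X|Y)] + var(E[X|Y])
assert var_X == E_var + var_E
```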

Independence of r.v.
====================

r.v. :math:`X` and :math:`Y` are :def:`independent` if:

.. math::

  ∀_{x,y}: p_{X,Y}(x,y) = p_X(x)·p_Y(y)

So if two r.v. are independent:

.. math::

  E[X·Y] = E[X]·E[Y]

  var(X+Y) = var(X) + var(Y)
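
Both consequences of independence can be checked numerically for a
hypothetical example of two independent fair dice:

```python
from fractions import Fraction

# Hypothetical example: two independent fair dice, p_{X,Y}(x,y) = p_X(x)*p_Y(y)
p_X = {x: Fraction(1, 6) for x in range(1, 7)}
p_Y = dict(p_X)
p_XY = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

E = lambda p, f: sum(f(v) * q for v, q in p.items())
E_X = E(p_X, lambda x: x)
E_Y = E(p_Y, lambda y: y)

# E[X*Y] = E[X]*E[Y]
E_XY = sum(x * y * q for (x, y), q in p_XY.items())
assert E_XY == E_X * E_Y

# var(X+Y) = var(X) + var(Y)
var = lambda p: E(p, lambda v: v * v) - E(p, lambda v: v) ** 2
E_S = sum((x + y) * q for (x, y), q in p_XY.items())
var_S = sum((x + y) ** 2 * q for (x, y), q in p_XY.items()) - E_S ** 2
assert var_S == var(p_X) + var(p_Y)
```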

Convolution formula
===================

If :math:`Z = X + Y`, where :math:`X` and :math:`Y` are independent r.v., then:

.. math:: p_Z(z) = ∑_x\ p_X(x)·p_Y(z-x)

Proof:

.. math::

   p_Z(z) = ∑_{x,y:x+y=z}\ P(X=x,Y=y) = ∑_x\ P(X=x,Y=z-x)

   = ∑_{x,y:x+y=z}\ P(X=x)·P(Y=z-x) = ∑_x\ p_X(x)·p_Y(z-x)
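
The convolution formula applied to a hypothetical example, the sum of two
independent fair dice:

```python
from fractions import Fraction

# Hypothetical example: Z = X + Y for two independent fair dice
p_X = {x: Fraction(1, 6) for x in range(1, 7)}
p_Y = dict(p_X)

# Convolution: p_Z(z) = sum_x p_X(x) * p_Y(z-x)
p_Z = {}
for z in range(2, 13):
    p_Z[z] = sum(p_X[x] * p_Y.get(z - x, Fraction(0)) for x in p_X)

# p_Z is itself a valid PMF
assert sum(p_Z.values()) == 1
```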

Well known discrete r.v.
========================

Bernoulli random variable
-------------------------

:def:`Bernoulli random variable` with parameter :math:`p` is a random variable
that has two outcomes, denoted :math:`0` and :math:`1`, with probabilities:

.. math::

  p_X(0) = 1 - p

  p_X(1) = p

This random variable models a single trial of an experiment that results in
success or failure.

:def:`Indicator` of event :math:`A` is the function::

   I_A = 1 iff A occurs, else 0

.. math::

  p_{I_A}(1) = P(I_A = 1) = P(A)

  I_A*I_B = I_{A∩B}

.. math::

  E[bernoulli(p)] = 0*(1-p) + 1*p = p

  var[bernoulli(p)] = E[(bernoulli(p) - E[bernoulli(p)])²]

   = (0-p)²·(1-p) + (1-p)²·p = p²·(1-p) + (1 - 2p + p²)·p

   = p² - p³ + p - 2·p² + p³ = p·(1-p)
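
The closed forms :math:`E = p` and :math:`var = p·(1-p)` can be checked with
exact arithmetic (the parameter value is a hypothetical choice):

```python
from fractions import Fraction

def bernoulli_pmf(p):
    """PMF of a Bernoulli r.v.: p_X(0) = 1-p, p_X(1) = p."""
    return {0: 1 - p, 1: p}

p = Fraction(1, 3)          # hypothetical parameter
pmf = bernoulli_pmf(p)
E = sum(x * q for x, q in pmf.items())
var = sum((x - E) ** 2 * q for x, q in pmf.items())

assert E == p
assert var == p * (1 - p)
```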

Discrete uniform random variable
--------------------------------

:def:`Discrete uniform random variable` is a r.v. with parameters :math:`a`
and :math:`b` and sample space :math:`{x ∈ ℤ: a ≤ x ≤ b}`, with equal
probability of each possible outcome:

.. math::

  p_{unif(a,b)}(x) = 1 / (b-a+1)

.. math::

  E[unif(a,b)] = Σ_{a ≤ x ≤ b} x * 1/(b-a+1)
  = 1/(b-a+1) * Σ_{a ≤ x ≤ b} x

  = 1/(b-a+1) * (Σ_{a ≤ x ≤ b} a + Σ_{0 ≤ x ≤ b-a} x)

  = 1/(b-a+1) * ((b-a+1)*a + (b-a)*(b-a+1)/2)

  = a + (b-a)/2
  = (b+a)/2


.. math::

  var[unif(a,b)] = E[unif²(a,b)] - E²[unif(a,b)]

  = ∑_{a≤x≤b} x²/(b-a+1) - (b+a)²/4

  = 1/(b-a+1)·(∑_{0≤x≤b} x² - ∑_{0≤x≤a-1} x²) - (b+a)²/4

  = 1/(b-a+1)·(b+3·b²+2·b³ - (a-1)-3·(a-1)²-2·(a-1)³)/6 - (b+a)²/4

  = (2·b² + 2·a·b + b + 2·a² - a)/6 - (b+a)²/4

  = (b - a)·(b - a + 2) / 12

.. NOTE::

   From Maxima::

     sum(i^2,i,0,n), simpsum=true;

              2      3
       n + 3 n  + 2 n
       ---------------
             6

     factor(b+3*b^2+2*b^3 - (a-1)-3*(a-1)^2-2*(a-1)^3);

                       2                  2
       (b - a + 1) (2 b  + 2 a b + b + 2 a  - a)

     factor((2*b^2 + 2*a*b + b + 2*a^2 - a)/6 - (b+a)^2/4), simp=true;

       (b - a) (2 - a + b)
       -------------------
               12
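
The closed forms :math:`E = (a+b)/2` and :math:`var = (b-a)·(b-a+2)/12`
derived above can be checked against a brute-force computation over the PMF
(the parameter values are a hypothetical choice):

```python
from fractions import Fraction

def unif_pmf(a, b):
    """Discrete uniform PMF on the integers a..b."""
    n = b - a + 1
    return {x: Fraction(1, n) for x in range(a, b + 1)}

a, b = 3, 10                # hypothetical parameters
pmf = unif_pmf(a, b)
E = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - E ** 2

# closed forms from the derivation above
assert E == Fraction(a + b, 2)
assert var == Fraction((b - a) * (b - a + 2), 12)
```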

Binomial random variable
------------------------

:def:`Binomial random variable` is a r.v. with parameters :math:`n` (a positive
integer) and :math:`p` from the interval :math:`(0,1)`, with sample space the
integers from the inclusive interval :math:`[0, n]`:

.. math::

  p_{binom(n,p)}(x) = n!/(x!·(n-x)!)·p^x·(1-p)^{n-x}

Binomial random variable models the number of successes in :math:`n`
independent Bernoulli trials.

.. math::

  E[binom(n,p)] = E[∑_{1≤x≤n} bernoulli(p)] = ∑_{1≤x≤n} E[bernoulli(p)] = n·p

  var[binom(n,p)] = var[∑_{1≤x≤n} bernoulli(p)] = ∑_{1≤x≤n} var[bernoulli(p)] = n·p·(1-p)
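
A brute-force check of the binomial PMF and of :math:`E = n·p`,
:math:`var = n·p·(1-p)` (parameter values are a hypothetical choice):

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    # p_{binom(n,p)}(x) = C(n,x) * p^x * (1-p)^(n-x)
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

n, p = 8, Fraction(1, 4)    # hypothetical parameters
pmf = binom_pmf(n, p)
assert sum(pmf.values()) == 1

E = sum(x * q for x, q in pmf.items())
var = sum(x * x * q for x, q in pmf.items()) - E ** 2

assert E == n * p
assert var == n * p * (1 - p)
```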

Geometric random variable
-------------------------

:def:`Geometric random variable` is a r.v. with parameter :math:`p` from the
half-open interval :math:`(0,1]`; the sample space is all positive integers:

.. math::

  p_{geom(p)}(x) = p·(1-p)^(x-1)

This random variable models the number of tosses of a biased coin until the
first success.

.. math::

  E[geom(p)] = ∑_{x=1..∞} x·p·(1-p)^(x-1)

  = p·∑_{x=1..∞} x·(1-p)^(x-1)

  = p/(1-p)·∑_{x=0..∞} x·(1-p)^x

  = p/(1-p)·(1-p)/(1-(1-p))² = p/(1-p)·(1-p)/p² = 1/p

.. NOTE::

   Maxima calculation::

     load("simplify_sum");
     simplify_sum(sum(k * x^k, k, 0, inf));
       Is abs(x) - 1 positive, negative or zero?
       negative;
       Is x positive, negative or zero?
       positive;
       Is x - 1 positive, negative or zero?
       negative;
            x
       ------------
        2
       x  - 2 x + 1
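
As a numeric sanity check of :math:`E[geom(p)] = 1/p`, a truncated partial sum
of the series converges to :math:`1/p` (the parameter value and truncation
point are hypothetical choices; the tail :math:`(1-p)^{2000}` is negligible):

```python
# Partial sum of E[geom(p)] = sum_{x>=1} x * p * (1-p)^(x-1), truncated
p = 0.2                     # hypothetical parameter
E_partial = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 2000))

# should be very close to 1/p = 5
assert abs(E_partial - 1 / p) < 1e-9
```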