============
 Statistics
============
.. contents::
   :local:

.. role:: def
   :class: def

Markov inequality
=================

:def:`Markov inequality`: if :math:`X` is a nonnegative random variable then
:math:`P(X ≥ a) ≤ E[X]/a` for all :math:`a > 0`.
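
A quick empirical sanity check of the bound; this is a sketch assuming NumPy is
available, and the exponential distribution, seed and sample size are arbitrary
illustrative choices:

.. code:: python

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.exponential(scale=1.0, size=100_000)  # nonnegative X with E[X] = 1

  for a in (1, 2, 5):
      empirical = np.mean(x >= a)  # estimate of P(X >= a) from the sample
      bound = x.mean() / a         # Markov bound E[X]/a
      print(f"a={a}: P(X>=a) ≈ {empirical:.4f} <= E[X]/a ≈ {bound:.4f}")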

Chebyshev inequality
====================

:def:`Chebyshev inequality`: if :math:`X` is a random variable with mean
:math:`μ` and variance :math:`σ²` then

.. math::

  P(|X-μ| ≥ c) ≤ σ²/c²

for all :math:`c > 0`.
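
A similar empirical sketch for the Chebyshev bound, again assuming NumPy; the
normal distribution with :math:`μ = 5`, :math:`σ = 2` is an arbitrary
illustrative choice:

.. code:: python

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # μ = 5, σ = 2

  mu, sigma2 = x.mean(), x.var()
  for c in (2, 4, 6):
      empirical = np.mean(np.abs(x - mu) >= c)  # estimate of P(|X-μ| >= c)
      bound = sigma2 / c**2                     # Chebyshev bound σ²/c²
      print(f"c={c}: P(|X-μ|>=c) ≈ {empirical:.4f} <= σ²/c² ≈ {bound:.4f}")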

Central limit theorem
=====================

:def:`Central limit theorem`: let :math:`X_1, ..., X_n, ...` be a sequence of
independent identically distributed random variables with common mean :math:`μ`
and variance :math:`σ²` and let:

.. math::

  Z_n = ((∑_{1≤i≤n} X_i) - n·μ) / (σ·√n)

Then the CDF of :math:`Z_n` converges to the standard normal CDF:

.. math::

  Φ(z) = 1/√(2·π)·∫_{(-∞;z]} exp(-x²/2) dx

  lim_{n → ∞} P(Z_n ≤ z) = Φ(z)
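
A small simulation illustrating the convergence, assuming NumPy and SciPy; the
uniform distribution and the sample size :math:`n = 50` are illustrative
assumptions:

.. code:: python

  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(0)
  n, trials = 50, 100_000
  mu, sigma = 0.5, np.sqrt(1 / 12)  # X_i ~ Uniform(0, 1): μ = 1/2, σ² = 1/12

  samples = rng.uniform(0, 1, size=(trials, n))
  z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

  # Compare the empirical CDF of Z_n with the standard normal CDF Φ.
  for point in (-1.0, 0.0, 1.0):
      print(f"P(Z_n <= {point:+.1f}) ≈ {np.mean(z <= point):.4f}, "
            f"Φ({point:+.1f}) = {norm.cdf(point):.4f}")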

Null hypothesis
===============

:def:`Null hypothesis` is a statement that the phenomenon being studied produces
no effect or makes no difference: the assumption that the apparent effect is
actually due to chance.

p-value
=======

:def:`p-value` is the probability, under the null hypothesis, of an apparent
effect at least as large as the one actually observed.

  https://en.wikipedia.org/wiki/P-value
    Wikipedia page
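
For example, the p-value of a coin-flip experiment can be computed directly
from the binomial distribution. A sketch assuming SciPy; the observed count of
60 heads in 100 flips is made up for illustration:

.. code:: python

  from scipy.stats import binom

  # Null hypothesis: the coin is fair (p = 0.5).
  # Observed: 60 heads out of 100 flips.
  n, k, p0 = 100, 60, 0.5

  # Two-sided p-value: probability, under the null hypothesis, of a count
  # at least as far from 50 as the observed 60.
  p_value = binom.cdf(n - k, n, p0) + binom.sf(k - 1, n, p0)
  print(f"p-value = {p_value:.4f}")  # about 0.057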

Significance level
==================

If the p-value is less than or equal to the chosen :def:`significance level`
(:math:`α`), the test suggests that the observed data are inconsistent with the
null hypothesis, so the null hypothesis should be rejected.
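
The decision rule itself is mechanical; a minimal sketch, reusing the
hypothetical coin-flip p-value from the previous section:

.. code:: python

  alpha = 0.05      # chosen significance level
  p_value = 0.0569  # e.g. the coin-flip p-value from the p-value section

  if p_value <= alpha:
      print("reject the null hypothesis")
  else:
      print("fail to reject the null hypothesis")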

Hypothesis testing
==================

:def:`Hypothesis testing` is the process of judging the statistical significance
of an observed effect against a given null hypothesis: the p-value computed from
the sample is compared with a chosen significance level.

After hypothesis testing we either reject the null hypothesis or fail to reject
it for lack of sufficient evidence.
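
A minimal sketch of the whole workflow, assuming SciPy; the simulated sample
and the hypothesized population mean of 10 are illustrative assumptions, and
the one-sample t-test stands in for whatever test fits the data:

.. code:: python

  import numpy as np
  from scipy.stats import ttest_1samp

  rng = np.random.default_rng(1)
  alpha = 0.05

  # Hypothetical sample; null hypothesis: the population mean equals 10.
  sample = rng.normal(loc=10.4, scale=1.5, size=30)

  t_stat, p_value = ttest_1samp(sample, popmean=10.0)
  print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
  if p_value <= alpha:
      print("reject the null hypothesis")
  else:
      print("fail to reject the null hypothesis: not enough evidence")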

Hypothesis testing only takes into account one possibility:

* that the effect might be due to chance; that is, the difference might appear
  in a random sample, but not in the general population.

But it does not cover these cases:

* The effect might be real; that is, a similar difference would be seen in the
  general population.
* The apparent effect might be due to a biased sampling process, so it would not
  appear in the general population.
* The apparent effect might be due to measurement errors.

Asymptotic approximation
========================

The CLT says that the distribution of the sample mean is approximated by a
normal distribution.

With a large enough number of samples the approximation is quite good.

So during hypothesis testing the researcher usually assumes that it is safe to
replace the unknown distribution of the mean of independent and identically
distributed samples with this normal approximation.

For a really small number of samples Student's t-distribution is used instead
of the normal distribution. But again this means the researcher made an
assumption; you may not agree with it, and it is your right to reject any
subsequent decision based on a "wrong" assumption.
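
A small sketch, assuming SciPy, that compares the two approximations through
their two-sided 95% critical values; the degrees of freedom are arbitrary
illustrative values:

.. code:: python

  from scipy.stats import norm, t

  # Student's t-distribution is wider for small samples and approaches the
  # normal distribution as the degrees of freedom grow.
  z_crit = norm.ppf(0.975)
  for df in (3, 10, 30, 100):
      print(f"df={df:>3}: t critical = {t.ppf(0.975, df):.3f}  "
            f"(normal critical = {z_crit:.3f})")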

Type I error
============

:def:`Type I error` is the incorrect rejection of a true null hypothesis (a
*false positive*).

The type I error rate is at most :math:`α` (the significance level).

The p-value of a test is the maximum false positive risk you would take by
rejecting the null hypothesis.
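
A simulation can illustrate that when the null hypothesis is true the test
rejects it with probability close to :math:`α`. A sketch assuming NumPy and
SciPy; the sample size and the number of trials are arbitrary:

.. code:: python

  import numpy as np
  from scipy.stats import ttest_1samp

  rng = np.random.default_rng(2)
  alpha, trials, n = 0.05, 10_000, 20

  # The null hypothesis (population mean = 0) is true in every trial,
  # so every rejection is a type I error.
  false_positives = 0
  for _ in range(trials):
      sample = rng.normal(loc=0.0, scale=1.0, size=n)
      if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
          false_positives += 1

  print(f"type I error rate ≈ {false_positives / trials:.3f} (α = {alpha})")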

Type II error
=============

:def:`Type II error` is failing to reject a false null hypothesis (a *false
negative*).

The probability of a type II error is usually called :math:`β`.

Power
=====

:def:`Power` is the probability of rejecting the null hypothesis when it is
false, so the power is :math:`1-β`.
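
The same kind of simulation estimates :math:`β` and power when the null
hypothesis is false. A sketch assuming NumPy and SciPy; the true mean of 0.5
and the sample size are illustrative assumptions:

.. code:: python

  import numpy as np
  from scipy.stats import ttest_1samp

  rng = np.random.default_rng(3)
  alpha, trials, n = 0.05, 10_000, 20

  # The null hypothesis (population mean = 0) is false: the true mean is 0.5.
  # Failing to reject is a type II error; its rate estimates β.
  rejections = 0
  for _ in range(trials):
      sample = rng.normal(loc=0.5, scale=1.0, size=n)
      if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
          rejections += 1

  power = rejections / trials
  print(f"power ≈ {power:.3f}, β ≈ {1 - power:.3f}")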

Confidence interval
===================

:def:`Confidence interval` is a range of values, computed from the sample, that
is expected to contain the true value of a population parameter with a given
confidence level (for example 95%).

  https://en.wikipedia.org/wiki/Confidence_interval
    Wikipedia page
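
A sketch, assuming SciPy, of a 95% confidence interval for a mean based on
Student's t-distribution; the simulated sample is an illustrative assumption:

.. code:: python

  import numpy as np
  from scipy.stats import t

  rng = np.random.default_rng(4)
  sample = rng.normal(loc=10.0, scale=2.0, size=30)

  # 95% confidence interval for the mean: point estimate ± t-quantile
  # times the standard error of the mean.
  mean = sample.mean()
  sem = sample.std(ddof=1) / np.sqrt(len(sample))
  low, high = t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
  print(f"mean = {mean:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")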


Question
========

What to do with null hypothesis in classical inference?

I successfully shirked stat classes 10 years ago (a last-night reading actually
helped me pass the exam), and now that I am taking several Coursera stat classes
I have difficulties understanding the **null hypothesis**. Somehow, with unclear
intuition, I passed the quizzes, but I want to understand the subject.

Suppose we have a population and sample some data from it. A reasonable
question: does some property of the sample give evidence that the same property
holds for the population?

A statistic is a real number that can be derived from a population or from a
sample. The classical example is the mean value.

We ask whether it is statistically significant that the statistic of the
population is near to the statistic of the sample.
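
One way to make this concrete is a one-sample test of the sample mean against
a claimed population mean. A sketch assuming SciPy; the numbers are made up
for illustration:

.. code:: python

  import numpy as np
  from scipy.stats import ttest_1samp

  # Hypothetical data; null hypothesis: the sample comes from a population
  # with mean 100.
  sample = np.array([104.2, 98.1, 101.5, 107.3, 99.8, 103.6, 105.0, 100.9])

  result = ttest_1samp(sample, popmean=100.0)
  print(f"sample mean = {sample.mean():.2f}, p-value = {result.pvalue:.4f}")
  # A small p-value is evidence that the difference between the sample mean
  # and the claimed population mean is unlikely to be due to chance alone.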