statistics.rst
changeset 0 328995b5b8fd
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/statistics.rst	Wed Mar 09 21:23:23 2016 +0200
@@ -0,0 +1,160 @@
+
+============
+ Statistics
+============
+.. contents::
+   :local:
+
+.. role:: def
+   :class: def
+
+Markov inequality
+=================
+
+:def:`Markov inequality`: if :math:`X` is a nonnegative random variable, then
+:math:`P(X ≥ a) ≤ E[X]/a` for all :math:`a > 0`.
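+
+A quick numerical check, as a sketch only: it assumes Python 3 and an
+exponential distribution with rate 1 (an arbitrary nonnegative choice made for
+illustration, so :math:`E[X] = 1`), and compares the empirical tail with the
+Markov bound.
+
+.. code-block:: python
+
+    import random
+
+    # Exponential distribution with rate 1: nonnegative, E[X] = 1.
+    random.seed(0)
+    samples = [random.expovariate(1.0) for _ in range(100_000)]
+    mean = sum(samples) / len(samples)
+
+    for a in (0.5, 1.0, 2.0, 4.0):
+        tail = sum(x >= a for x in samples) / len(samples)  # empirical P(X >= a)
+        bound = mean / a                                    # Markov bound E[X]/a
+        print(f"a={a}: P(X>=a)~{tail:.4f} <= E[X]/a~{bound:.4f}")
+
+The exact tail here is :math:`e^{-a}`, about 0.135 at :math:`a = 2` against a
+bound of about 0.5, which shows how loose the inequality can be.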
+
+Chebyshev inequality
+====================
+
+:def:`Chebyshev inequality`: if :math:`X` is a random variable with mean
+:math:`μ` and variance :math:`σ²` then
+
+.. math::
+
+  P(|X-μ| ≥ c) ≤ σ²/c²
+
+for all :math:`c > 0`.
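+
+The same kind of sketch for Chebyshev's inequality, assuming a Uniform(0, 1)
+variable (mean 1/2, variance 1/12) picked purely for illustration. Because the
+mean and population variance of the draws themselves are used, the inequality
+holds exactly for this empirical distribution, not just approximately.
+
+.. code-block:: python
+
+    import random
+    import statistics
+
+    # Uniform(0, 1): mean 1/2, variance 1/12.
+    random.seed(1)
+    xs = [random.random() for _ in range(100_000)]
+    mu = statistics.fmean(xs)
+    var = statistics.pvariance(xs, mu)  # population variance around mu
+
+    for c in (0.3, 0.4, 0.45):
+        tail = sum(abs(x - mu) >= c for x in xs) / len(xs)  # empirical P(|X-μ| >= c)
+        print(f"c={c}: P(|X-μ|>=c)~{tail:.4f} <= σ²/c²~{var / c**2:.4f}")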
+
+Central limit theorem
+=====================
+
+:def:`Central limit theorem`: let :math:`X_1, ..., X_n, ...` be a sequence of
+independent, identically distributed random variables with common mean :math:`μ`
+and finite variance :math:`σ² > 0`, and let:
+
+.. math::
+
+  Z_n = \frac{\sum_{i=1}^{n} X_i - n μ}{σ \sqrt{n}}
+
+Then the CDF of :math:`Z_n` converges to the standard normal CDF :math:`Φ`:
+
+.. math::
+
+  Φ(z) = \frac{1}{\sqrt{2π}} \int_{-∞}^{z} e^{-x^2/2} \, dx
+
+  \lim_{n \to ∞} P(Z_n ≤ z) = Φ(z)
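+
+A small simulation sketch of the theorem, assuming Python 3.8+ for
+``statistics.NormalDist``: it draws :math:`Z_n` for uniform :math:`X_i` (an
+arbitrary non-normal choice) with :math:`n = 30` and compares its empirical CDF
+with :math:`Φ` at a few points. The helper name ``z_n`` is hypothetical.
+
+.. code-block:: python
+
+    import math
+    import random
+    from statistics import NormalDist
+
+    # X_i ~ Uniform(0, 1): μ = 1/2, σ = sqrt(1/12). Clearly not normal.
+    MU, SIGMA = 0.5, math.sqrt(1 / 12)
+
+    def z_n(n: int) -> float:
+        """One draw of Z_n = (sum X_i - n·μ) / (σ·sqrt(n))."""
+        total = sum(random.random() for _ in range(n))
+        return (total - n * MU) / (SIGMA * math.sqrt(n))
+
+    random.seed(2)
+    n, trials = 30, 20_000
+    zs = [z_n(n) for _ in range(trials)]
+    phi = NormalDist()  # standard normal; phi.cdf is Φ
+
+    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
+        empirical = sum(v <= z for v in zs) / trials
+        print(f"z={z:+.1f}: P(Z_n<=z)~{empirical:.3f}  Φ(z)={phi.cdf(z):.3f}")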
+
+Null hypothesis
+===============
+
+:def:`Null hypothesis`: a statement that the phenomenon being studied produces no
+effect or makes no difference, i.e. the assumption that any apparent effect is
+actually due to chance.
+
+p-value
+=======
+
+:def:`p-value` is the probability, computed under the null hypothesis, of
+observing an effect at least as extreme as the apparent effect.
+
+  https://en.wikipedia.org/wiki/P-value
+    Wikipedia page
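+
+A sketch of a p-value computation on made-up data: 60 heads in 100 flips, with
+the null hypothesis that the coin is fair. It uses a one-sided exact binomial
+tail; ``binom_pvalue_upper`` is a hypothetical helper name, and Python 3.8+ is
+assumed for ``math.comb``.
+
+.. code-block:: python
+
+    from math import comb
+
+    def binom_pvalue_upper(k: int, n: int, p: float = 0.5) -> float:
+        """P(X >= k) for X ~ Binomial(n, p): the chance, under the null
+        hypothesis, of a result at least as extreme as k successes."""
+        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
+
+    p_value = binom_pvalue_upper(60, 100)
+    print(f"p-value = {p_value:.4f}")
+
+The result is a little under 0.03, so this data would count as significant at a
+5% level but not at a 1% level.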
+
+Significance level
+==================
+
+If the p-value is less than or equal to the chosen :def:`significance level`
+(:math:`α`), the test suggests that the observed data are inconsistent with the
+null hypothesis, so the null hypothesis should be rejected.
+
+Hypothesis testing
+==================
+
+:def:`Hypothesis testing` is the process of deciding whether to reject a given
+null hypothesis by comparing the p-value observed in a sample with a chosen
+significance level.
+
+After hypothesis testing we either reject the null hypothesis or fail to reject
+it due to lack of sufficient evidence or ...
+
+Hypothesis testing only takes into account:
+
+* that the effect might be due to chance; that is, the difference might appear
+  in a random sample, but not in the general population
+
+It does not cover the other cases:
+
+* The effect might be real; that is, a similar difference would be seen in the
+  general population.
+* The apparent effect might be due to a biased sampling process, so it would not
+  appear in the general population.
+* The apparent effect might be due to measurement errors.
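+
+The procedure from this section can be sketched as a two-sided one-sample
+z-test. This assumes the population standard deviation :math:`σ` is known and
+the sample mean is approximately normal; the function name ``z_test`` and the
+numbers are hypothetical.
+
+.. code-block:: python
+
+    import math
+    from statistics import NormalDist
+
+    def z_test(sample_mean: float, mu0: float, sigma: float, n: int,
+               alpha: float = 0.05):
+        """Two-sided one-sample z-test of H0: μ = mu0, with σ assumed known.
+
+        Returns (z statistic, p-value, whether H0 is rejected)."""
+        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
+        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|) under H0
+        return z, p_value, p_value <= alpha
+
+    # Hypothetical data: n = 36, sample mean 10.4, H0 says μ = 10, σ = 1.2.
+    z, p, reject = z_test(10.4, 10.0, 1.2, 36)
+    print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
+
+With these numbers :math:`z = 2` and the p-value is just under 0.05, so the null
+hypothesis is rejected at :math:`α = 0.05` but would not be at :math:`α = 0.01`.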
+
+Asymptotic approximation
+========================
+
+The CLT says that the distribution of the sample mean is approximately normal.
+
+With a reasonably large number of samples the approximation is quite good.
+
+So during hypothesis testing a researcher usually assumes that it is safe to
+replace the unknown distribution of the mean of independent and identically
+distributed individual samples with this normal approximation.
+
+For a really small number of samples, when the variance is estimated from the
+same sample, Student's t-distribution is used instead of the normal
+distribution. But again, this means that the researcher made an assumption
+(here, that the individual samples are themselves normally distributed), and you
+may not agree with it, so it is your right to reject any subsequent decision
+based on a "wrong" assumption.
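+
+A sketch of why the small-sample case matters, assuming normal data with
+:math:`n = 5` and :math:`σ` estimated by the sample standard deviation: treating
+the statistic as standard normal (critical value 1.96) rejects a true null
+hypothesis noticeably more often than the nominal 5%. Student's t-distribution
+with :math:`n - 1` degrees of freedom describes this statistic exactly; Python's
+standard library has no t-distribution, so the sketch only simulates it.
+
+.. code-block:: python
+
+    import math
+    import random
+    import statistics
+
+    random.seed(3)
+    n, trials = 5, 20_000
+    z_crit = 1.96  # two-sided 5% critical value of the standard normal
+
+    exceed = 0
+    for _ in range(trials):
+        xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # H0 true: μ = 0
+        # σ is estimated from the sample, so this statistic follows t(n - 1).
+        t = statistics.fmean(xs) / (statistics.stdev(xs) / math.sqrt(n))
+        if abs(t) > z_crit:
+            exceed += 1
+
+    print(f"P(|T| > 1.96) ~ {exceed / trials:.3f} (nominal 0.05)")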
+
+Type I error
+============
+
+:def:`Type I error` is the incorrect rejection of a true null hypothesis (a
+*false positive*).
+
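+The type I error rate is at most :math:`α` (the significance level).
+
+The p-value of a test is the smallest significance level at which the observed
+data would lead to rejecting the null hypothesis, i.e. the false positive risk
+you accept by rejecting the null hypothesis on this result.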
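+
+A simulation sketch of the type I error rate, assuming normal data with known
+:math:`σ` and a two-sided z-test at :math:`α = 0.05` (all parameters chosen for
+illustration): the data are generated with the null hypothesis true, so every
+rejection is a false positive, and the rejection rate should come out close to
+:math:`α`.
+
+.. code-block:: python
+
+    import math
+    import random
+    from statistics import NormalDist, fmean
+
+    random.seed(4)
+    alpha, n, trials = 0.05, 20, 10_000
+    mu0, sigma = 0.0, 1.0
+    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for α = 0.05
+
+    false_positives = 0
+    for _ in range(trials):
+        xs = [random.gauss(mu0, sigma) for _ in range(n)]  # H0 true by construction
+        z = (fmean(xs) - mu0) / (sigma / math.sqrt(n))
+        if abs(z) >= z_crit:
+            false_positives += 1
+
+    print(f"type I error rate ~ {false_positives / trials:.3f} (α = {alpha})")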
+
+Type II error
+=============
+
+:def:`Type II error` is failing to reject a false null hypothesis (a *false
+negative*).
+
+The probability of a type II error is usually denoted :math:`β`.
+
+Power
+=====
+
+:def:`Power` is the probability of rejecting the null hypothesis when it is
+false, so power equals :math:`1-β`.
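+
+A sketch computing power and :math:`β` for a one-sided z-test with known
+:math:`σ`, using :math:`1 - β = 1 - Φ(z_{1-α} - δ\sqrt{n}/σ)`, where :math:`δ`
+is the assumed true shift of the mean away from the null value. The helper name
+``z_test_power`` and the numbers (:math:`δ = 0.5`, :math:`σ = 1`, :math:`n = 25`)
+are hypothetical.
+
+.. code-block:: python
+
+    import math
+    from statistics import NormalDist
+
+    def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
+        """Power of a one-sided z-test of H0: μ = μ0 vs H1: μ > μ0
+        when the true mean is μ0 + delta and σ is known."""
+        std_normal = NormalDist()
+        z_crit = std_normal.inv_cdf(1 - alpha)    # reject H0 when Z >= z_crit
+        shift = delta * math.sqrt(n) / sigma       # mean of Z under H1
+        return 1 - std_normal.cdf(z_crit - shift)  # P(Z >= z_crit | H1)
+
+    power = z_test_power(delta=0.5, sigma=1.0, n=25)
+    beta = 1 - power  # probability of a type II error
+    print(f"power = {power:.3f}, β = {beta:.3f}")
+
+Here power comes out at about 0.80; a larger :math:`n` or :math:`δ` raises the
+power and so lowers :math:`β`.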
+
+Confidence interval
+===================
+
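+:def:`Confidence interval`: a range computed from a sample by a procedure that,
+over repeated sampling, contains the true value of the parameter with a chosen
+probability (the confidence level, e.g. 95%).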
+
+  https://en.wikipedia.org/wiki/Confidence_interval
+    Wikipedia page
+
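+A sketch of the repeated-sampling meaning of a 95% interval, assuming normal
+data with known :math:`σ` and the normal-approximation interval
+:math:`\bar{x} ± z_{0.975} σ/\sqrt{n}`; ``normal_ci`` and the parameters are
+hypothetical. Roughly 95% of the simulated intervals should contain the true
+mean.
+
+.. code-block:: python
+
+    import math
+    import random
+    from statistics import NormalDist, fmean
+
+    def normal_ci(xs, sigma, confidence=0.95):
+        """Normal-approximation interval x̄ ± z·σ/sqrt(n), σ assumed known."""
+        z = NormalDist().inv_cdf(0.5 + confidence / 2)
+        half = z * sigma / math.sqrt(len(xs))
+        m = fmean(xs)
+        return m - half, m + half
+
+    # Repeat the experiment many times and count how often the interval covers μ.
+    random.seed(5)
+    mu, sigma, n, trials = 3.0, 2.0, 40, 5_000
+    covered = 0
+    for _ in range(trials):
+        lo, hi = normal_ci([random.gauss(mu, sigma) for _ in range(n)], sigma)
+        if lo <= mu <= hi:
+            covered += 1
+
+    print(f"coverage ~ {covered / trials:.3f} (nominal 0.95)")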
+
+Question
+========
+
+What should we do with the null hypothesis in classical inference?
+
+I successfully shirked statistics classes 10 years ago (reading the night before
+actually helped me pass the exam), and now, taking several Coursera statistics
+classes, I have difficulty understanding the **null hypothesis**. Somehow, with
+unclear intuition, I passed the quizzes, but I want to understand the subject.
+
+Suppose we have a population and sample some data from it. A reasonable
+question: is some property of the sample evidence that the property also holds
+for the population?
+
+A statistic is a real number derived from a population or a sample. The classic
+example is the mean.
+
+We ask whether it is statistically significant that the statistic of the
+population is close to the statistic of the sample.
+