statistics.rst
changeset 0 328995b5b8fd
-1:000000000000 0:328995b5b8fd
       
     1 
       
     2 ============
       
     3  Statistics
       
     4 ============
       
     5 .. contents::
       
     6    :local:
       
     7 
       
     8 .. role:: def
       
     9    :class: def
       
    10 
       
    11 Markov inequality
       
    12 =================
       
    13 
       
    14 :def:`Markov inequality`: :math:`P(X ≥ a) ≤ E[X]/a` for a non-negative random variable :math:`X` and all :math:`a > 0`.
       
    15 
       
    16 Chebyshev inequality
       
    17 ====================
       
    18 
       
    19 :def:`Chebyshev inequality`: if :math:`X` is a random variable with mean
       
    20 :math:`μ` and variance :math:`σ²` then
       
    21 
       
    22 .. math::
       
    23 
       
    24   P(|X-μ| ≥ c) ≤ σ²/c²
       
    25 
       
    26 for all :math:`c > 0`.
       
    27 
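
Both bounds above can be checked numerically. The following is only a minimal sketch: the choice of an exponential distribution with rate 1 and the helper names ``markov_check`` and ``chebyshev_check`` are assumptions made for illustration, not part of these notes.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Assumption for illustration: X ~ Exponential(rate=1), a non-negative variable.
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n


def markov_check(a):
    """Return (empirical P(X >= a), Markov bound E[X]/a)."""
    tail = sum(x >= a for x in xs) / n
    return tail, mean / a


def chebyshev_check(c):
    """Return (empirical P(|X - mean| >= c), Chebyshev bound var/c^2)."""
    tail = sum(abs(x - mean) >= c for x in xs) / n
    return tail, var / c ** 2


for a in (1.0, 2.0, 4.0):
    print("Markov", a, markov_check(a))
for c in (1.5, 2.0, 3.0):
    print("Chebyshev", c, chebyshev_check(c))
```

In each printed pair the first number (the observed tail frequency) should not exceed the second (the bound). The bounds are usually far from tight, which is expected: they hold for any distribution with the given mean and variance.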
       
    28 Central limit theorem
       
    29 =====================
       
    30 
       
    31 :def:`Central limit theorem`: let :math:`X_1, ..., X_n, ...` be a sequence of
       
    32 independent identically distributed random variables with common mean :math:`μ`
       
    33 and variance :math:`σ²` and let:
       
    34 
       
    35 .. math::
       
    36 
       
    37   Z_n = ((∑_{1≤i≤n} X_i) - n·μ) / (σ·sqrt(n))
       
    38 
       
    39 Then the CDF of :math:`Z_n` converges to the standard normal CDF:
       
    40 
       
    41 .. math::
       
    42 
       
    43   Φ(z) = 1/sqrt(2·π)·∫_{(-∞;z]} exp(-x²/2) 𝑑x
       
    44 
       
    45   lim_{n → ∞} P(Z_n ≤ z) = Φ(z)
       
    46 
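
The convergence stated above can be observed in a small simulation. This is a sketch under stated assumptions: the :math:`X_i` are taken to be uniform on [0, 1] (so μ = 1/2 and σ² = 1/12), and the names ``phi``, ``sample_z`` and ``empirical_cdf`` are hypothetical helpers introduced here.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible


def phi(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumption for illustration: X_i ~ Uniform(0, 1), so mu = 1/2, sigma^2 = 1/12.
MU = 0.5
SIGMA = math.sqrt(1.0 / 12.0)


def sample_z(n):
    """Draw one value of Z_n = (sum X_i - n*mu) / (sigma*sqrt(n))."""
    total = sum(random.random() for _ in range(n))
    return (total - n * MU) / (SIGMA * math.sqrt(n))


def empirical_cdf(n, z, trials=20_000):
    """Estimate P(Z_n <= z) from `trials` simulated values of Z_n."""
    return sum(sample_z(n) <= z for _ in range(trials)) / trials


for z in (-1.0, 0.0, 1.0):
    print(z, round(empirical_cdf(30, z), 3), round(phi(z), 3))
```

For each ``z`` the simulated frequency and ``phi(z)`` should agree to within sampling noise. Repeating the run with a smaller ``n`` or a more skewed distribution shows how the quality of the approximation depends on both.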
       
    47 Null hypothesis
       
    48 ===============
       
    49 
       
    50 :def:`Null hypothesis` is a statement that the phenomenon being studied produces no
       
    51 effect or makes no difference; it assumes that the apparent effect is actually due to chance.
       
    52 
       
    53 p-value
       
    54 =======
       
    55 
       
    56 :def:`p-value` is the probability, under the null hypothesis, of observing an
       
    57 effect at least as extreme as the apparent effect.
       
    58 
       
    59   https://en.wikipedia.org/wiki/P-value
       
    60     Wikipedia page
       
    61 
       
    62 Significance level
       
    63 ==================
       
    64 
       
    65 If the p-value is less than or equal to the chosen :def:`significance level`
       
    66 (:math:`α`), the test suggests that the observed data are inconsistent with the
       
    67 null hypothesis, so the null hypothesis should be rejected.
       
    68 
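
As a worked illustration of the p-value and the decision rule above, consider a coin-flip experiment where the null hypothesis is "the coin is fair". This example is an assumption chosen for illustration; ``p_value`` and ``reject_null`` are hypothetical helper names, and the two-sided definition of "at least as extreme" is one common convention, not the only one.

```python
from math import comb


def p_value(heads, flips):
    """Two-sided p-value for H0: the coin is fair.

    Sums the probabilities of all outcomes whose distance from flips/2
    is at least as large as the observed distance.
    """
    deviation = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= deviation
    )
    return extreme / 2 ** flips


def reject_null(heads, flips, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return p_value(heads, flips) <= alpha


for heads, flips in ((10, 10), (60, 100)):
    print(heads, flips, p_value(heads, flips), reject_null(heads, flips))
```

The computation is exact because the binomial distribution under the null hypothesis is known. No approximation is involved at this point.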
       
    69 Hypothesis testing
       
    70 ==================
       
    71 
       
    72 :def:`Hypothesis testing` is the process of deciding whether to reject a given
       
    73 null hypothesis by comparing the p-value observed from a sample with the
       
    74 chosen significance level.
       
    75 
       
    76 After finishing hypothesis testing we should either reject the null hypothesis or fail to
       
    77 reject it due to lack of enough evidence or ...
       
    78 
       
    79 Hypothesis testing only takes into account:
       
    80 
       
    81 * The effect might be due to chance; that is, the difference might appear in a
       
    82   random sample, but not in the general population
       
    83 
       
    84 But it doesn't cover these cases:
       
    85 
       
    86 * The effect might be real; that is, a similar difference would be seen in the
       
    87   general population.
       
    88 * The apparent effect might be due to a biased sampling process, so it would not
       
    89   appear in the general population.
       
    90 * The apparent effect might be due to measurement errors.
       
    91 
       
    92 Asymptotic approximation
       
    93 ========================
       
    94 
       
    95 The CLT says that the distribution of the sample mean is approximated by a normal distribution.
       
    96 
       
    97 With a large enough number of samples the approximation is quite good.
       
    98 
       
    99 So during hypothesis testing the researcher usually assumes that it is safe
       
   100 to replace the unknown distribution of the mean of independent and identically
       
   101 distributed individual samples with this approximation.
       
   102 
       
   103 For a really small number of samples the Student distribution is used instead of
       
   104 the normal distribution. But again it means that the researcher made an assumption and you
       
   105 may not agree with it, so it is your right to reject any subsequent decision
       
   106 based on a "wrong" assumption.
       
   107 
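
How good the normal approximation is can be judged by comparing it with an exact computation in a case where the exact distribution is known. The sketch below assumes fair coin flips (a Binomial(n, 1/2) count of heads); ``exact_tail`` and ``normal_tail`` are hypothetical names, and the continuity correction of 1/2 is a common refinement added here, not something taken from these notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_tail(heads, flips):
    """Exact P(X >= heads) for X ~ Binomial(flips, 1/2)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips


def normal_tail(heads, flips):
    """CLT approximation of the same tail, with a continuity correction of 1/2."""
    mu = flips / 2          # mean of the number of heads
    sigma = sqrt(flips) / 2  # standard deviation of the number of heads
    return 1.0 - NormalDist(mu, sigma).cdf(heads - 0.5)


for flips, heads in ((10, 8), (100, 60), (1000, 530)):
    print(flips, heads, exact_tail(heads, flips), normal_tail(heads, flips))
```

Comparing the two columns for different sample sizes shows what the assumption costs in a concrete case. For a real data set the exact distribution is usually unknown, which is exactly why the approximation is an assumption one may question.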
       
   108 Type I error
       
   109 ============
       
   110 
       
   111 :def:`Type I error` is the incorrect rejection of a true null hypothesis (a
       
   112 *false positive*).
       
   113 
       
   114 The type I error rate is at most :math:`α` (the significance level).
       
   115 
       
   116 The p-value of a test is the maximum false positive risk you would take by
       
   117 rejecting the null hypothesis.
       
   118 
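
The claim that the type I error rate is bounded by α can be checked by simulating many experiments in which the null hypothesis is true. This is a sketch under assumptions not present in the notes: normally distributed data with known σ, a two-sided z-test, and the hypothetical helpers ``z_test_p_value`` and ``rejection_rate``.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(2)  # fixed seed so the run is reproducible
ALPHA = 0.05
STD_NORMAL = NormalDist()


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, with known sigma (z-test)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, experiments=20_000):
    """Fraction of simulated experiments in which H0 is rejected at level ALPHA."""
    rejections = 0
    for _ in range(experiments):
        sample = [random.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) <= ALPHA:
            rejections += 1
    return rejections / experiments


# H0 is true here (true_mean == mu0), so every rejection is a type I error.
print("type I error rate:", rejection_rate(true_mean=0.0))
```

The printed rate should be close to ``ALPHA``. Calling ``rejection_rate`` with a ``true_mean`` different from ``mu0`` instead estimates how often a false null hypothesis is rejected.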
       
   119 Type II error
       
   120 =============
       
   121 
       
   122 :def:`Type II error` is failing to reject a false null hypothesis (a *false
       
   123 negative*).
       
   124 
       
   125 The probability of a type II error is usually called :math:`β`.
       
   126 
       
   127 Power
       
   128 =====
       
   129 
       
   130 :def:`Power` is the probability of rejecting the null hypothesis when it is false. So
       
   131 the power is :math:`1-β`.
       
   132 
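
For a simple test the relation between β and power can be computed in closed form. The sketch below assumes a one-sided z-test with known σ and a specific alternative (true mean shifted by ``effect``); these choices and the name ``power_one_sided_z`` are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mean = mu0 against mean = mu0 + effect."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)  # reject H0 when z > z_crit
    shift = effect * sqrt(n) / sigma          # mean of the z statistic under the alternative
    beta = STD_NORMAL.cdf(z_crit - shift)     # P(fail to reject | alternative is true)
    return 1.0 - beta


for n in (10, 25, 50, 100):
    print(n, round(power_one_sided_z(effect=0.5, sigma=1.0, n=n), 3))
```

With ``effect=0`` the function returns ``alpha``, since rejecting a true null hypothesis is a type I error. For a fixed non-zero effect the power grows with the sample size ``n``.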
       
   133 Confidence interval
       
   134 ===================
       
   135 
       
   136 :def:`Confidence interval` 
       
   137 
       
   138   https://en.wikipedia.org/wiki/Confidence_interval
       
   139     Wikipedia page
       
   140 
       
   141 
       
   142 Question
       
   143 ========
       
   144 
       
   145 What to do with the null hypothesis in classical inference?
       
   146 
       
   147 I successfully shirked stat classes 10 years ago (reading the night before actually
       
   148 helped me pass the exam) and now that I am taking several Coursera stat classes I have
       
   149 difficulty understanding the **null hypothesis**. Somehow, with unclear
       
   150 intuition, I passed the quizzes, but I want to understand the subject.
       
   151 
       
   152 Suppose we have a population and sample some data from it. A reasonable
       
   153 question: does some property of the sample give evidence that it holds for the population?
       
   154 
       
   155 A statistic is a real number that can be derived from a population or a sample.
       
   156 The classical example is the mean value.
       
   157 
       
   158 We ask whether it is statistically significant that the statistic of the population is near
       
   159 the statistic of the sample.
       
   160