Chapter 1 - Basics and Bayes' Rule
De Morgan's Laws
- \( (S \cap T)^c = S^c \cup T^c \) and \( (S \cup T)^c = S^c \cap T^c \)
- \( (S^c \cap T^c)^c = S \cup T \)
Problem 4. Parking lot problem
Mary and Tom park their cars in an empty parking lot with \( n \geq 2 \) consecutive parking spaces (i.e., \( n \) spaces in a row, where only one car fits in each space). Mary and Tom pick parking spaces at random; of course, they must each choose a different space. (All pairs of distinct parking spaces are equally likely.) What is the probability that there is at most one empty parking space between them?
Answer:
- When the first car is at the head or tail of the row, the second car has 2 valid positions for each case (giving 4 configurations).
- When the first car is at head+1 or tail-1, the second car has 3 valid positions for each case (giving 6 configurations).
- When the first car is anywhere else (the remaining n-4 positions, assuming n >= 4), the second car has 4 valid positions for each case, giving 4(n-4) configurations.
- The total number of ordered configurations is n(n-1).
- So the answer is (4(n-4) + 4 + 6) / (n(n-1)) = (4n - 6) / (n(n-1)).
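A quick Monte Carlo sanity check (my own sketch, not part of the original problem): "at most one empty space between them" means the two chosen spaces differ by at most 2, so simulation should match the closed form above for any \( n \geq 4 \).

```python
import random

def closed_form(n):
    # (4(n-4) + 4 + 6) / (n(n-1)) = (4n - 6) / (n(n-1)), valid for n >= 4
    return (4 * n - 6) / (n * (n - 1))

def simulate(n, trials=200_000):
    hits = 0
    for _ in range(trials):
        a, b = random.sample(range(n), 2)  # two distinct spaces, uniformly
        if abs(a - b) <= 2:                # adjacent, or one empty space apart
            hits += 1
    return hits / trials

for n in (4, 5, 10, 50):
    print(n, closed_form(n), simulate(n))
```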
3 Important tools
- Multiplication rule
- Total probability theorem
- Bayes' rule (-> inference)
Conditional probability
\[ P(A | B) = \frac{ P(A \cap B)}{ P(B)} \]
\[ P(A | B) P(B)= P(B | A) P(A) \]
Bayes' rule (the denominator is expanded by the total probability theorem):
\[ P(A | B) = \frac{ P(B | A) P(A) } { P(B | A) P(A) + P(B | A^c) P( A^c)} \]
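A worked numeric example (hypothetical numbers of my own choosing, in the spirit of the inference arrow above): a test for a rare condition with prevalence 1%, sensitivity 95%, and false-positive rate 5%.

```python
# Hypothetical numbers for illustration only.
p_A = 0.01            # P(A): prevalence of the condition
p_B_given_A = 0.95    # P(B | A): test positive given the condition
p_B_given_Ac = 0.05   # P(B | A^c): false-positive rate

# Denominator via the total probability theorem.
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' rule: posterior probability of the condition given a positive test.
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # ~0.161: most positives are still false alarms
```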
Multiplication rule
\[ P(A \cap B) = P(B) \, P(A | B) = P(A) \, P(B | A) \]
More generally, \( P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \, P(A_2 | A_1) \cdots P(A_n | A_1 \cap \cdots \cap A_{n-1}) \).
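A small worked example of the chain form (my own): the probability of drawing three aces in a row from a standard 52-card deck.

```python
from fractions import Fraction

# P(ace 1st) * P(ace 2nd | ace 1st) * P(ace 3rd | first two were aces)
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p, float(p))  # 1/5525 ~0.000181
```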
Total probability theorem
\( P(B) = P(B | A) P(A) + P(B | A^c) P( A^c) \) can be generalized to:
\[ P(B) = \sum_{i} P(B | A_i) P(A_i), \quad \text{where the } A_i \text{ are disjoint and } \sum_i P(A_i) = 1 \]
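A small illustration of the partition form (hypothetical numbers of my own): three machines produce 50%, 30%, and 20% of a factory's parts, with defect rates 1%, 2%, and 3%.

```python
shares = [0.5, 0.3, 0.2]           # P(A_i): which machine made the part
defect_rates = [0.01, 0.02, 0.03]  # P(B | A_i): defect rate per machine

# P(B) = sum_i P(B | A_i) P(A_i)
p_defect = sum(r * s for r, s in zip(defect_rates, shares))
print(p_defect)  # 0.017
```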
Definition of independence:
\[ P(A \cap B) = P(A) \cdot P(B) \]
This also implies \( P(B | A) = P(B) \) and \( P(A | B) = P(A) \) (whenever the conditioning event has positive probability), and also implies \( P(A \cap B^c) = P(A) \cdot P(B^c) \).
Note that independence is unrelated to disjointness; in fact, two disjoint events that both have positive probability are never independent.
Independence is a relation about information. It is important to always keep in mind the intuitive meaning: two events are independent if the occurrence of one does not change our beliefs about the other.
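A small enumeration check (my own example): for one roll of a fair die, the events A = {1, 2} and B = {2, 4, 6} are independent.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die
A = {1, 2}
B = {2, 4, 6}

def prob(event):
    return Fraction(len(event & omega), len(omega))

print(prob(A) * prob(B))  # 1/3 * 1/2 = 1/6
print(prob(A & B))        # P({2}) = 1/6, equal, so A and B are independent
```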
Conditional independence
\[ P(A \cap B | C) = P(A | C) \cdot P(B | C) \]
Note: independence does not imply conditional independence, and conditional independence does not imply independence.
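A sketch of the first direction of that note (my own example): two independent fair coin flips A and B stop being independent once we condition on C = "the flips agree".

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))  # (flip1, flip2), equally likely

def prob(pred, given=lambda w: True):
    sub = [w for w in omega if given(w)]
    return Fraction(sum(pred(w) for w in sub), len(sub))

A = lambda w: w[0] == 1     # first flip is heads
B = lambda w: w[1] == 1     # second flip is heads
C = lambda w: w[0] == w[1]  # the two flips agree

# Unconditionally independent: 1/4 == 1/2 * 1/2
print(prob(lambda w: A(w) and B(w)), prob(A) * prob(B))

# Given C they are dependent: 1/2 != 1/2 * 1/2
print(prob(lambda w: A(w) and B(w), given=C),
      prob(A, given=C) * prob(B, given=C))
```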
Permutation & Combination
Permutation - pick r items out of n items, order matters:
\[
P = \frac{n!}{(n-r)!}
\]
Combination - pick r items out of n items, order does not matter:
\[
C = \frac{P}{r!} = \frac{n!}{(n-r)! \cdot r!}
\]
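Both formulas can be checked against Python's standard library (math.perm and math.comb, available since Python 3.8):

```python
import math

n, r = 10, 3
perm = math.factorial(n) // math.factorial(n - r)
comb = perm // math.factorial(r)
print(perm, perm == math.perm(n, r))  # 720 True
print(comb, comb == math.comb(n, r))  # 120 True
```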
Number of subsets of an n-element set:
\[
2^n
\]
\[ \sum_{k=0}^{n} \left( \begin{array}{c} n \\ k \end{array} \right) = 2^n \]
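A small enumeration confirming the identity (my own check): counting subsets size by size gives the same total as \( 2^n \).

```python
from itertools import combinations
from math import comb

n = 5
items = range(n)

# enumerate subsets grouped by size k and count them all
total = sum(len(list(combinations(items, k))) for k in range(n + 1))
print(total, 2 ** n)                          # 32 32

# the same identity via binomial coefficients
print(sum(comb(n, k) for k in range(n + 1)))  # 32
```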
Number of possible outcomes when a fair six-sided die is tossed n times in a row:
\[
6^n
\]
Binomial coefficient \( \left( \begin{array}{c} n \\ k \end{array} \right) \)
- \( n \geq 1 \) independent coin tosses with \( P(H) = p \):
\[ P(k \text{ heads}) = \left( \begin{array}{c} n \\ k \end{array} \right) p^k (1-p)^{n-k} \]
\[ \sum_{k=0}^{n} \left( \begin{array}{c} n \\ k \end{array} \right) p^k (1-p)^{n-k} = 1 \]
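A minimal sketch (my own) computing this PMF and checking that it sums to 1:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))                                 # 1.0 up to rounding
print(max(range(n + 1), key=lambda k: pmf[k]))  # mode is 3, near n*p
```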
Multinomial coefficient
Number of ways to partition n distinct items into groups of sizes \( n_1, n_2, \ldots, n_r \) with \( n_1 + \cdots + n_r = n \):
\[ \frac{n!}{n_1! \, n_2! \cdots n_r!} \]
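A classic instance (my own worked example): the number of distinct anagrams of "MISSISSIPPI", with letter counts M=1, I=4, S=4, P=2, is \( 11! / (1! \, 4! \, 4! \, 2!) \).

```python
from math import factorial

counts = {"M": 1, "I": 4, "S": 4, "P": 2}
n = sum(counts.values())  # 11 letters in total

coeff = factorial(n)
for c in counts.values():
    coeff //= factorial(c)
print(coeff)  # 34650
```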
Chapter 2 - Discrete Random Variables
Random variable examples:
- Bernoulli
- Uniform
- Binomial
- Geometric
Random Variables
Random variables can be understood as a function that maps a random outcome from the sample space to a numerical value.
A random variable can be viewed as a function!!!
Notation: random variable X, numerical value x
Probability Mass Function (PMF)
The PMF is also called the probability distribution: \( p_X(x) = P(X = x) \).
Bernoulli
- X = 1, with probability p,
- X = 0, with probability 1 - p.
E[X] = p;
var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)
Uniform
- each outcome has the same probability
- p_X(x) = 1 / (b-a+1), for integer x from a to b
- E[X] = (a + b) / 2
- var(X) = 1/12 * (b-a) * (b-a+2)
For the continuous case (density f_X(x) = 1/(b-a) on [a, b]):
- E[X] = (a + b) / 2
- var(X) = 1/12 * (b-a)^2
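An exact check of the discrete formulas for a fair die, i.e. a = 1, b = 6 (my own sanity check):

```python
from fractions import Fraction

a, b = 1, 6
values = range(a, b + 1)
p = Fraction(1, b - a + 1)  # uniform PMF

mean = sum(p * x for x in values)
var = sum(p * (x - mean) ** 2 for x in values)
print(mean, Fraction(a + b, 2))                      # 7/2 7/2
print(var, Fraction(1, 12) * (b - a) * (b - a + 2))  # 35/12 35/12
```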
Binomial
n independent tosses of a coin with P(Heads) = p, with X = number of heads observed.
E[X] = E[X1] + E[X2] + ... + E[Xn] = p + p + ... + p = n * p
var(X) = var(X1 + X2 + ... + Xn) = var(X1) + var(X2) + ... + var(Xn) = n * var(X1) = n * p(1-p)
Note: splitting the variance of the sum into a sum of variances requires the tosses to be independent; linearity of expectation needs no such assumption.
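A minimal simulation sketch (my own) checking both moments:

```python
import random

n, p, trials = 20, 0.3, 100_000
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, n * p)           # ~6.0 vs 6.0
print(var, n * p * (1 - p))  # ~4.2 vs 4.2
```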
Geometric
- infinitely many independent tosses of a coin with P(Heads) = p
- X = number of tosses until the first Heads: \( P(X=k) = (1-p)^{k-1} p \). If p = 0.5, the PMF is 1/2, 1/4, 1/8, ...
- models the time we have to wait until something happens (especially in the continuous case)
- Memorylessness: given that X > n, the remaining wait X - n is again geometric with the same p
- E[X] = 1 / p
- var(X) = (1-p)/p^2
For the continuous analogue (the exponential distribution with rate lambda):
- E[X] = 1/lambda
- var(X) = 1/lambda^2
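A simulation sketch (my own) checking the mean and the memorylessness property, using \( P(X > m) = (1-p)^m \):

```python
import random

def geometric(p):
    k = 1                        # number of tosses until the first heads
    while random.random() >= p:
        k += 1
    return k

p, trials = 0.2, 100_000
samples = [geometric(p) for _ in range(trials)]
print(sum(samples) / trials, 1 / p)  # ~5.0 vs 5.0

# memorylessness: P(X > 7 | X > 3) should equal P(X > 4) = (1-p)^4
tail = [x - 3 for x in samples if x > 3]
print(sum(x > 4 for x in tail) / len(tail), (1 - p) ** 4)
```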


Variance
\[ \mathrm{var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2 \]
Useful property: \( \mathrm{var}(aX + b) = a^2 \, \mathrm{var}(X) \).
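A short exact check (my own) that the two forms agree, using the Bernoulli PMF from above:

```python
from fractions import Fraction

p = Fraction(3, 10)
pmf = {0: 1 - p, 1: p}  # Bernoulli(p)

mean = sum(prob * x for x, prob in pmf.items())
var_def = sum(prob * (x - mean) ** 2 for x, prob in pmf.items())
var_alt = sum(prob * x * x for x, prob in pmf.items()) - mean ** 2
print(var_def, var_alt, p * (1 - p))  # all equal 21/100
```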