Statistics

These notes are a work in progress

Completion progress:

40%

Things I want to add to this page (each adds 5% to the progress bar):

  • All my notes from CS 348 - Principles of Data Science
  • Probability Definitions & Conditional Probability
  • Random Variables
  • Markov and Chebyshev Bounds
  • Markov Chains
  • Bayesian Networks and d-separation
  • Markov Decision Processes
  • Hidden Markov Models
  • Causal Inference
  • Kalman filter
  • Probability Rules
  • Statistical tests

Definitions

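  • Central limit theorem - For independent, identically distributed samples with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size tends to infinity, even when the underlying distribution is not normal
  • Law of Large Numbers - As the sample size tends to infinity, the sample mean converges to the population mean
  • Permutations - Different ordered sequences (ex. the digits 0, 1, 2 give 6 permutations: 012, 021, 102, 120, 201, 210)
  • Combinations - Different unordered sets (ex. all 6 of those permutations are the same single combination, {0, 1, 2}, because order does not matter)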
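
A short Python sketch of these definitions, using only the standard library. It lists the permutations and combinations of the digits 0, 1, 2 with `itertools`, then simulates fair-die rolls to illustrate the law of large numbers and the central limit theorem. The die, the sample sizes, and the seed are arbitrary choices for illustration, so the simulated values are approximate.

```python
import itertools
import math
import random
import statistics

# Permutations vs. combinations of the digits 0, 1, 2.
perms = ["".join(p) for p in itertools.permutations("012")]
combs = ["".join(c) for c in itertools.combinations("012", 3)]
print(perms)  # ['012', '021', '102', '120', '201', '210']
print(combs)  # ['012']
print(math.perm(3, 3), math.comb(3, 3))  # 6 1

rng = random.Random(0)  # fixed seed so the run is reproducible

# Law of large numbers: the mean of many fair-die rolls settles near 3.5.
rolls = [rng.randint(1, 6) for _ in range(100_000)]
print(statistics.fmean(rolls))

# Central limit theorem: means of samples of size n are approximately normal
# with standard deviation sigma / sqrt(n), where sigma^2 = 35/12 for a fair die.
n = 50
means = [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(5_000)]
se = math.sqrt(35 / 12 / n)
within_one_se = sum(abs(m - 3.5) <= se for m in means) / len(means)
print(statistics.stdev(means), se)  # the two values should be close
# Fraction of means within one standard error: near the normal value of about 0.68
# (sums of dice are discrete, so the agreement is only rough).
print(within_one_se)
```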

Bayes' Rule
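
Bayes' rule relates the two conditional probabilities: `P(A|B) = (P(B|A) P(A))/(P(B))`, where the denominator can be expanded with the law of total probability as `P(B) = P(B|A)P(A) + P(B|not A)P(not A)`. Below is a minimal Python sketch for a binary hypothesis; the function name `bayes_posterior` and the test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical medical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(posterior, 4))  # 0.1667
```

Even with an accurate test, a positive result here only raises the probability to about 1 in 6, because the condition is rare to begin with.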

Distributions

Discrete

  • Uniform - Every outcome is equally likely (ex. rolling a fair die)
  • Bernoulli - A single trial with two outcomes: success with probability p, failure with probability 1-p (ex. one toss of a possibly unfair coin)
  • Binomial - Number of successes in n independent Bernoulli trials (sampling with replacement) (ex. flip a coin 20 times; how many times does it come up heads?)
  • Hypergeometric - Binomial but sampling without replacement (ex. drawing black or red balls from a jar without replacement)
  • Poisson - Number of events in a fixed interval when events occur independently at a constant average rate (ex. number of calls received by a call center in an hour)
  • Geometric - If the binomial distribution is "How many successes?" then the geometric distribution is "How many failures until a success?"
  • Negative Binomial - Number of failures until r successes have occurred
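
The probability mass functions for several of these distributions, as a hedged Python sketch using only `math.comb`, `math.exp`, and `math.factorial`. The geometric and negative binomial functions count failures before a success, matching the definitions above (some texts count trials instead, which shifts k by one or by r). Bernoulli is the binomial with n = 1. The function names and example numbers are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent Bernoulli(p) trials); n = 1 is Bernoulli."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def poisson_pmf(k: int, lam: float) -> float:
    """P(k events in one interval when the average rate is lam events per interval)."""
    return lam**k * exp(-lam) / factorial(k)


def geometric_pmf(k: int, p: float) -> float:
    """P(k failures before the first success)."""
    return (1 - p) ** k * p


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(k failures before the r-th success); r = 1 gives the geometric pmf."""
    return comb(k + r - 1, k) * (1 - p) ** k * p**r


# 20 fair coin flips: probability of exactly 10 heads.
print(round(binomial_pmf(10, 20, 0.5), 4))  # 0.1762
# Jar with 10 balls, 4 of them black; draw 3 without replacement: P(exactly 1 black).
print(round(hypergeometric_pmf(1, 10, 4, 3), 4))  # 0.5
# Call center averaging 3 calls per hour: P(no calls in an hour).
print(round(poisson_pmf(0, 3.0), 4))  # 0.0498
```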

Continuous

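  • Normal (Gaussian) - Approximately the distribution of a sum (or mean) of many independent trials from almost any distribution with finite variance, by the central limit theorem
  • Exponential - How long until the next event, when events occur at a constant average rate? (ex. How long until the next customer calls?)
  • Weibull - Generalization of the exponential. Its shape parameter models increasing or decreasing rates of failure over time (shape 1 gives the exponential).
  • Log-normal - A value whose logarithm is normally distributed; approximately the distribution of a product of many independent positive trials (the log of a product is a sum of logs)
  • Student's t - As its degrees-of-freedom parameter `\nu` increases, it approaches the standard normal distribution
  • Chi-squared (`\chi^2`) - Distribution of the sum of squares of k independent standard normal values (k is the degrees of freedom)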
  • Gamma - TODO
  • Beta - TODO
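
A Python simulation sketch that checks a few of these relationships with the standard `random` module: chi-squared as a sum of squared standard normals, the exponential's mean of 1/rate, the Weibull with shape 1 reducing to an exponential, and the log of a log-normal draw being normal. Sample sizes, parameters, and the seed are arbitrary, so the printed values are approximate.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
N = 100_000  # draws per distribution

# Chi-squared with k degrees of freedom: sum of squares of k standard normals.
k = 3
chi2 = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(N)]
print(statistics.fmean(chi2), statistics.variance(chi2))  # close to k = 3 and 2k = 6

# Exponential waiting times; expovariate takes the rate, and the mean is 1/rate.
rate = 2.0
waits = [rng.expovariate(rate) for _ in range(N)]
print(statistics.fmean(waits))  # close to 0.5

# Weibull: weibullvariate(scale, shape). Shape 1 is an exponential with mean = scale.
weib = [rng.weibullvariate(0.5, 1.0) for _ in range(N)]
print(statistics.fmean(weib))  # also close to 0.5

# Log-normal: the log of each draw is normal with mean mu and standard deviation sigma.
mu, sigma = 1.0, 0.5
logs = [math.log(rng.lognormvariate(mu, sigma)) for _ in range(N)]
print(statistics.fmean(logs), statistics.stdev(logs))  # close to 1.0 and 0.5
```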

Linear regression

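  • Residual sum of squares (RSS) - `sum_(i=1)^N (y_i-\hat{y}_i)^2`
  • Ordinary least squares regression (OLS) - `argmin_w sum_(i=1)^N (y_i-wX_i)^2`
  • Ridge regression (OLS with L2 regularization) - `argmin_w sum_(i=1)^N (y_i-wX_i)^2 + \lambda sum_(j=1)^M w_j^2`
  • LASSO regression (OLS with L1 regularization) - `argmin_w sum_(i=1)^N (y_i-wX_i)^2 + \lambda sum_(j=1)^M |w_j|`

Here N is the number of samples, M is the number of features, `wX_i` is the prediction for sample i (the dot product of the weights with that sample's features), and `\lambda >= 0` sets the regularization strength.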
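
A pure-Python sketch of the objectives above, written without NumPy so it is self-contained. OLS and ridge are solved through the normal equations `(X^T X + \lambda I) w = X^T y` (with `\lambda = 0` for OLS). LASSO has no closed form in general, so here it is solved by cyclic coordinate descent with soft-thresholding, one common choice among several. Like the formulas, the sketch penalizes every weight, including the intercept column; libraries usually leave the intercept unpenalized and may scale the loss differently, so their `\lambda` values will not match these. All function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def predict(X: Matrix, w: Vector) -> Vector:
    """y_hat_i = X_i . w (dot product of row i with the weights)."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def rss(X: Matrix, y: Vector, w: Vector) -> float:
    """Residual sum of squares: sum_i (y_i - y_hat_i)^2."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, predict(X, w)))


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(X: Matrix, y: Vector, lam: float) -> Vector:
    """Minimise RSS + lam * sum_j w_j^2 by solving (X^T X + lam I) w = X^T y."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(XtX, Xty)


def ols(X: Matrix, y: Vector) -> Vector:
    """OLS is ridge with lam = 0 (assumes X^T X is invertible)."""
    return ridge(X, y, 0.0)


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    return z - t if z > t else z + t if z < -t else 0.0


def lasso(X: Matrix, y: Vector, lam: float, sweeps: int = 200) -> Vector:
    """Minimise RSS + lam * sum_j |w_j| by cyclic coordinate descent."""
    m = len(X[0])
    w = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            # Partial residual: prediction error with feature j left out.
            r = [yi - sum(row[k] * w[k] for k in range(m) if k != j)
                 for row, yi in zip(X, y)]
            z = sum(row[j] * ri for row, ri in zip(X, r))
            norm = sum(row[j] ** 2 for row in X)
            # Exact 1-D minimiser of norm*w_j^2 - 2*z*w_j + lam*|w_j|.
            w[j] = soft_threshold(z, lam / 2) / norm
    return w


# Toy data lying exactly on y = 1 + 2x; the first column of X is a constant 1 (intercept).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w_ols = ols(X, y)
w_ridge = ridge(X, y, lam=1.0)
w_lasso = lasso(X, y, lam=100.0)
print(w_ols, rss(X, y, w_ols))  # approximately [1.0, 2.0] and 0.0
print(w_ridge)                  # shrunk toward zero
print(w_lasso)                  # [0.0, 0.0]: a large lam zeroes every weight
```

With `lam=100` the L1 penalty drives both weights exactly to zero, whereas ridge only shrinks them; this ability to zero out weights is why LASSO is often used for feature selection.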

Biases and Miscellaneous Info