| Dates and Titles |
Topics |
Lecture Slides |
Suggested Further Readings |
Lecture 1
Introduction to machine learning |
- Supervised learning
- Unsupervised learning
- Probability primer
|
|
|
Lecture 2
Density estimation |
- Maximum likelihood estimation
- MAP estimation
- Bayesian estimation
|
|
|
Lecture 3
Clustering I |
- k-means clustering
- Mixture of Gaussians (MoG)
|
|
- Chapter 9.1 and 9.2 in Bishop's PRML.
|
Lecture 4
Expectation Maximization |
- Jensen's inequality
- Information theory preliminaries
- EM optimization
- Generalized EM
- Incremental EM
- EM for exponential families
|
|
- Chapter 9.3 and 9.4 in Bishop's PRML.
- A. P. Dempster, N. M. Laird, and D. B. Rubin (1977),
"Maximum likelihood from incomplete data via the EM algorithm
(with discussion),"
Journal of the Royal Statistical Society B,
vol. 39, pp. 1-38, 1977.
- R. M. Neal and G. E. Hinton (1999),
"A view of the EM algorithm that
justifies incremental, sparse, and other variants,"
Learning in Graphical Models (edited by M. Jordan),
pp. 355-368, 1999.
|
Lecture 5
Latent variable models |
- Maximum likelihood factor analysis
- Probabilistic PCA
- Mixture of factor analyzers
- Mixture of probabilistic principal component analyzers
- SVD
|
|
- Chapter 12.1 and 12.2 in Bishop's PRML.
- C. M. Bishop (1999),
"Latent variable models,"
In M. I. Jordan (Ed.), Learning in Graphical Models, 1999.
- M. E. Tipping and C. M. Bishop (1999),
"Probabilistic principal component analysis,"
Journal of the Royal Statistical Society, Series B,
vol. 61, pp. 611-622, 1999.
- M. E. Tipping and C. M. Bishop (1999),
"Mixtures of probabilistic principal component analyzers,"
Neural Computation,
vol. 11, pp. 443-482, 1999.
- Z. Ghahramani and G. E. Hinton (1996),
"The EM algorithm for mixtures of factor analyzers,"
University of Toronto Technical Report CRG-TR-96-1.
- S. Roweis (1997),
"EM algorithms for PCA and SPCA,"
NIPS-1997.
|
Lecture 6
Clustering II |
- Spectral clustering
- Nonnegative matrix factorization
|
|
- J. Shi and J. Malik (2000),
"Normalized Cuts and Image Segmentation,"
IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 22, no. 8, pp. 888-905, 2000.
- U. von Luxburg (2007),
"A tutorial on spectral clustering,"
Statistics and Computing,
vol. 17, no. 4, pp. 395-416, 2007.
- See my note on extremal properties of eigenvalues.
- D. D. Lee and H. S. Seung (1999),
"Learning the Parts of Objects by Non-negative Matrix Factorization,"
Nature,
vol. 401, pp. 788-791, 1999.
- C. Ding, T. Li, W. Peng, and H. Park (2006),
"Orthogonal nonnegative matrix tri-factorizations for clustering,"
KDD-2006.
- A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi (2008),
"Nonnegative matrix factorization with alpha-divergence,"
Pattern Recognition Letters,
vol. 29, no. 9, pp. 1433-1440, July 2008.
|
Lecture 7
Regression |
- Regression
- Linear models for regression
- Least squares and RLS
- Bias-variance dilemma
- Bayesian linear regression
|
|
- Chapter 3 in Bishop's PRML.
|
Lecture 8
Linear models for classification |
- Bayes decision theory
- Fisher's linear discriminant analysis
- Logistic regression
- Perceptron
- Support vector machine
|
|
|
Lecture 9
Neural networks |
- Perceptron
- Multilayer perceptron (MLP)
- Radial basis function (RBF) network
|
|
- Chapter 5 in Bishop's PRML.
|
Lecture 10
Mixture of experts |
|
|
- Chapter 14.5 in Bishop's PRML.
|
Lecture 11
Kernel methods |
|
|
- Chapter 12.3 in Bishop's PRML.
|
Lecture 12
Hidden Markov models |
- Hidden Markov models (HMMs)
|
|
- Chapter 13.2 in Bishop's PRML.
|