STAT3612 Data Mining
HKU 2017-18 Semester 2
Course Syllabus: PDF
Instructor: Dr. Aijun Zhang (ajzhang at hku dot hk; RR224)
Moodle@HKU: http://moodle.hku.hk/
RStudio Server: http://stat3612.saas.hku.hk:8787/
Past-year Materials: Spring 2017
Class Schedule  Lecture Notes  Tutorials  
Lecture 1  Jan 19  Introduction to Data Science (HTML  Rmd) Big Data, DS job, DS Venn Diagram, DS Workflow, RStudio and R Markdown
Tutorial 1 (HTML  Rmd) 
Lecture 2  Jan 23-26  Data Exploration (HTML  Rmd) Data Manipulation, R:dplyr, EDA, Data Visualization, R:ggplot2, Pipes
Tutorial 2 (HTML  Rmd) 
Lecture 3  Jan 30  Machine Learning (HTML  Rmd) STAT3612 Landscape

Lecture 4  Feb 2  Linear Regression (HTML  Rmd) LM, LSE, Model Inference, Diagnostics, Variable Selection
Tutorial 3 (HTML  Rmd) 
Lecture 5  Feb 6-9  Basis Expansion (HTML  Rmd) Nonparametric regression, Feature representation, Piecewise linear fitting, Regression Splines, GAM
Tutorial 4 (HTML  Rmd) 
Lecture 6  Feb 13-23  Regularization (HTML  Rmd) Smoothing spline, Ridge Regression, Lasso, Elastic Net, Sparse Modeling
Tutorial 5 (HTML  Rmd) 
Lecture 7  March 2-13  Classification (HTML  Rmd) Logistic/Softmax regression, LDA/QDA, kNN, Confusion matrix, ROC curve
Tutorial 6 (HTML  Rmd) Assignment 1 (Due: March 15) Dataset: BitCoin.csv 
Lecture 8  March 20-23  Tree-based Methods (HTML  Rmd) Classification and Regression Trees, Bagging, Random Forest, GBM
Tutorial 7 (HTML  Rmd)
Test 1 on April 3 
Lecture 9  March 27  Support Vector Machines (HTML  Rmd) Separating Hyperplane, Maximal Margin, Kernel trick, LIBSVM, Hyperparameter optimization
Tutorial 8 (HTML  Rmd) Assignment 2 (Due: April 11) Group Project Announcement 
Lecture 10  April 6-13  Neural Networks (HTML  Rmd) MLP, NNet, Backpropagation algorithm, Introduction to Deep Learning
Tutorial 9 (HTML  Rmd) Tutorial 10 (HTML  Rmd) 
Lecture 11  April 17-20  Unsupervised Learning (HTML  Rmd) Clustering, K-means, PCA, SVD, Matrix factorization, Sparse coding
Tutorial 11 (HTML  Rmd) 
Lecture 12  April 24  MNIST Case Study (HTML  Rmd) Large-scale logistic modeling, ANN, CNN
Tutorial 12 (HTML  Rmd) Assignment 3 (Due: May 6) Test 2 on April 27 
May 4  Project Presentation 2pm–5pm  Written report due May 6 Template (PDF  Rmd) 