STAT3612 Data Mining
HKU 2016-17 Semester 2
Instructor: Dr. Aijun Zhang (ajzhang at hku dot hk; RR224)
Tutor: Mr. Jia You (Jason) (u3005315 at hku dot hk; RR116)
Lecture Hours: Monday 12:30pm – 2:20pm (RR101); Thursday 12:30pm – 1:20pm (RR101)
Tutorial Hours: Tuesday 4:30pm – 5:20pm (RR101)
Consultation Hours: Dr. Zhang: Mon. 3:30pm – 5:30pm; Thur. 2:30pm – 5:30pm. Jason You: Fri. 1:30pm – 4:30pm (RR103 for make-up tutorials)
Course Syllabus: PDF
Moodle@HKU: http://moodle.hku.hk/
RStudio Server: http://stat3612.saas.hku.hk:8787/
CLASS SCHEDULE:
Schedule  Lectures  Tutorials  Other Materials  
Lecture 1  Jan 16  Introduction to Data Science (PDF) 
Readings: 1. NYT 2009 Article (Link) 2. HBR 2012 Article (Link) 3. McKinsey 2011 Report (PDF) Drew Conway’s Venn Diagram 

Lecture 2  Jan 19,23  A Walk through Data Mining (HTML Rmd)  Tutorial 1: PDF  Rmd 
Supplementary Python Codes (Under Development) 
Lecture 3  Jan 26 
Data Exploration (HTML  Rmd) EDA, R::ggplot2, dplyr, Pipes 
More materials at Stat3622 Data Visualization 

Jan 28 – Feb 3  Class Suspension Period for Lunar New Year  
Lecture 4  Feb 6,9 
Regression (HTML  Rmd) LM, LSE, Model Inference Diagnostics, Variable Selection 
Tutorial 2: HTML  Rmd  
Lecture 5  Feb 13-20  Basis Expansion (HTML  Rmd) Nonparametric regression, Basis Approach Feature representation, Splines, GAM 
Tutorial 3: PDF  Rmd  Assignment 1 (Due: Feb 28) 
Lecture 6  Feb 20-27 
Regularization (HTML  Rmd) Smoothing spline, Ridge Regression, Lasso, Elastic Net, Sparse Modeling 
Tutorial 4: PDF  Rmd Tutorial 5: HTML  Rmd 

Lecture 7  Mar 2 
Model Assessment (HTML  Rmd) Error Analysis, Bias/Variance, Cross Validation, etc 
Assignment 2 (Due: Mar 15)  
Mar 6 – 11  Reading/Field Trip Week  
Lecture 8  Mar 13-27  Classification (HTML  Rmd) Logistic/Softmax regression LDA/QDA, kNN … Confusion matrix, ROC curve 
Tutorial 6: PDF  Rmd Tutorial 7: HTML  Rmd Tutorial 8: HTML  Rmd 
March 16: No Class
March 20: Test 1 
Lecture 9  Mar 30 & Apr 3  Tree-based Methods (HTML  Rmd) Tree for Regression and Classification Bagging, Random Forest, GBM 

Lecture 10  Apr 6,10 
Support Vector Machines (HTML  Rmd) Separating Hyperplane, Maximal Margin Kernel trick, LIBSVM 
Assignment 3 (Due: Apr 24)  
Lecture 11  Apr 13, 20 
Neural Networks (HTML  Rmd) MLP, NNet, Backpropagation algorithm Deep Learning 
April 17: No Class  
Lecture 12  Apr 24  Unsupervised Learning (HTML  Rmd) PCA, K-means Clustering, etc 
April 27: Test 2  
May 4  Project Presentation 2pm–5pm  Written report due May 7 Template (PDF  Rmd) 