STAT3612 Data Mining

HKU 2016-17 Semester 2

Instructor: Dr. Aijun Zhang (ajzhang at hku dot hk; RR224)
Tutor: Mr. Jia You (Jason) (u3005315 at hku dot hk; RR116)
Lecture Hours:
Monday 12:30pm — 2:20pm (RR101)
Thursday 12:30pm — 1:20pm (RR101)
Tutorial Hours:
Tuesday 4:30pm — 5:20pm (RR101)
Consultation Hours:
Dr. Zhang: Mon. 3:30pm — 5:30pm; Thur. 2:30pm — 5:30pm
Jason You: Fri. 1:30pm — 4:30pm (RR103 for make-up tutorials)

Course Syllabus: PDF

Moodle@HKU: http://moodle.hku.hk/

RStudio Server: http://stat3612.saas.hku.hk:8787/

CLASS SCHEDULE:
Schedule Lectures Tutorials Other Materials
Lecture 1 Jan 16 Introduction to Data Science (PDF) Readings:
1. NYT 2009 Article (Link)
2. HBR 2012 Article (Link)
3. McKinsey 2011 Report (PDF)
Drew Conway’s Venn Diagram
Lecture 2 Jan 19,23 A Walk through Data Mining (HTML | Rmd) Tutorial 1: PDF | Rmd Supplementary Python Codes
(Under Development)
Lecture 3 Jan 26 Data Exploration (HTML | Rmd)
EDA, R::ggplot2, dplyr, Pipes
More materials at
Stat3622 Data Visualization
Jan 28 – Feb 3 Class Suspension Period for Lunar New Year
Lecture 4 Feb 6,9 Regression (HTML | Rmd)
LM, LSE, Model Inference
Diagnostics, Variable Selection
Tutorial 2: HTML | Rmd
Lecture 5 Feb 13-20 Basis Expansion (HTML | Rmd)
Nonparametric regression, Basis Approach
Feature representation, Splines, GAM
Tutorial 3: PDF | Rmd Assignment 1 (Due: Feb 28)
Lecture 6 Feb 20-27 Regularization (HTML | Rmd)
Smoothing spline, Ridge Regression,
Lasso, Elastic Net, Sparse Modeling
Tutorial 4: PDF | Rmd
Tutorial 5: HTML | Rmd
Lecture 7 Mar 2 Model Assessment (HTML | Rmd)
Error Analysis, Bias/Variance,
Cross-Validation, etc.
Assignment 2 (Due: Mar 15)
Mar 6 – 11 Reading/Field Trip Week
Lecture 8 Mar 13-27 Classification (HTML | Rmd)
Logistic/Softmax regression
LDA/QDA, kNN …
Confusion matrix, ROC curve
Tutorial 6: PDF | Rmd
Tutorial 7: HTML | Rmd
Tutorial 8: HTML | Rmd
March 16: No Class

March 20: Test 1

Project Announcement
DataTrain | DataTest
Leader Board

Lecture 9 Mar 30 & Apr 3 Tree-based Methods (HTML | Rmd)
Tree for Regression and Classification
Bagging, Random Forest, GBM
Lecture 10 Apr 6,10 Support Vector Machines (HTML | Rmd)
Separating Hyperplane, Maximal Margin
Kernel trick, LIBSVM
Assignment 3 (Due: Apr 24)
Lecture 11 Apr 13, 20 Neural Networks (HTML | Rmd)
MLP, NNet, Backpropagation algorithm
Deep Learning
April 17: No Class
Lecture 12 Apr 24 Unsupervised Learning (HTML | Rmd)
PCA, K-means Clustering, etc.
April 27: Test 2
May 4 Project Presentation 2pm–5pm Written report due May 7
Template (PDF | Rmd)