STAT3612 Data Mining

HKU 2017-18 Semester 2

Course Syllabus: PDF

Instructor: Dr. Aijun Zhang (ajzhang at hku dot hk; RR224)
Tutors:
1. Dr. Simon K.C. Cheung (simonkc at hku dot hk; RR234)
2. Mr. Zebin Yang (u3005497 at connect dot hku dot hk; RR114)
Lecture Hours:
Tuesday 3:30pm–4:20pm (RR101)
Friday 3:30pm–5:20pm (RR101)
Tutorial Hours:
Session 1 (Dr. Simon Cheung): Tuesday 4:30pm–5:20pm (RR101)
Session 2 (Mr. Zebin Yang): Thursday 3:30pm–4:20pm (RR101)
Consultation Hours:
TBD

Moodle@HKU: http://moodle.hku.hk/

RStudio Server: http://stat3612.saas.hku.hk:8787/

Past-year Materials: Spring 2017

[Figure: Data Science Venn Diagram]
Class Schedule (Lecture Notes | Tutorials)
Lecture 1 Jan 19 Introduction to Data Science (HTML | Rmd)
Big Data, Data Science (DS) Jobs, DS Venn Diagram, DS Workflow
RStudio and R Markdown
Tutorial 1 (HTML | Rmd)
Lecture 2 Jan 23-26 Data Exploration (HTML | Rmd)
Data Manipulation, R: dplyr,
EDA, Data Visualization, R: ggplot2, Pipes
Tutorial 2 (HTML | Rmd)
Lecture 3 Jan 30 Machine Learning (HTML | Rmd)
STAT3612 Landscape
Lecture 4 Feb 2 Linear Regression (HTML | Rmd)
LM, LSE, Model Inference
Diagnostics, Variable Selection
Tutorial 3 (HTML | Rmd)
Lecture 5 Feb 6-9 Basis Expansion (HTML | Rmd)
Nonparametric Regression, Feature Representation,
Piecewise Linear Fitting, Regression Splines, GAM
Tutorial 4 (HTML | Rmd)
Lecture 6 Feb 13-23 Regularization (HTML | Rmd)
Smoothing Splines, Ridge Regression,
Lasso, Elastic Net, Sparse Modeling
Tutorial 5 (HTML | Rmd)
Lecture 7 March 2-13 Classification (HTML | Rmd)
Logistic/Softmax regression
LDA/QDA, kNN, Confusion matrix, ROC curve
Tutorial 6 (HTML | Rmd)
Assignment 1 (Due: March 15)
Dataset: BitCoin.csv
Lecture 8 March 20-23 Tree-based Methods (HTML | Rmd)
Classification and Regression Trees
Bagging, Random Forest, GBM
Tutorial 7 (HTML | Rmd)

Test 1 on April 3

Lecture 9 March 27 Support Vector Machines (HTML | Rmd)
Separating Hyperplane, Maximal Margin, Kernel Trick
LIBSVM, Hyperparameter optimization
Tutorial 8 (HTML | Rmd)
Assignment 2 (Due: April 11)

Group Project Announcement
x_train.csv | y_train.csv
x_test.csv | Leaderboard

Lecture 10 April 6-13 Neural Networks (HTML | Rmd)
MLP, NNet, Backpropagation algorithm
Introduction to Deep Learning
Tutorial 9 (HTML | Rmd)
Tutorial 10 (HTML | Rmd)
Lecture 11 April 17-20 Unsupervised Learning (HTML | Rmd)
Clustering, K-means, PCA, SVD
Matrix factorization, Sparse coding
Tutorial 11 (HTML | Rmd)
Lecture 12 April 24 MNIST Case Study (HTML | Rmd)
Large-scale logistic modeling, ANN, CNN
Tutorial 12 (HTML | Rmd)
Assignment 3 (Due: May 6)

Test 2 on April 27

May 4: Project Presentation, 2pm–5pm; written report due May 6
Report Template (PDF | Rmd)