What’s covered in this lecture?


1 Nonparametric Regression

Basis expansion is a simple extension of linear models to model nonlinearity. It is to transform the predictors (x-variables) with nonlinear representations, in another word, replace \(x\) in the model with \(\phi_1(x), \phi_2(x), \ldots\) for some chosen basis functions.

This is a common approach to nonparametric regression:

\[ f(x)\approx \sum_{j=1}^p \beta_j \phi_j(x) = \boldsymbol\phi(x)^T\boldsymbol{\beta} \] Then, estimate the parameter \(\boldsymbol{\beta}\) by \[ \min_{\boldsymbol\beta}\sum_{i=1}^n\big[y_i -\boldsymbol\phi(x)^T\boldsymbol\beta\big]^2 = (\mathbf{y}-\boldsymbol\Phi\boldsymbol\beta)^T(\mathbf{y}-\boldsymbol\Phi\boldsymbol\beta) \] and result in the least squares estimate \(\boldsymbol{\hat\beta} = (\boldsymbol\Phi^T\boldsymbol\Phi)^{-1}\boldsymbol\Phi^T\mathbf{y}\). This becomes the linear regression setting.

1.1 Polynomial Basis

Consider the simluated example of curve fitting (as in Lecture 2) with polynomial basis functions:

ffun = function(x) exp(-(x-3)^2)
x = seq(0.1,4,by=0.1)
set.seed(2017)
y = ffun(x) + 0.1*rnorm(length(x))

par(mfrow=c(2,2))
matplot(x, cbind(y,ffun(x)), ylab = "y",
        type="pl", pch=19, lty=1, lwd=2)

fit1 = lm(y ~ poly(x,3))   
fit2 = lm(y ~ poly(x,10))   
matplot(x, cbind(y, fit1$fit, fit2$fit), ylab = "y",
        type="pll", pch=19, lty=1, col=c(1,3,4), lwd=2)
legend("topleft", c("poly(x,3)", "poly(x,10)"), lty=1, col=c(3,4), lwd=2)

matplot(x, poly(x, 3), xlab="x",type="l", lty=1, lwd=2, 
        main="Polynomial Basis (order 3)")
matplot(x, poly(x, 10), xlab="x",type="l", lty=1, lwd=2, 
        main="Polynomial Basis (order 10)")

  • Use of Orthogonal Polynomials, rather than using the monomial basis \(1, x, x^2, \cdots\).
  • Drawback: usually fit poorly near the endpoints.

1.2 Fourier Basis

Suitable to fit periodic waves of the form \(f(x) = c_0 + \sum_{k} A_k \sin(\omega_kx+\phi_k)\) with amplitudes \(A_k\), frequencies \(\omega_k\) and phase \(\phi_k\). Note that the period \(T=2\pi/\omega\).

ffun2 = function(x) 0.5*sin(3*x) + 0.25*sin(10*x)
x = seq(0,4,by=0.02)
set.seed(2017)
y = ffun2(x) + 0.1*rnorm(length(x))
par(mfrow=c(1,2))
matplot(x, cbind(ffun2(x), 0.5*sin(3*x), 0.25*sin(10*x)), ylab="y",
        type="l", lty=1, lwd=c(2,1,1));  
abline(h=0,lty=3)
plot(x, y, type="l");  abline(h=0,lty=3)

Fourier basis for periodic functions on \(x\in T\) is defined as \[ \phi_0(x) = \frac{1}{\sqrt{|T|}}, \ \phi_{2k-1}(x) = \frac{\sin(k\omega x)}{\sqrt{|T|/2}}, \ \phi_{2k}(x) = \frac{\cos(k\omega x)}{\sqrt{|T|/2}}, k=1,\ldots,K \] which can be easily constructed. You may also use R::fda package with create.fourier.basis and eval.basis functions.

BasisFourier = function(x, nfreq=2) {
  T = diff(range(x))
  omega = 2*pi/T
  xphi = matrix(NA, length(x), 2*nfreq + 1)
  xphi[,1] = 1/sqrt(T)
  xname  = "cons"
  for (j in 1:nfreq) {
    xphi[,2*j] = sin(j*omega*x)/sqrt(T/2)
    xphi[,2*j+1] = cos(j*omega*x)/sqrt(T/2)
    xname=c(xname, paste("sin", j, sep=""), paste("cos", j, sep=""))
  }
  colnames(xphi) = xname
  return(xphi)
} 

par(mfrow=c(2,2))
nfreq = 2
xphi = BasisFourier(x,nfreq)
matplot(x, xphi, type="l", lty=1,
        main="Fourier basis (nfreq = 2)", ylab="")
legend("bottomleft", colnames(xphi), lty=1, col=1:dim(xphi)[2])
fit = lm(y ~ xphi-1)
matplot(x, cbind(y,fit$fit), ylab = "y",
        type="l",  lty=1, col=c(1,4), lwd=c(1,2))
nfreq=7
xphi = BasisFourier(x,nfreq)
matplot(x, xphi[,2*(1:nfreq)], xlab="x", 
        type="l", lty=1, lwd=1,
        main="Fourier basis (nfreq = 7)", ylab="")
legend("bottomleft", colnames(xphi)[2*(1:nfreq)], lty=1, col=1:dim(xphi)[2])
fit = lm(y ~ xphi-1)
matplot(x, cbind(y,fit$fit), ylab = "y",
        type="l",  lty=1, col=c(1,4), lwd=c(1,2))