2005-03-30
Supplemental notes,
BIOINF 2054/BIOSTAT 2018
Statistical Foundations for Bioinformatics Data Mining
Target readings:
Hastie, Tibshirani, Friedman
Chapter 3:
Linear regression, principal components, ridge regression, partial least squares
No classes on Jan 25 or 27.
Gauss-Markov Theorem:
The estimator that minimizes RSS (least squares) is the "best linear unbiased estimator":
to estimate a linear combination \theta = a^T \beta with an unbiased linear combination c^T Y,
the minimum-variance choice is \hat{\theta} = a^T \hat{\beta}^{ls}.
But, by accepting some bias, you can do much better in mean squared error (see the simulation sketch below).
Best subset selection: Note that selecting the "best subset" is NOT a linear estimator. Why not?
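The bias-variance point can be made concrete with a small simulation. This is an illustrative sketch, not part of the original notes: with two nearly collinear predictors, simply shrinking the OLS coefficients toward zero (a deliberately biased estimator) gives a much smaller mean squared error. The sample size, correlation, and shrinkage factor are arbitrary choices.

## Sketch: a biased (shrunken) estimator can beat unbiased OLS in mean squared error.
set.seed(1)
n <- 50; beta <- c(1, 1); shrink <- 0.7      # shrink < 1 introduces bias
mse_ols <- mse_shrunk <- 0
for (rep in 1:2000) {
  x1 <- rnorm(n)
  x2 <- x1 + 0.1 * rnorm(n)                  # nearly collinear with x1
  y  <- beta[1] * x1 + beta[2] * x2 + rnorm(n)
  bhat <- coef(lm(y ~ x1 + x2 - 1))          # unbiased OLS estimate
  mse_ols    <- mse_ols    + sum((bhat          - beta)^2)
  mse_shrunk <- mse_shrunk + sum((shrink * bhat - beta)^2)
}
c(ols = mse_ols, shrunken = mse_shrunk) / 2000   # shrunken is typically much smaller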
Ridge Regression (see Fig 3.7)
The Principle: Add a "ridge" of size \lambda to the diagonal of X^T X, to stabilize the matrix inverse:
\hat{\beta}^{ridge} = (X^T X + \lambda I)^{-1} X^T Y
Another view: penalized likelihood
\hat{\beta}^{ridge} := \arg\min_{\beta} \{ (Y - X\beta)^T (Y - X\beta) + \lambda ||\beta||^2 \}
This can also be thought of as maximizing a Bayesian posterior,
where the prior is [\beta] \sim N(0, (2\lambda)^{-1} I_p).
This is also an example of data augmentation:
Let X_{aug} = \begin{pmatrix} X \\ \sqrt{\lambda}\, I_p \end{pmatrix}, \quad Y_{aug} = \begin{pmatrix} Y \\ 0 \end{pmatrix}.
Then OLS on (X_{aug}, Y_{aug}) will yield \hat{\beta}^{ridge}.
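As a sanity check of the two views above, here is a minimal R sketch on simulated data (the dimensions and lambda are arbitrary): the closed-form ridge coefficients and OLS on the augmented data agree.

## Ridge coefficients two ways: closed form vs. OLS on augmented data.
set.seed(2)
n <- 40; p <- 5; lambda <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- drop(X %*% rnorm(p)) + rnorm(n)

## 1) Closed form: (X'X + lambda I)^{-1} X'Y
b_ridge <- solve(t(X) %*% X + lambda * diag(p), t(X) %*% Y)

## 2) Data augmentation: append sqrt(lambda) * I_p to X and p zeros to Y, then run OLS
X_aug <- rbind(X, sqrt(lambda) * diag(p))
Y_aug <- c(Y, rep(0, p))
b_aug <- coef(lm(Y_aug ~ X_aug - 1))

cbind(b_ridge, b_aug)   # the two columns are identical up to rounding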
Another view:
Solve a constrained optimization problem (restricting the model space):
\hat{\beta} := \arg\min_{\beta} (Y - X\beta)^T (Y - X\beta),
restricted to the set \{\beta : ||\beta||^2 \le K\}.

Note error in Fig 3.12:
if X^T X = I_p, then as in Table 3.4, \hat{\beta}^{ridge} = \frac{1}{1+\lambda} \hat{\beta}^{ls},
so the ellipse's major or minor axis must go through the origin.
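A quick numeric check of this relation, using a randomly generated orthonormal design (illustrative only):

## When X'X = I_p, ridge just scales the OLS coefficients by 1/(1 + lambda).
set.seed(3)
n <- 30; p <- 4; lambda <- 2
X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))   # orthonormal columns, so X'X = I_p
Y <- rnorm(n)
b_ls    <- solve(t(X) %*% X, t(X) %*% Y)
b_ridge <- solve(t(X) %*% X + lambda * diag(p), t(X) %*% Y)
cbind(b_ridge, b_ls / (1 + lambda))          # the two columns agree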
The singular value decomposition of X, svd(X), is
X = U D V^T,
where
U is N by p, with U^T U = I_p and U U^T = X (X^T X)^{-1} X^T = H (the "hat" matrix).
U transforms data points in "scatterplot space" (rows of X in R^p), creating a new dataset U^T X = D V^T.
V is p by p, with V^T V = V V^T = I_p.
V rotates data points in "variable space" (columns of X in R^N), defining new variables XV = UD.
D is diagonal, with d_1 \ge ... \ge d_p \ge 0 the singular values.
[What are the eigenvalues of X^T X?]
Then
X \hat{\beta}^{ls} = \hat{Y} = HY = U U^T Y.
Ridge Regression: degrees of freedom (equation 3.47)
First, note that
(U U^T)_{i_1 i_2} = \sum_{j=1}^{p} U_{i_1 j} (U^T)_{j i_2}   (definition of matrix multiplication)
                  = \sum_{j=1}^{p} U_{i_1 j} U_{i_2 j}
                  = \sum_{j=1}^{p} (u_j u_j^T)_{i_1 i_2}   (outer product of column j with itself)
Therefore U U^T = \sum_{j=1}^{p} u_j u_j^T.
Similarly,
(U diag(a) U^T)_{i_1 i_2} = \sum_{j=1}^{p} U_{i_1 j} a_j (U^T)_{j i_2}
                          = \sum_{j=1}^{p} a_j U_{i_1 j} U_{i_2 j}
                          = \sum_{j=1}^{p} a_j (u_j u_j^T)_{i_1 i_2}
Therefore U diag(a) U^T = \sum_{j=1}^{p} a_j u_j u_j^T   (regarding a_j as a scalar multiplier)
                        = \sum_{j=1}^{p} u_j a_j u_j^T   (regarding a_j as a 1 x 1 matrix).
In 3.47, diag(a) = D (D^2 + \lambda I)^{-1} D, so a_j = \frac{d_j^2}{d_j^2 + \lambda}.
We conclude:
X \hat{\beta}^{ridge} = U D (D^2 + \lambda I)^{-1} D U^T Y = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^T Y.
Recall that X \hat{\beta}^{ls} = U U^T Y.
For a linear smoother \hat{Y} = X\hat{\beta} = SY,
the effective degrees of freedom are
df = tr(S) = S_{11} + ... + S_{NN} = sum(diag(S))   (see 5.4.1, 7.6).
So for ridge regression
df(\lambda) = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda},
and for least squares, df = df(0) = p.
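In R the degrees-of-freedom formula is a one-liner once the singular values are in hand. A sketch on simulated, centered X (dimensions and lambda arbitrary), also checking it against the trace of the smoother matrix:

## Effective degrees of freedom of ridge regression: df(lambda) = sum d_j^2 / (d_j^2 + lambda).
set.seed(5)
X <- scale(matrix(rnorm(50 * 6), 50, 6), center = TRUE, scale = FALSE)
d <- svd(X)$d
df_ridge <- function(lambda) sum(d^2 / (d^2 + lambda))

df_ridge(0)      # = p = 6, the least squares degrees of freedom
df_ridge(10)     # fewer effective parameters
## Same number via the trace of the smoother matrix S = X (X'X + lambda I)^{-1} X'
lambda <- 10
S <- X %*% solve(t(X) %*% X + lambda * diag(ncol(X)), t(X))
sum(diag(S))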
Lasso Regression (see Fig 3.9)
\hat{\beta} := \arg\min_{\beta} \{ (Y - X\beta)^T (Y - X\beta) + \lambda ||\beta||_1 \}
Note that the penalty uses the FIRST power of the coefficient "length": the L1 norm ||\beta||_1 = \sum_j |\beta_j|,
rather than the squared L2 norm used by ridge.
Another view:
Solve a constrained optimization problem (restricting the model space):
\hat{\beta} := \arg\min_{\beta} (Y - X\beta)^T (Y - X\beta),
restricted to the set \{\beta : ||\beta||_1 \le K\}.
Be prepared to compare ridge regression to lasso regression.
See Fig 3.12 and 3.13.
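For the comparison it helps to work out the orthonormal case, as in Table 3.4. The sketch below uses made-up "OLS" coefficients and checks the closed forms numerically: with the criterion written as RSS + lambda * penalty (as above), ridge scales every coefficient by 1/(1+lambda), while the lasso soft-thresholds at lambda/2, setting small coefficients exactly to zero.

## Ridge vs lasso shrinkage when X'X = I_p: per-coordinate criteria are
##   ridge: (b_ls - b)^2 + lambda * b^2     -> b = b_ls / (1 + lambda)
##   lasso: (b_ls - b)^2 + lambda * abs(b)  -> b = sign(b_ls) * max(abs(b_ls) - lambda/2, 0)
lambda <- 2
b_ls   <- c(-3, -0.5, 0.2, 1.5, 4)           # pretend OLS coefficients
ridge_closed <- b_ls / (1 + lambda)
lasso_closed <- sign(b_ls) * pmax(abs(b_ls) - lambda / 2, 0)

## numerical check of both closed forms
ridge_num <- sapply(b_ls, function(a)
  optimize(function(b) (a - b)^2 + lambda * b^2,    c(-10, 10))$minimum)
lasso_num <- sapply(b_ls, function(a)
  optimize(function(b) (a - b)^2 + lambda * abs(b), c(-10, 10))$minimum)
round(cbind(b_ls, ridge_closed, ridge_num, lasso_closed, lasso_num), 3)
## ridge shrinks everything proportionally; lasso zeroes out the small coefficients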
Principal Components (see Fig 3.10)
Recall: X = U D V^T.
The principal component weights are the columns of V, v_1, ..., v_p.
X^T X V = V D U^T U D V^T V = V D^2,
so X^T X v_j = d_j^2 v_j (the v_j are eigenvectors of X^T X).
The principal components are the linear combinations
z_j = X v_j, j = 1, ..., p.
Note that Z = (z_1 ... z_p) = XV = UD.
This is a derived covariate technique: Z replaces X.
Algorithm for generating principal components:
The successive principal components solve
v_j = \arg\max_{\alpha} Var(X\alpha)
over all \alpha of length 1 and orthogonal to v_1, ..., v_{j-1}.
(Important: note that Y does not enter into this.)
See Fig 3.8.
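A short sketch on simulated, centered data verifying these identities and comparing with R's prcomp (component signs may differ):

## Principal components from the SVD: Z = X V = U D, and X'X v_j = d_j^2 v_j.
set.seed(6)
X <- scale(matrix(rnorm(100 * 4), 100, 4), center = TRUE, scale = FALSE)
s <- svd(X)
Z <- X %*% s$v                                      # principal components z_j = X v_j
max(abs(Z - s$u %*% diag(s$d)))                     # ~ 0: Z = U D
max(abs(t(X) %*% X %*% s$v - s$v %*% diag(s$d^2)))  # ~ 0: columns of V are eigenvectors
## Compare with prcomp: same components up to column signs
P <- prcomp(X, center = FALSE)$x
round(abs(cor(Z, P)), 3)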
Principal components regression is the model
\hat{y}^{pcr}_{(M)} = \bar{y} + \sum_{j=1}^{M} \hat{\theta}_j z_j,  so
\hat{\beta}^{pcr}_{(M)} = \sum_{j=1}^{M} \hat{\theta}_j v_j
(here \hat{\theta}_j = \langle z_j, y \rangle / \langle z_j, z_j \rangle, the univariate regression of y on z_j).
Note that M, the number of components to include, is a model complexity tuning parameter.
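A bare-bones PCR sketch (the function pcr_coef and the example data are illustrative, not from the notes): each theta_j is the coefficient from regressing y on z_j alone, which suffices because the z_j are orthogonal. With M = p, PCR reproduces ordinary least squares.

## Principal components regression with the first M components.
pcr_coef <- function(X, y, M) {
  # X: centered predictor matrix, y: centered response, M: components to keep
  s <- svd(X)
  Z <- X %*% s$v                                # all principal components
  theta <- sapply(1:M, function(j) sum(Z[, j] * y) / sum(Z[, j]^2))  # univariate fits
  beta  <- s$v[, 1:M, drop = FALSE] %*% theta   # back to the original coordinates
  list(theta = theta, beta = drop(beta))
}

## Example: with M = p, PCR reproduces ordinary least squares.
set.seed(7)
X <- scale(matrix(rnorm(60 * 4), 60, 4), center = TRUE, scale = FALSE)
y <- drop(X %*% c(2, 0, -1, 0)) + rnorm(60); y <- y - mean(y)
cbind(pcr = pcr_coef(X, y, M = 4)$beta, ols = coef(lm(y ~ X - 1)))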
Partial Least Squares
The successive PLS components solve
\hat{\varphi}_j = \arg\max_{\alpha} Cov(Y, X\alpha)
over all \alpha of length 1 and orthogonal to \hat{\varphi}_1, ..., \hat{\varphi}_{j-1}.
This is the same as
\hat{\varphi}_j = \arg\max_{\alpha} corr^2(Y, X\alpha) Var(X\alpha).
Contrast this with principal components, where Y plays no role.
PLS regression is the model
\hat{y}^{pls}_{(M)} = \bar{y} + \sum_{j=1}^{M} \hat{\theta}_j z_j, where z_j = X \hat{\varphi}_j,
so
\hat{\beta}^{pls}_{(M)} = \sum_{j=1}^{M} \hat{\theta}_j \hat{\varphi}_j
(depends on M, a "smoothing" or "model complexity" parameter).
This is another derived covariates method.
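A bare-bones sketch of the standard PLS recipe (the function pls_fit and the data are illustrative, not from the notes): direction weights are the covariances of the current predictors with y, y is regressed on each derived covariate, and the predictors are then residualized on that covariate before the next step, which is how the orthogonality of successive components is enforced in practice. With M = p the fit coincides with least squares.

## Partial least squares fit with M components (bare-bones sketch).
pls_fit <- function(X, y, M) {
  # X: centered (ideally standardized) predictors; y: centered response
  Xr   <- X                          # working copy, residualized as we go
  yhat <- rep(0, length(y))
  for (m in 1:M) {
    phi <- drop(t(Xr) %*% y)         # direction: covariance of each column with y
    z   <- drop(Xr %*% phi)          # derived covariate z_m
    th  <- sum(z * y) / sum(z * z)   # regress y on z_m
    yhat <- yhat + th * z
    Xr  <- Xr - z %*% (t(z) %*% Xr) / sum(z * z)  # remove z_m from the predictors
  }
  yhat
}

## With M = p, PLS also reproduces the least squares fit.
set.seed(8)
X <- scale(matrix(rnorm(60 * 4), 60, 4))
y <- drop(X %*% c(1, -1, 0, 2)) + rnorm(60); y <- y - mean(y)
max(abs(pls_fit(X, y, M = 4) - fitted(lm(y ~ X - 1))))   # ~ 0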
Comparing the methods:
See Fig 3.6, 3.11, Table 3.3.
Summary: What do you need to remember about these methods? List here.
Exercises due Feb 1:
Go to http://www-stat.stanford.edu/~tibs/ElemStatLearn/ . Obtain the
prostate cancer data set. Load it into R. Carry out OLS regression, ridge
regression, principal components regression, and partial least squares
regression. (A starting-point R sketch follows the exercise list below.)
Also do exercises 3.1 - 3.7 (skip 3.3b), 3.9, 3.11, 3.17.
As usual, bring to class at least one AHA and one Question about Chapter 3.
You will read Ch. 4.1 - 4.3 for Friday, Feb 3.
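A starting-point sketch for the computing exercise. Assumptions: the prostate.data file from the course/book site has been saved to the working directory with the book's column layout (eight predictors, lpsa, and a train indicator), and the MASS and pls packages are installed.

## Prostate data: OLS, ridge, principal components regression, partial least squares.
library(MASS)   # lm.ridge
library(pls)    # pcr, plsr

prost <- read.table("prostate.data", header = TRUE)  # file saved from the ElemStatLearn site
train <- subset(prost, train == TRUE)[, 1:9]         # drop the train/test indicator column

fit_ols   <- lm(lpsa ~ ., data = train)
fit_ridge <- lm.ridge(lpsa ~ ., data = train, lambda = seq(0, 50, by = 0.5))
fit_pcr   <- pcr(lpsa ~ ., data = train, scale = TRUE, validation = "CV")
fit_pls   <- plsr(lpsa ~ ., data = train, scale = TRUE, validation = "CV")

summary(fit_ols)
select(fit_ridge)         # HKB, L-W, and GCV choices of lambda
validationplot(fit_pcr)   # CV error vs number of components
validationplot(fit_pls)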