Qual Quant (2014) 48:3185–3193
Ridge regression and its degrees of freedom
Theo K. Dijkstra
Published online: 25 October 2013
© Springer Science+Business Media Dordrecht 2013
Abstract For ridge regression the degrees of freedom are commonly calculated by the trace
of the matrix that transforms the vector of observations on the dependent variable into the
ridge regression estimate of its expected value. For a fixed ridge parameter this is unobjectionable.
When the ridge parameter is optimized on the same data, by minimization of the generalized
cross-validation criterion or Mallows' C_L, additional degrees of freedom are used, however.
We give formulae that take this into account. This allows a proper assessment of ridge
regression in competitions for the best predictor.
Keywords Ridge regression · Degrees of freedom · Prediction ·
Cross-validation · Stein’s identity
Ridge regression is one of the oldest ‘shrinkage’ or ‘penalized smoothing’ methods in regression
analysis. It imposes a penalty on the squared Euclidean norm of the regression coefficients
in an attempt to counterbalance OLS’ tendency to overadapt to the idiosyncrasies of
the data at hand, a tendency that typically leads to a shrinkage of the R² for prediction relative
to the R² for fitting. Ridge regression belongs to a family of methods that includes the Lasso,
the Elastic net, and the Dantzig selector. See Berk (2008), Hastie et al. (2009) or Izenman
(2008) for an overview of these methods and various motivations. McDonald (2009, 2010)
gives a recent overview focussed on ridge regression, rich in algebraic insights.
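To make the penalty concrete, here is a minimal sketch in Python (an illustration under standard definitions, not code from the paper or the works cited above): ridge regression minimizes ||y − Xb||² + λ||b||² over b, which has the closed form b̂(λ) = (X'X + λI)⁻¹X'y.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Ridge estimate: argmin_b ||y - X b||^2 + lam * ||b||^2.

    Closed form (X'X + lam*I)^{-1} X'y; solve() avoids forming the inverse.
    """
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# lam = 0 recovers OLS when X'X is invertible; increasing lam shrinks the
# coefficients toward zero, trading in-sample fit for prediction stability.
```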
The starting point of ridge regression is an n × (k + 1) data matrix [y, X], where y = μ + ε
with fixed but unknown μ and ε ∼ (0, σ²I_n), not necessarily Gaussian, and where the (fixed)
design matrix X is of order n × k. We need not assume that μ is a linear combination of the
columns of X, nor that X has rank k.
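For reference, the conventional degrees-of-freedom calculation the abstract alludes to can be sketched as follows (a minimal Python illustration of standard results, not the paper's corrected formulae): for fixed λ the ridge fit is ŷ = S(λ)y with S(λ) = X(X'X + λI)⁻¹X', and tr S(λ) = Σᵢ dᵢ²/(dᵢ² + λ), where the dᵢ are the singular values of X. Minimizing GCV(λ) = n·RSS(λ)/(n − tr S(λ))² over λ reuses the data, which is precisely the extra flexibility that the paper's corrected formulae account for.

```python
import numpy as np

def ridge_df(X, lam):
    """Conventional degrees of freedom: trace(S(lam)) = sum_i d_i^2/(d_i^2 + lam)."""
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + lam))

def gcv(X, y, lam):
    """Generalized cross-validation score: n * RSS(lam) / (n - trace(S(lam)))^2."""
    n = len(y)
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    shrink = d**2 / (d**2 + lam)         # eigenvalues of S(lam)
    fitted = U @ (shrink * (U.T @ y))    # S(lam) @ y, computed via the SVD
    rss = np.sum((y - fitted) ** 2)
    return n * rss / (n - np.sum(shrink)) ** 2
```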
T. K. Dijkstra
Faculty of Economics & Business, University of Groningen, PO Box 800, 9700 AV Groningen,
The Netherlands