Received: 11 December 2016 Revised: 10 November 2017 Accepted: 13 November 2017
Fridge: Focused fine-tuning of ridge regression for
Kristoffer H. Hellton Nils Lid Hjort
Department of Mathematics, University of
Oslo, Oslo, Norway
Kristoffer H. Hellton, Department of
Mathematics, University of Oslo, PO Box
1053 Blindern, 0316 Oslo, Norway.
Norges Forskningsråd 235116
Statistical prediction methods typically require some form of fine-tuning of
tuning parameter(s), with K-fold cross-validation as the canonical procedure.
For ridge regression, there exist numerous procedures, but common for all,
including cross-validation, is that one single parameter is chosen for all future
predictions. We propose instead to calculate a unique tuning parameter for each
individual for which we wish to predict an outcome. This generates an individualized
prediction by focusing on the vector of covariates of a specific individual.
The focused ridge (fridge) procedure is introduced with a 2-part contribution:
First we define an oracle tuning parameter minimizing the mean squared prediction
error of a specific covariate vector, and then we propose to estimate this
tuning parameter by using plug-in estimates of the regression coefficients and
error variance parameter. The procedure is extended to logistic ridge regression
by using parametric bootstrap. For high-dimensional data, we propose to
use ridge regression with cross-validation as the plug-in estimate, and simulations
show that fridge gives smaller average prediction error than ridge with
cross-validation for both simulated and real data. We illustrate the new concept
for both linear and logistic regression models in 2 applications of personalized
medicine: predicting individual risk and treatment response based on gene
expression data. The method is implemented in the R package fridge.
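The two steps described above (an oracle tuning parameter for one covariate vector, then a plug-in estimate of it) can be illustrated with a minimal sketch. It is not the API of the R package fridge: the function names `focused_mse` and `focused_lambda`, the grid search, and the use of ordinary least squares as the plug-in (which presumes n > p) are all assumptions made for illustration. The sketch relies on the standard bias-variance decomposition of a ridge prediction at a fixed covariate vector x0, with bias -λ x0ᵀ(XᵀX + λI)⁻¹β and variance σ² x0ᵀ(XᵀX + λI)⁻¹XᵀX(XᵀX + λI)⁻¹x0.

```python
import numpy as np


def focused_mse(lam, X, x0, beta, sigma2):
    """Mean squared error of the ridge prediction x0' beta_hat(lam).

    beta and sigma2 are treated as known here; supplying estimates
    gives the plug-in version (hypothetical helper, for illustration).
    """
    p = X.shape[1]
    G = X.T @ X
    # a = (X'X + lam I)^{-1} x0; the matrix is symmetric, so a' = x0' (X'X + lam I)^{-1}
    a = np.linalg.solve(G + lam * np.eye(p), x0)
    bias = -lam * (a @ beta)
    variance = sigma2 * (a @ G @ a)
    return bias ** 2 + variance


def focused_lambda(X, y, x0, grid):
    """Grid value minimizing the plug-in MSE for the covariate vector x0.

    Assumes n > p so that OLS estimates of beta and sigma^2 exist.
    """
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    sigma2_hat = (resid @ resid) / (n - p)
    mses = np.array([focused_mse(lam, X, x0, beta_ols, sigma2_hat) for lam in grid])
    return grid[int(np.argmin(mses))]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 4))
    y = X @ np.array([1.0, 0.0, -1.0, 0.5]) + rng.normal(size=40)
    grid = np.linspace(0.0, 20.0, 201)
    # Two different individuals will in general get different tuning parameters.
    for x0 in (np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 2.0, 0.1, 0.1])):
        print(x0, focused_lambda(X, y, x0, grid))
```

In the high-dimensional case, where OLS is unavailable, the abstract's suggestion is to use ridge regression with cross-validation as the plug-in instead; the sketch would then take those estimates in place of `beta_ols` and `sigma2_hat`.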
focused information criterion, genomics, personalized medicine, ridge regression, tuning
The development of inexpensive genomic technologies has greatly contributed to the field of personalized medicine,
by facilitating predictions of individualized treatment decisions and disease risks based on genetic characteristics.
In Norway, for instance, the Norwegian Cancer Genomics Consortium (cancergenomics.no) has been founded to establish
“nationwide use of individual patient genetics to guide cancer treatment.” Genomic data are typically high-dimensional
with the number of variables, p, greatly exceeding the number of observations, n, and this high-dimensionality is often
handled by regularization or by constructing new, low-dimensional features.
Penalized linear regression, the most widely
used regularization technique, introduces some form of penalization of the regression coefficients in the linear model,
$y_i = x_i^{\mathsf{T}} \beta + \varepsilon_i$. Ridge regression imposes an $L_2$
penalty, penalizing the sum of squared regression coefficients:
1290 Copyright © 2018 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/sim Statistics in Medicine. 2018;37: 1290–1303.