Ann Oper Res (2018) 263:385–404
https://doi.org/10.1007/s10479-016-2173-9
S.I.: DATA MINING AND ANALYTICS
Recovering all generalized order-preserving submatrices:
new exact formulations and algorithms
Andrew C. Trapp¹ · Chao Li² · Patrick Flaherty³
Published online: 25 March 2016
© Springer Science+Business Media New York 2016
Abstract Cluster analysis of gene expression data is a popular and successful way of
elucidating underlying biological processes. Typically, cluster analysis methods seek to group
genes that are differentially expressed across experimental conditions. However, real
biological processes often involve only a subset of genes and are activated in only a subset of
environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput
Biol 10(3–4):373–384, 2003) developed an approach to identify order-preserving
submatrices (OPSMs) in which the expression levels of included genes induce the same linear
ordering of experiments. In addition to gene expression analysis, OPSMs have application to
recommender systems and target marketing. While the problem of finding the largest OPSM
is NP-hard, there have been significant advances in both exact and approximate algorithms
in recent years. Building upon these developments, we provide two exact mathematical
programming formulations that generalize the OPSM formulation by allowing for the reverse
linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations
incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs.
Finally, we provide two novel algorithms to recover, for any given level of significance, all
GOPSMs from a given data matrix, by iteratively solving mathematical programming
formulations to global optimality. We demonstrate the computational performance and accuracy of
our algorithms on real gene expression data sets, showing the capability of our developments.
Keywords Order-preserving submatrix · Integer programming · Data mining · Biclustering
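To make the OPSM and GOPSM patterns described in the abstract concrete, the following is a minimal sketch of a membership check, not the formulations or algorithms proposed in the paper. It assumes that a row "follows" a column ordering when its values strictly increase along that ordering, and that the generalized pattern additionally accepts rows that strictly decrease along it. The function names (`follows_order`, `is_opsm`, `is_gopsm`) and the toy matrix are hypothetical, and the margin-of-safety constraint mentioned in the abstract is not modeled.

```python
from typing import Sequence


def follows_order(row: Sequence[float], col_order: Sequence[int],
                  reverse: bool = False) -> bool:
    """Return True if the row's values strictly increase along col_order.

    With reverse=True, test for a strict decrease instead.
    Strictness is an assumption of this sketch.
    """
    values = [row[c] for c in col_order]
    if reverse:
        values = values[::-1]
    return all(a < b for a, b in zip(values, values[1:]))


def is_opsm(matrix: Sequence[Sequence[float]], rows: Sequence[int],
            col_order: Sequence[int]) -> bool:
    """Every selected row induces the same linear ordering of the columns."""
    return all(follows_order(matrix[r], col_order) for r in rows)


def is_gopsm(matrix: Sequence[Sequence[float]], rows: Sequence[int],
             col_order: Sequence[int]) -> bool:
    """Every selected row induces the ordering or its exact reverse."""
    return all(
        follows_order(matrix[r], col_order)
        or follows_order(matrix[r], col_order, reverse=True)
        for r in rows
    )


# Toy expression matrix (hypothetical values): rows are genes, columns are experiments.
expression = [
    [0.1, 0.9, 0.5, 0.3],  # gene 0: increases along the ordering below
    [1.0, 4.0, 3.0, 2.0],  # gene 1: increases along the ordering below
    [0.8, 0.1, 0.3, 0.6],  # gene 2: decreases along the ordering below
    [0.5, 0.5, 0.2, 0.9],  # gene 3: follows neither direction
]
order = [0, 3, 2, 1]  # candidate linear ordering of the experiments

opsm_two_genes = is_opsm(expression, [0, 1], order)
opsm_three_genes = is_opsm(expression, [0, 1, 2], order)
gopsm_three_genes = is_gopsm(expression, [0, 1, 2], order)
gopsm_all_genes = is_gopsm(expression, [0, 1, 2, 3], order)
```

In this toy example, genes 0 and 1 form an OPSM under the ordering, gene 2 joins them only under the generalized pattern, and gene 3 fits neither. Verifying a candidate is cheap; the NP-hard part noted in the abstract is finding the largest such submatrix.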
Andrew C. Trapp
atrapp@wpi.edu
¹ Foisie School of Business, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA
² Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA
³ Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003, USA