Savitsky, Terrance D.

Abstract

The US Bureau of Labor Statistics uses monthly, by-state employment totals from the Current Population Survey (CPS) as a key input to develop employment estimates for counties within the states. The volatility of monthly CPS state totals, however, compromises the accuracy of the resulting county estimates. Models typically employed for small area estimation produce denoised, state-level employment estimates by borrowing information over the survey months, but assume independence among the collection of by-state time series, an assumption that is typically violated due to similarities in the states' underlying economies. We construct Gaussian process and Gaussian Markov random field alternative functional prior specifications, each in a mixture of multivariate Gaussian distributions with a Dirichlet process (DP) mixing measure over the parameters of their covariance or precision matrices. Our DP mixture-of-functions models allow the data to simultaneously estimate dependence among the months and between the states. A feature of our models is that functions assigned to the same cluster are drawn from a distribution with the same covariance parameters, so that they are similar, but do not have to be identical. We compare the performances of our two alternatives on synthetic data and apply them to recover denoised, by-state CPS employment totals for data from 2000–2013.

1. INTRODUCTION

The Current Population Survey (CPS) is a household survey administered by the Census Bureau for the Bureau of Labor Statistics (BLS) that collects information on the employment status of the civilian noninstitutional population over sixteen years of age residing in the fifty states plus the District of Columbia. The CPS includes sixteen questions, administered in person by a trained interviewer, to ascertain the nature of employment for each member of the household. Under the panel survey design, an adult member of each household is interviewed in four successive months, not interviewed for the following eight months, and then included again for the following four months. In April 2017, there were 74,248 households in the sample; 61,650 of these were eligible households, and of those, 52,996 were actually interviewed. This represents a response rate of approximately 86 percent. The CPS publishes monthly, state-indexed survey direct total employment estimates. The state total employment survey direct estimates are constructed from the sampled household responses in that state using sampling weights that account for the complex sampling design under which the households were sampled. Our data are these time-indexed survey direct total employment estimates for each state. These direct estimates are volatile, so BLS applies independently specified regression models for each state to reduce the volatility before apportioning the estimates to labor market areas or counties as part of its Local Area Unemployment Statistics (LAUS) program. This approach does not exploit the dependencies among the state time series. State administrators frequently express concern that the resulting county-level estimates composed from the state-level regression models are overly volatile, in that they believe much of the month-over-month movement is noise, not related to underlying economic drivers. The LAUS program seeks a more accurate (and less volatile) denoised time series for each state by exploiting the dependence of CPS direct estimates across states.
Bayesian nonparametric Gaussian processes (Vanhatalo, Riihimaki, Hartikainen, Jylanki, Tolvanen, et al. 2014) treat the collection of time series as sums of latent functions and a noise process. The model focuses on latent functions generated from a multivariate Gaussian distribution under covariance constructions that regulate the smoothness properties of the resulting functions (Rasmussen and Williams 2006). Vanhatalo et al. (2014) specify a Gaussian process prior that is independent over the states. Dependence among a set of noisy functions over the states can be modeled by combining the functions into a vector of length equal to the number of states multiplied by the number of time-indexed measurement waves. Sampling the full joint posterior distribution would be computationally prohibitive, however, as computation time scales as the cube of the resulting vector length. Modeling the functions as a single vector is also not natural, because the ordering of the states may notably impact both global and local smoothness properties of the resulting vector, and this approach assumes a continuous transition between state functions. We offer an approach that builds on the independent nonparametric Bayesian functional models by probabilistically tying the set of latent functions together under a marginal nonparametric mixture prior, where the mixing measure is assigned a Dirichlet process (DP) prior (Ferguson 1973). Our modeling framework is similar to Gelfand, Kottas, and MacEachern (2005), but their approach imposes the Dirichlet process prior directly on the functions, so that co-clustered functions must be exactly the same. This is too restrictive for our application. Instead, we impose the DP prior on the covariance matrix parameters of the generating Gaussian distribution, so that state functions which are co-clustered are drawn from a Gaussian distribution with the same parameters, but are not required to be exactly equal. Our proposed approach allows information in the data to extract signal from noise in the time series for each state. We formulate our nonparametric mixture approach using two alternative prior constructions to parameterize the covariance matrix of a multivariate Gaussian prior imposed on the set of latent functions: first, a Gaussian process prior; and second, an intrinsic Gaussian Markov random field prior. These alternative formulations are developed and their properties contrasted in section 2. Schemes to sample the posterior distributions under each of these alternatives are sketched and discussed in section 3. We compare their performance, both in fitting the functions and in uncovering the clustering structure used to generate the functions, in section 4. Our nonparametric mixture models are applied to the estimation of state-level employment totals obtained from the CPS in section 5, followed by a concluding discussion in section 6.

2. MODELS FOR FUNCTIONAL DATA

2.1 CPS Employment Count Data

Our data are composed of T = 158 months of (sampling-weighted) direct estimates of employment totals for each of N = 51 states (including the District of Columbia) based on the Current Population Survey. The employment totals are influenced by the underlying economic conditions of the states. Performing inference on clusters of distributions generating the set of state-indexed functions provides a context for reported employment and unemployment levels, as we show later.
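As a rough illustration of the clustering behavior that the DP mixing measure induces on the state-indexed covariance (or precision) parameters, the short Python sketch below draws mixture weights from a truncated stick-breaking construction, assigns the N = 51 states to clusters, and gives co-clustered states identical parameters while each state remains free to draw its own function. This is our own illustration under an arbitrary Ga(1, 1) base measure and truncation level, not the sampler used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, max_clusters, rng):
    """Truncated stick-breaking construction of DP mixture weights."""
    sticks = rng.beta(1.0, alpha, size=max_clusters)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
    return sticks * leftover

# Hypothetical settings chosen only for illustration.
N, alpha, max_clusters = 51, 1.0, 25

weights = stick_breaking_weights(alpha, max_clusters, rng)
weights /= weights.sum()  # renormalize the truncated weights

# One parameter vector per cluster, drawn from an assumed Ga(1, 1) base measure
# (three components, matching the covariance parameterization of section 2.2).
cluster_params = rng.gamma(shape=1.0, scale=1.0, size=(max_clusters, 3))

# States assigned to the same cluster share covariance parameters exactly,
# but each state still draws its own latent function from that distribution.
assignments = rng.choice(max_clusters, size=N, p=weights)
state_params = cluster_params[assignments]

print(f"occupied clusters: {np.unique(assignments).size} among {N} states")
```

Under the DP, the number of occupied clusters is learned from the data rather than fixed in advance.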
We denote the set of $(T = 158) \times 1$ vectors of survey direct estimates for a collection of N = 51 states as $\{y_i\}_{i=1,\ldots,N}$. We estimate the dependence through a prior construction that permits a grouping or clustering of covariance (under a Gaussian process [GP] prior specification) or precision (under an intrinsic Gaussian Markov random field [iGMRF] prior specification) parameters, which we index by state. These state-indexed covariance or precision parameters are used to estimate denoised, smooth functions, $\{f_i\}$, from the $\{y_i\}$. We first outline GP and iGMRF formulations with simpler, global covariance and precision parameters, respectively, $(\theta, \kappa)$, not indexed by state, to illustrate the properties of functions generated under these prior specifications. Employing a single set of covariance or precision parameters assumes the functions are drawn exchangeably. We subsequently extend both of these models using nonparametric mixture formulations over the states for $\{\theta_i\}$ and $\{\kappa_i\}$ under the GP and iGMRF models, respectively, to allow for dependence among the functions.

2.2 Gaussian Process Model

Our Gaussian process (GP) model is parameterized through the covariance matrix, $C(\theta)$, where the $\theta$ are estimated from the data and control the trend, length scale (or wavelength), and variation in the generated functions, $\{f_i\}$. We specify a GP formulation in the following probability model:

$$
\begin{aligned}
y_{i\,(T \times 1)} &\overset{\text{ind}}{\sim} \mathcal{N}_T\!\left(f_i,\ \tau_\varepsilon^{-1} I_T\right), \quad i = 1, \ldots, N, &\quad (1a)\\
f_i &\overset{\text{iid}}{\sim} \mathcal{N}_T\!\left(0,\ C(\theta)_{T \times T}\right), &\quad (1b)\\
\theta_1, \ldots, \theta_P \mid a, b &\overset{\text{iid}}{\sim} \mathrm{Ga}(a = 1.0,\ b = 1.0), &\quad (1c)\\
\tau_\varepsilon \mid c, d &\sim \mathrm{Ga}(c = 1.0,\ d = 1.0), &\quad (1d)
\end{aligned}
$$

where $\theta = (\theta_1, \ldots, \theta_P)$. The GP model specifies a formula for the covariance matrix, $C(\theta) = \left(C_{f_j, f_\ell}\right)_{j, \ell \in (1, \ldots, T)}$, with

$$
C_{f_j, f_\ell} = \frac{1}{\theta_1}\left(1 + \frac{(t_j - t_\ell)^2}{\theta_2 \theta_3}\right)^{-\theta_3},
$$

for P = 3. This is known as the rational quadratic formula, which may be derived as a mixture over the length-scale parameter of the more commonly used squared exponential kernels, $\frac{1}{\theta_1}\exp\!\left(-(t_j - t_\ell)^2 / \theta\right)$, with the inverse of the length-scale parameter, $\theta^{-1}$, which controls function periodicity, given a Gamma distribution with hyperparameters $(\theta_3, \theta_2^{-1})$ (Rasmussen and Williams 2006). The vertical magnitude of a function drawn under the rational quadratic covariance formula is controlled by $\theta_1$, while $\theta_2$ controls the mean length scale or period of the function, and $\theta_3$ controls smooth deviations from $\theta_2$. As $\theta_3 \uparrow \infty$, this formulation converges to a single squared exponential kernel with length scale $\theta_2$. So the $T \times T$ covariance matrix $C(\theta)$ is parameterized through the rational quadratic covariance formula with parameters $\theta = (\theta_1, \theta_2, \theta_3)$. Figure 1 displays a randomly generated function under each of three distinct value settings for $\theta = (\theta_1, \theta_2, \theta_3)$ to provide insight into how $\theta$ produces distinct features in $f$. Our choice of the rational quadratic covariance formulation is intended as a parsimonious specification of a covariance matrix that is able to discover multiple length scales or wavelengths in a function, in lieu of using a set of squared exponential covariance terms (e.g., in an additive formulation), with each term specialized to a particular length scale. For our CPS data, $\theta_2$ and $\theta_3$ capture economic shocks that are important to retain as features rather than noise.

Figure 1. Randomly Drawn Functions, f, from a GP Using a Rational Quadratic Covariance Function Under Distinct Values for $\theta = (\theta_1, \theta_2, \theta_3)$.
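To make concrete how the covariance parameters shape draws like those in Figure 1, the following sketch draws one function under each of the three $\theta$ settings discussed in the panel descriptions that follow. It is a minimal illustration that assumes an evenly spaced time grid rescaled to [0, 1]; it is not the code used for the figure.

```python
import numpy as np

def rational_quadratic_cov(t, theta):
    """Rational quadratic covariance:
    C[j, l] = (1 / theta1) * (1 + (t_j - t_l)^2 / (theta2 * theta3)) ** (-theta3)."""
    theta1, theta2, theta3 = theta
    sq_dist = (t[:, None] - t[None, :]) ** 2
    return (1.0 / theta1) * (1.0 + sq_dist / (theta2 * theta3)) ** (-theta3)

rng = np.random.default_rng(42)
T = 158
t = np.linspace(0.0, 1.0, T)  # assumed rescaling of the monthly grid, for illustration

settings = {
    "rough": (0.4, 0.2, 5.0),
    "varying length scale": (0.4, 0.2, 0.3),
    "smooth": (0.4, 4.0, 5.0),
}

draws = {}
for label, theta in settings.items():
    C = rational_quadratic_cov(t, theta)
    # A small diagonal jitter keeps the covariance numerically positive definite.
    draws[label] = rng.multivariate_normal(np.zeros(T), C + 1e-8 * np.eye(T))
```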
The top panel employs $\theta = (0.4, 0.2, 5.0)$ to generate a "rough" surface, while the bottom panel specifies $(0.4, 4.0, 5.0)$ to characterize a "smooth" surface, and the middle panel employs $(0.4, 0.2, 0.3)$ to produce a rough surface characterized by a high degree of length-scale variation. The rational quadratic formula also produces smooth surfaces, $\{f_i\}$, that are constrained to be differentiable at all orders and separate signal from noise. The GP of (1) under the rational quadratic covariance formula is very flexible, allowing the estimated functions, $\{f_i\}$, to be nonlinear of any order within the space of smooth functions. This GP is equivalent to an infinite-dimensional polynomial spline, even though the parameterization of the GP is through the covariance matrix, rather than the mean. A linear mean model may be reparameterized as a zero-mean GP by marginalizing over the regression coefficients, which demonstrates that linear surfaces are a subset of the space of smooth functions parameterized in (1) (Neal 2000b). The $\{y_i\}$ are assumed to be conditionally independent, given $\{f_i\}$. Standard errors for the monthly CPS direct estimates were not available to us. We treat the precision parameter, $\tau_\varepsilon$, as unknown and assign it a Gamma prior in our model, which remains fully identified.

2.3 Intrinsic Gaussian Markov Random Field Model

An iGMRF prior is typically assigned to the first difference, $\Delta f_{ij} = f_{i,j+1} - f_{i,j} \overset{\text{iid}}{\sim} \mathcal{N}(0, \kappa^{-1})$, and may be viewed as a probabilistic local smoother indexed by time. Here $\kappa$ is a precision parameter, analogous to the $\theta_1$ precision parameter under the GP formulation. This approach uses the nearest neighbors, $(f_{i,j-1}, f_{i,j+1})$, to encode the time-indexed dependence structure among the $f_{ij},\ j = 1, \ldots, T$, for each domain, $i \in (1, \ldots, N)$. Modeling first differences may produce an excessively rough (though continuous) surface that overfits the data, so we employ a prior construction based on the second difference, which approximates the second derivative, $\Delta^2 f_{ij} = \Delta(\Delta f_{ij}) \overset{\text{iid}}{\sim} \mathcal{N}(0, \kappa^{-1})$. This prior on second differences produces the joint distribution,

$$
f_i \overset{\text{iid}}{\sim} \pi(f_i) \propto \kappa^{\frac{T-2}{2}} \exp\!\left(-\frac{\kappa}{2} \sum_{j=1}^{T-2} \left(f_{ij} - 2 f_{i,j+1} + f_{i,j+2}\right)^2\right) = \kappa^{\frac{T-2}{2}} \exp\!\left(-\frac{1}{2} f_i' R f_i\right),
$$

where $R = \kappa Q$ specifies a band-diagonal precision matrix with non-zero entries $(Q_{j,j-2}, Q_{j,j-1}, Q_{j,j}, Q_{j,j+1}, Q_{j,j+2}) = (1, -4, 6, -4, 1)$ for $2 < j < T - 1$.
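To connect the band-diagonal precision structure to the second-difference penalty in the exponent, the short sketch below builds the structure matrix and checks the quadratic form. It is a minimal illustration that assumes $Q = D'D$ for the $(T-2) \times T$ second-difference operator $D$, which reproduces the $(1, -4, 6, -4, 1)$ interior band (boundary rows differ); it is not the authors' implementation.

```python
import numpy as np

def rw2_structure_matrix(T):
    """Structure matrix Q = D'D of the second-difference (RW2) iGMRF,
    where D is the (T - 2) x T second-difference operator."""
    D = np.zeros((T - 2, T))
    for j in range(T - 2):
        D[j, j:j + 3] = [1.0, -2.0, 1.0]
    return D.T @ D

T = 158
Q = rw2_structure_matrix(T)
print(Q[5, 3:8])                 # interior band: [ 1. -4.  6. -4.  1.]
print(np.linalg.matrix_rank(Q))  # T - 2 = 156, so the iGMRF prior is improper

# The quadratic form f'Qf matches the sum of squared second differences
# appearing in the exponent of the joint distribution above.
rng = np.random.default_rng(1)
f = rng.normal(size=T)
second_diffs = f[:-2] - 2.0 * f[1:-1] + f[2:]
assert np.isclose(f @ Q @ f, np.sum(second_diffs ** 2))
```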