Testing moderator hypotheses in meta-analytic structural equation modeling using subgroup analysis

Meta-analytic structural equation modeling (MASEM) is a statistical technique to pool correlation matrices and test structural equation models on the pooled correlation matrix. In Stage 1 of MASEM, correlation matrices from independent studies are combined to obtain a pooled correlation matrix, using fixed- or random-effects analysis. In Stage 2, a structural model is fitted to the pooled correlation matrix. Researchers applying MASEM may have hypotheses about how certain model parameters will differ across subgroups of studies. These moderator hypotheses are often addressed using suboptimal methods. The aim of the current article is to provide guidance and examples on how to test hypotheses about group differences in specific model parameters in MASEM. We illustrate the procedure using both fixed- and random-effects subgroup analysis with two real datasets. In addition, we present a small simulation study to evaluate the effect of the number of studies per subgroup on convergence problems. All data and the R-scripts for the examples are provided online.

Keywords Meta-analytic structural equation modeling · Two-stage structural equation modeling · Meta-analysis · Random-effects model · Subgroup analysis

Suzanne Jak was supported by Rubicon grant 446-14-003 from the Netherlands Organization for Scientific Research (NWO). Mike W.-L. Cheung was supported by the Academic Research Fund Tier 1 (FY2013-FRC5-002) from the Ministry of Education, Singapore.

Suzanne Jak, S.Jak@uva.nl
1 Methods and Statistics, Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands
2 National University of Singapore, Singapore, Singapore

The combination of meta-analysis and structural equation modeling (SEM) for the purpose of testing hypothesized models is called meta-analytic structural equation modeling (MASEM). Using MASEM, correlation matrices from independent studies can be used to test a hypothesized model that explains the relationships between a set of variables or to compare several alternative models that may be supported by different studies or theories (Viswesvaran & Ones, 1995). The state-of-the-art approach to conducting MASEM is the two-stage SEM (TSSEM) approach (Cheung, 2014; Cheung & Chan, 2005b). In the first stage of the analysis, correlation matrices are combined to form a pooled correlation matrix with a random- or fixed-effects model. In the second stage of the analysis, a structural equation model is fitted to this pooled correlation matrix. Several alternative models may be tested and compared in this stage. If all variables were measured on a common scale across studies, analysis of covariance matrices would also be possible (Cheung & Chan, 2009). This would allow researchers to study measurement invariance across studies. In this paper we focus on correlation matrices, although the techniques that are discussed are directly applicable to covariance matrices.

Researchers often have hypotheses about how certain parameters might differ across subgroups of studies (e.g., Rosenbusch, Rauch, & Bausch, 2013). However, there are currently no straightforward procedures to test these hypotheses in MASEM. The aims of the current article are therefore: 1) to provide guidance and examples on how to test hypotheses about group differences in specific model parameters in MASEM; 2) to discuss issues with regard to testing differences between subgroups based on pooled correlation matrices; and 3) to show how the subgroup models with equality constraints on some parameters can be fitted using the metaSEM (Cheung, 2015b) and OpenMx packages (Boker et al., 2014) in R (R Core Team, 2017).
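For orientation, the two stages and the subgroup split named in the third aim map onto a small number of metaSEM calls. The sketch below is an illustration under stated assumptions, not the analysis reported in this article: it uses `Digman97`, an example dataset shipped with metaSEM, an illustrative two-factor model, and the `RAM` argument of `tssem2()`, which assumes a reasonably recent metaSEM version.

```r
# Sketch only: Digman97 is an example dataset bundled with metaSEM,
# not one of the datasets analysed in this article.
library(metaSEM)  # attaches OpenMx as a dependency

# Stage 1: pool the correlation matrices under a fixed-effects model.
# method = "REM" (e.g. with RE.type = "Diag") would give a random-effects Stage 1.
stage1 <- tssem1(Cov = Digman97$data, n = Digman97$n, method = "FEM")
summary(stage1)

# Stage 2: fit an illustrative two-factor model to the pooled correlation matrix.
# lavaan2RAM() converts lavaan-style syntax into the RAM matrices used by tssem2();
# by default it fixes the factor variances to 1.
model <- "Alpha =~ A + C + ES
          Beta  =~ E + I
          Alpha ~~ Beta"
RAM <- lavaan2RAM(model, obs.variables = c("A", "C", "ES", "E", "I"))
stage2 <- tssem2(stage1, RAM = RAM)
summary(stage2)

# Subgroup Stage 1: one pooled correlation matrix per level of a
# categorical study-level variable, passed through the cluster argument.
stage1_sub <- tssem1(Cov = Digman97$data, n = Digman97$n, method = "FEM",
                     cluster = Digman97$cluster)
summary(stage1_sub)
```

The `cluster` argument only produces separate Stage 1 results per subgroup; testing equality constraints on specific Stage 2 parameters across subgroups needs additional model specification, which is what the remainder of the article addresses.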
Behav Res (2018) 50:1359–1373

Specifically, we propose a follow-up analysis in which the equality of structural parameters across studies can be tested. Assuming that there are hypotheses on categorical study-level variables, the equality of specific parameters can be tested across subgroups of studies. In this way, it is possible to find a model in which some parameters are equal across subgroups of studies and others are not. More importantly, it helps researchers to identify how study-level characteristics can be used to explain differences in parameter estimates.

Methods to model heterogeneity in meta-analysis

With regard to how to handle heterogeneity in a meta-analysis, two dimensions (or approaches) can be distinguished (e.g., Borenstein, Hedges, Higgins, & Rothstein, 2009). The first dimension concerns whether to apply a fixed- or a random-effects model, while the second dimension is about whether or not to include study-level moderators. Two classes of models can be differentiated: the fixed-effects model and the random-effects model. The fixed-effects model allows conditional inference, meaning that the results are only relevant to the studies included in the meta-analysis. The random-effects model allows for unconditional inference to studies that could have been included in the meta-analysis by assuming that the included studies are samples of a larger population of studies (Hedges & Vevea, 1998).

The fixed-effects model (without moderators) usually assumes that all studies share the same population effect size, while the fixed-effects model with moderators assumes that the effects are homogeneous after taking into account the influence of moderators. The random-effects model assumes that the differences across studies are random. The random-effects model with moderators, known as a mixed-effects model, assumes that there will still be random effects after the moderators are taken into account.

Methods to model heterogeneity in MASEM

The above framework from general meta-analysis is also applicable to MASEM. Table 1 gives an overview of the suitability, and the advantages and disadvantages, of using different combinations of fixed- versus random-effects MASEM, with or without subgroups. Case 1 represents overall analysis with a fixed-effects model. Fixed-effects models are very restrictive (i.e., the number of parameters to be estimated is relatively small), which makes them easy to apply. However, homogeneity of correlation matrices across studies may not be realistic, leading to biased significance tests (Hafdahl, 2008; Zhang, 2011).

One way to account for heterogeneity is by estimating between-study heterogeneity across all studies in the random-effects approach (Case 2 in Table 1). By using a random-effects model, the between-study heterogeneity is accounted for at Stage 1 of the analysis (pooling correlations), and the Stage 2 model (the actual structural model of interest) is fitted on the averaged correlation matrix. Under the random-effects model, study-level variability is considered a nuisance. An overall random-effects analysis may be the preferred choice when moderation of the effects by study-level characteristics is not of substantive interest (Cheung & Cheung, 2016).

Subgroup analysis is more appropriate than overall random-effects analysis in cases where it is of interest to determine how the structural models differ across levels of a categorical study-level variable (Cases 3 and 4 in Table 1). In a subgroup analysis, the structural model is fitted separately to groups of studies. Within the subgroups, one may use random- or fixed-effects modeling (Jak, 2015). Fixed-effects subgroup analysis is suitable if homogeneity of correlations within the subgroups is realistic. Most often, however, heterogeneity within subgroups of studies is still expected, and fixed-effects modeling may be unrealistic. In such cases, random-effects subgroup analysis may be the best choice. A possible problem with a random-effects subgroup analysis is that the number of studies within each subgroup may become too small for reliable results to be obtained.

We focus on the situation in which researchers have an a priori idea of which study-level characteristics may moderate effects in the Stage 2 model. That is, we do not consider exploratory approaches, such as using cluster analysis to find homogeneous subgroups of studies (Cheung & Chan, 2005a).

Besides the random-effects model and subgroup analysis, Cheung and Cheung (2016) discuss an alternative approach to addressing heterogeneity in MASEM, called "parameter-based MASEM". Since this approach also has its limitations, and discussing them is beyond the scope of the current work, we refer readers to their study for more details. We focus on TSSEM, in which subgroup analysis is the only option to evaluate moderator effects.

Currently used methods to test hypotheses about heterogeneity in MASEM

A disadvantage of the way subgroup analysis is commonly applied is that all Stage 2 parameters are allowed to differ across subgroups, regardless of expectations about differences in specific parameters. That is, differences in parameter estimates across groups are seldom tested in the structural model. For example, Rosenbusch et al. (2013)
performed a MASEM analysis on data from 83 studies, testing a model in which the influence of the external environment of firms on performance levels is mediated by the entrepreneurial orientation of the firm. They split the data into a group of studies based on small firms and a group based on medium-to-large firms, to investigate whether the regression parameters in the path model are moderated by firm size. However, after fitting the path model to the pooled correlation matrices in the two subgroups, they compared the results without using any statistical tests.

Gerow et al. (2013) hypothesized that the influence of intrinsic motivation on individuals' interaction with information technology was greater when the technology was to be used for hedonistic applications than for practical applications. They fitted the structural model to a subgroup of studies with hedonistic applications, a subgroup of studies with practical applications, and a subgroup of studies with a mix of applications. However, to test for differences between the subgroups, they performed t-tests on the four pooled Stage 1 correlation coefficients in the subgroups, ignoring the estimates in the actual path models altogether. These approaches are not ideal because researchers cannot test whether some of the parameters, those that may be of theoretical interest, are significantly different across groups.

More often than using subgroup analysis, researchers address the moderation of effect sizes using standard meta-analysis techniques on individual effect sizes, before they conduct the MASEM analysis. They use techniques such as meta-regression or ANOVA-type analyses (Lipsey & Wilson, 2001). Independent of the moderation effects, the MASEM is then performed using the full set of studies. Examples of this practice can be found in Drees and Heugens (2013), Earnest, Allen, and Landis (2011) and Jiang, Liu, Mckay, Lee, and Mitchell (2012). A disadvantage of this approach is that moderation is tested on the correlation coefficients, and not on specific parameters in a structural equation model. Most often, this is not in line with the hypothesis of interest. For example, the moderator hypotheses of Gerow et al. (2013) were about the direct effects in the path model but not about covariances and variances. Although subgroup analysis to test heterogeneity has previously been conducted (see Haus et al., 2013), we think that instructions regarding the procedures are needed because most researchers who apply MASEM still choose to address issues of moderation outside the context of MASEM.

Table 1 Overview of advantages (+) and disadvantages (–) of subgroup versus overall analysis and fixed-effects (FEM) versus random-effects (REM) models

Overall analysis

Case 1 (FEM)
Use if: There is no hypothesis about moderation, and homogeneity is realistic
+ 1) Small number of parameters
+ 2) Sometimes the only option (e.g., with a small number of studies)
– 1) Only allows for conditional inference
– 2) Biased significance tests if homogeneity does not hold
– 3) Masks subgroup differences in parameters

Case 2 (REM)
Use if: There is no hypothesis about moderation, and homogeneity is not realistic
+ 1) Accounts for heterogeneity
+ 2) Allows for unconditional inference
– 1) Large number of parameters (but smaller than with subgroups)
– 2) No information about specific effects of moderators
– 3) Masks subgroup differences in parameters

Subgroup analysis

Case 3 (FEM)
Use if: There is a specific hypothesis about subgroups, and homogeneity within subgroups is realistic
+ 1) Small number of parameters
+ 2) Sometimes the only option (e.g., with a small number of studies)
+ 3) Possibility to test subgroup differences in parameters
– 1) Only allows for conditional inference
– 2) Need to dichotomize continuous moderator
– 3) Biased parameter estimates if homogeneity does not hold

Case 4 (REM)
Use if: There is a specific hypothesis about subgroups, and homogeneity within subgroups is not realistic
+ 1) Accounts for additional heterogeneity within subgroups
+ 2) Allows for unconditional inference
+ 3) Possibility to test subgroup differences in parameters
– 1) Large number of parameters (larger than without subgroups)
– 2) Need to dichotomize continuous moderator
– 3) Number of studies per subgroup might get too small

Overview of this article

In the next sections, we briefly introduce fixed- and random-effects TSSEM and propose a follow-up analysis to address heterogeneity using subgroup analysis. We discuss some issues related to testing the equality of parameters using pooled correlation matrices. Next, we illustrate the procedure using an example of testing the equality of factor loadings across study-level variables of the Hospital Anxiety and Depression Scale (HADS) with data from Norton, Cosco, Doyle, Done, and Sacker (2013), as well as with an example of testing moderation by socio-economic status (SES) in a path model linking teacher-child relations to engagement and achievement (Roorda, Koomen, Spilt, & Oort, 2011). To facilitate the use of the proposed procedure, detailed reports of the analyses, including data and R-scripts, are provided online at www.suzannejak.nl/masem_code. Finally, we present a small simulation study to evaluate the effect of the number of studies included in a MASEM analysis on the frequency of estimation problems.

TSSEM

In the next two sections we briefly describe fixed-effects TSSEM and random-effects TSSEM. For a more elaborate explanation see Cheung and Chan (2005b), Cheung (2014), Cheung (2015a), and Jak (2015).

Fixed-effects TSSEM

The fixed-effects TSSEM approach was proposed by Cheung and Chan (2005b). They performed a simulation study, comparing the fixed-effects TSSEM approach to two univariate approaches (Hunter & Schmidt, 2015; Hedges & Olkin, 1985) and the multivariate GLS-approach (Becker, 1992, 1995). They found that the TSSEM approach showed the best results with respect to parameter accuracy and false positive rates of rejecting homogeneity.

Stage 1

In fixed-effects TSSEM, the correlation matrices in the individual studies are assumed to be homogeneous across studies, all being estimates of one common population correlation matrix. Differences between the correlation matrices in different studies are assumed to be solely the result of sampling error. The model that is fitted at Stage 1 is a multigroup model in which all correlation coefficients are assumed to be equal across studies. Fitting this model to the observed correlation matrices in the studies leads to an estimate of the population correlation matrix $P_F$, which is correctly estimated if homogeneity indeed holds.

Stage 2

In Stage 2 of the analysis, weighted least squares (WLS) estimation (Browne, 1984) is used to fit a structural equation model to the estimated common correlation matrix from Stage 1. The proposed weight matrix in WLS-estimation is the inverse asymptotic variance covariance matrix of the Stage 1 estimates of $P_F$, i.e., $W_F = V_F^{-1}$ (Cheung & Chan, 2005b). These weights ensure that correlation coefficients that are based on more information (on more studies and/or studies with larger sample sizes) get more weight in the estimation of the Stage 2 parameters. The Stage 2 analysis leads to estimates of the model parameters and a $\chi^2$ measure of fit.

Random-effects TSSEM

Stage 1

In random-effects TSSEM, the population effect sizes are allowed to differ across studies. The between-study variability is taken into account in the Stage 1 analysis. Estimates of the means and the covariance matrices in random-effects TSSEM are obtained by fixing the sampling covariance matrices to the known values (through definition variables; see Cheung, 2015a) and using full information maximum likelihood to estimate the vector of means, $P_R$, and the between-studies covariances, $T_R$ (Cheung, 2014).

Stage 2

Fitting the Stage 2 model in the random-effects approach is not very different from fitting the Stage 2 model in the fixed-effects approach. The values in $W_R$ from a random-effects analysis are usually larger than those obtained from a fixed-effects analysis, because the between-studies covariance is added to the construction of the weight matrix. This results in relatively more weight being given to smaller studies, and larger standard errors and confidence intervals, than with the fixed-effects approach.

Using subgroup analysis to test parameter heterogeneity

The basic procedure for subgroup analysis comprises separate Stage 1 analyses for the subgroups. The Stage 1 analyses may be in the fixed-effects framework, hypothesizing homogeneity within subgroups, or in the random-effects framework, assuming that there is still substantive between-study heterogeneity within the subgroups. In a subgroup MASEM analysis, it is straightforward to equate certain parameters across groups at Stage 1 or Stage 2 of the analysis. The differences in the parameters across groups can be tested using a likelihood ratio test by comparing the fit of a model with across-groups equality constraints on certain parameters with a model in which the parameters are freely estimated across groups.

Testing heterogeneity in Stage 1 parameters

Although we focus on testing differences in Stage 2 parameters, in some situations it may be interesting to test the equality of the pooled correlation matrices across subgroups. In order to test the hypothesis that the correlation matrices from a fixed-effects subgroup analysis, $P_g$, are equal across subgroups $g$, one could fit a model with the constraint $P_{g_1} = P_{g_2}$. Under the null hypothesis of equal correlation matrices across groups, the difference in the -2 log-likelihoods of the models with and without this constraint asymptotically follows a chi-square distribution with degrees of freedom equal to the number of constrained correlation coefficients. Similarly, one could perform this test on the averaged correlation matrices from a random-effects Stage 1 analysis. With random-effects analysis, it may additionally be tested whether the subgroups differ in their heterogeneity covariance matrices $T_g$. When the researcher's hypotheses are directly about Stage 2 parameters, one may skip testing the equality of correlation matrices across subgroups. Constraining the between-studies covariance matrices to be equal may still be useful to reduce the number of parameters to be estimated in a random-effects analysis. This issue is discussed further in the general discussion.

Testing heterogeneity in Stage 2 parameters

For ease of discussion, we suppose that there are two subgroups. Given the two Stage 1 pooled correlation matrices in the subgroups $g$, say, $P_g$, a structural model can be fitted to the two matrices. For example, one could fit a factor model in both groups:

$$P_g = \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g, \quad (1)$$

where, with $p$ observed variables and $k$ common factors, $\Lambda_g$ is a full $p$ by $k$ matrix with factor loadings in group $g$, $\Phi_g$ is a $k$ by $k$ symmetrical matrix with factor variances and covariances in group $g$, and $\Theta_g$ is a $p$ by $p$ symmetrical matrix with residual (co)variances in group $g$. The covariance structure is identified by setting $\mathrm{diag}(\Phi_g) = I$. Since the input is a correlation matrix, the constraint $\mathrm{diag}(P_g) = I$ is required to ensure that the diagonals of $P_g$ are always ones during estimation.

In order to test the equality of factor loadings across groups, a model can be fitted in which $\Lambda_{g_1} = \Lambda_{g_2}$. Under the null hypothesis of equal factor loadings, the difference in chi-squares of the models with $\Lambda_{g_1} = \Lambda_{g_2}$ and $\Lambda_{g_1} \neq \Lambda_{g_2}$ asymptotically follows a chi-square distribution with degrees of freedom equal to the difference in the number of freely estimated parameters. If the difference in chi-squares is considered significant, the null hypothesis of equal factor loadings is rejected.

The approach of creating subgroups with similar study characteristics and equating parameters across groups is suitable for any structural equation model. For example, in a path model, it may be hypothesized that some or all direct effects are different across subgroups of studies, but variances and residual variances are not. One could then compare a model with equal regression coefficients with a model with freely estimated regression coefficients to test the hypothesis. Also, the subgroups approach can be applied using fixed-effects or random-effects analyses.

Issues related to testing equality constraints based on correlation matrices in TSSEM

Structural equation models are ideally fitted on covariance matrices. In MASEM, and meta-analysis in general, it is very common to synthesize correlation coefficients. One reason for the synthesis of standardized effect sizes is that different studies may use different instruments with different scales to operationalize the variables of interest. The analysis of correlation matrices does not pose problems when the necessary constraints are included (Bentler & Savalei, 2010; Cheung, 2015a). However, it should be taken into account that fitting models to correlation matrices with TSSEM implies that all parameter estimates are in a standardized metric (assuming that all latent variables are scaled to have unit variances, which is recommended in TSSEM; Cheung, 2015a).

When we compare models across subgroups in TSSEM, we are thus comparing parameter estimates that are standardized with respect to the observed and latent variables within the subgroups (Cheung, 2015a; Steiger, 2002). This may not necessarily be a problem; sometimes it is even desirable to compare standardized coefficients (see Kwan & Chan, 2011). For example, van den Boer, van Bergen, and de Jong (2014) tested the equality of correlations between three reading tasks across an oral and a silent reading group. However, it is important to be aware of this issue and to interpret the results correctly. Suppose that the standardized regression coefficient of the effect of variable $x$ on variable $y$, $\beta^{*}_{yx}$, is compared across two subgroups of studies, $g_1$ and $g_2$. The standardized direct effects in the subgroups are given by:

$$\beta^{*}_{yx_{g_1}} = \beta_{yx_{g_1}} \frac{\sigma_{x_{g_1}}}{\sigma_{y_{g_1}}} \quad (2)$$

and

$$\beta^{*}_{yx_{g_2}} = \beta_{yx_{g_2}} \frac{\sigma_{x_{g_2}}}{\sigma_{y_{g_2}}}, \quad (3)$$

where $\beta$ represents an unstandardized regression coefficient, $\beta^{*}$ represents a standardized regression coefficient, and $\sigma$ represents a standard deviation. In the special case that the standard deviations of $x$ and $y$ are equal within subgroups, in each subgroup the standardized coefficient is equal to the unstandardized coefficient, and the test of $H_0: \beta^{*}_{yx_{g_1}} = \beta^{*}_{yx_{g_2}}$ is equal to the test of $H_0: \beta_{yx_{g_1}} = \beta_{yx_{g_2}}$. In fact, this not only holds when the standard deviations of the variables are equal in the subgroups, but in general when the ratio of $\sigma_x$ over $\sigma_y$ is equal across subgroups. For example, when $\sigma_x$ and $\sigma_y$ in group 1 are respectively 2 and 4, and $\sigma_x$ and $\sigma_y$ in group 2 are respectively 1 and 2, the standardized regression coefficient equals the unstandardized coefficient times .5 in both groups. In this case, a test of the equality of the standardized regression coefficients will lead to the same conclusion as a test of the unstandardized regression coefficients.

However, in most cases the ratio of standard deviations will not be exactly equal across groups. Therefore, when testing the equality of regression coefficients in a path model, one has to realize that all parameters are in a standardized metric. The conclusions may not be generalizable to unstandardized coefficients. Whether the standardized or the unstandardized regression coefficients are more relevant depends on the research questions (Bentler, 2007). In the context of meta-analysis, standardized coefficients are generally preferred (Cheung, 2009; Hunter & Hamilton, 2002).

In a factor analytic model, several methods of standardization exist. Parameter estimates may be standardized with respect to the observed variables only, or with respect to the observed variables and common factors. In MASEM, it is recommended that the common factors be identified by fixing their variances to 1 (Cheung, 2015a). All results obtained from a MASEM-analysis on correlation matrices are thus standardized with respect to the observed variables and the common factors. As a consequence of this standardization, the residual variances in $\Theta$ are effectively not free parameters, but the remainder of $\mathrm{diag}(I) - \mathrm{diag}(\Lambda \Phi \Lambda^{\top})$ (Cheung, 2015a).

Similar to path analysis, when testing the equality of factor loadings across subgroups in MASEM, the results may not be generalizable to unstandardized factor loadings, due to across-group differences in the (unknown) variances of the indicators and common factors. Moreover, if all standardized factor loadings are set to be equal across groups, this implies that all standardized residual variances are equal across groups. Note that although one may be inclined to call a test of the equality of factor loadings a test of weak factorial invariance (Meredith, 1993), this would strictly be incorrect, as weak factorial invariance pertains to the equality of unstandardized factor loadings.

Examples

In this section, we present two examples of the testing of moderator hypotheses in MASEM using subgroup analysis. Example 1 illustrates the testing of the equality of factor loadings using factor analysis under the fixed-effects model (Case 1 and 3 from Table 1). Example 2 illustrates the testing of the moderation of direct effects using path analysis under the random-effects model (Case 2 and 4 from Table 1). The R-syntax for the examples can be found online (http://www.suzannejak.nl/masem_code).

Example 1 – Testing equality of factor loadings of the Hospital Anxiety and Depression Scale

Introduction

The HADS was designed to measure psychological distress in non-psychiatric patient populations (Zigmond & Snaith, 1983), and is widely used in research on distress in patients. The instrument consists of 14 items: the odd numbered items are designed to measure anxiety and the even numbered items are designed to measure depression. The items are scored on a 4-point scale. Some controversy exists regarding the validity of the HADS (Zakrzewska, 2012). The HADS has generally been found to be a useful instrument for screening purposes, but not for diagnostic purposes (Mitchell, Meader, & Symonds, 2010). Ambiguous results regarding the factor structure of the HADS led to a meta-analytic study by Norton et al. (2013), who gathered correlation matrices of the 14 HADS items from 28 published studies. Using meta-analytic confirmatory factor analysis, they found that a bi-factor model that included all items loading onto a general distress factor and two orthogonal anxiety and depression factors provided the best fit to the pooled data. Of the 28 studies evaluated by Norton et al., 10 considered non-patient samples and 18 were based on patient samples. As an illustration we will test the equality of factor loadings across studies based on patient and non-patient samples.

Analysis

All of the models were fitted using the metaSEM and OpenMx packages in the R statistical platform. First we fit the Stage 1 and Stage 2 models with a fixed-effects model to the total set of studies (illustrating Case 1 from Table 1). The Stage 1 analysis using the fixed-effects model involved fitting a model to the 28 correlation matrices in which all correlation coefficients were restricted to be equal across studies. Misfit of this model would indicate inequality of the correlation coefficients across studies. Stage 2 involved fitting the bi-factor model that Norton et al. (2013) found to have the best fit to the data (see Fig. 1).

Next, two subgroups of studies were created, one group with the 10 non-patient samples and the other with the 18 patient samples (illustrating Case 3 from Table 1). First, the Stage 1 analyses were performed in the two groups separately, leading to two pooled correlation matrices. Then, the factor model without equality constraints across subgroups was fitted to the data. Next, three models were tested in which the factor loadings of the general distress factor, the anxiety factor, and the depression factor, respectively, were constrained to be equal across patient and non-patient samples. If the equality constraints on the factor loadings led to a significantly higher chi-square statistic, the (standardized) factor loadings would be considered to differ across groups.

Exact fit of a proposed model is rejected if the $\chi^2$ statistic is found to be significant. Exact fit will rarely hold in MASEM, due to the large total sample size. Therefore, as in standard SEM, it is common to use approximate fit to assess the fit of models. Approximate close fit is associated with RMSEA-values under .05, satisfactory approximate fit with RMSEA-values under .08, and bad approximate fit with RMSEA-values larger than .10 (MacCallum, Browne, & Sugawara, 1996). In addition to the RMSEA, we will evaluate the CFI (Bentler, 1990) and the standardized root mean squared residual (SRMR). CFI-values above .95 and SRMR-values under .08 are considered satisfactory (Hu & Bentler, 1999). For more information about the calculation and use of fit-indices in SEM we refer to Schermelleh-Engel et al. (2003).

[Figure: path diagram with General Distress, Anxiety, and Depression factors, factor loadings λ on the items, and residual variances θ]
Fig. 1 The bi-factor model on the HADS-items

Results

Overall Stage 1: Testing homogeneity and pooling correlation matrices

The Stage 1 model did not have exact fit to the data, $\chi^2$(2,457) = 10,400.04, p < .01. Approximate fit was acceptable according to the RMSEA (.064, 95% CI: [.063 ; .066]), but not according to the CFI (.914) and SRMR (.098). Based on the CFI and SRMR, one should not proceed to fit the structural model, or should switch to random-effects modeling. However, in order to illustrate the modeling involved in Case 1, we will continue with Stage 2 using overall fixed-effects analysis. Table 2 shows the pooled correlation matrix based on the fixed-effects Stage 1 analysis.

Table 2 Pooled correlation matrix based on the fixed-effects Stage 1 analysis of the HADS data

      v1   v3   v5   v7   v9   v11  v13  v2   v4   v6   v8   v10  v12  v14
v1    1
v3    .48  1
v5    .55  .52  1
v7    .42  .36  .41  1
v9    .42  .46  .42  .35  1
v11   .33  .29  .33  .32  .28  1
v13   .49  .54  .50  .36  .50  .37  1
v2    .29  .24  .30  .34  .25  .18  .26  1
v4    .29  .28  .32  .36  .27  .18  .28  .42  1
v6    .40  .36  .43  .40  .31  .22  .36  .38  .45  1
v8    .35  .30  .34  .28  .27  .23  .33  .36  .25  .33  1
v10   .23  .21  .25  .22  .18  .17  .22  .25  .26  .30  .26  1
v12   .30  .27  .32  .36  .28  .19  .29  .47  .46  .42  .32  .33  1
v14   .24  .22  .25  .34  .22  .21  .25  .28  .31  .31  .19  .21  .33  1

Overall Stage 2: Fitting a factor model to the pooled correlation matrix

Norton et al. (2013) concluded that a bi-factor model showed the best fit to the data. We replicated the analyses and found that, indeed, the model fit is acceptable according to the RMSEA ($\chi^2$(63) = 2,101.48, RMSEA = .039, 95% CI RMSEA: [.037 ; .040], CFI = .953, SRMR = .033). The parameter estimates from this model can be found in Table 3. All items loaded substantially on the general factor, and most items had smaller loadings on the specific factor. Contrary to expectations, Item 7 has a negative loading on the anxiety factor.

Table 3 Parameter estimates and 95% confidence intervals from the bi-factor model on the total HADS data

Anxiety items
       Λ General          Λ Anxiety          Θ
       est.  lb    ub     est.  lb    ub     est.  lb    ub
v1     .69   .68   .70    .19   .17   .22    .48   .47   .50
v3     .61   .60   .62    .40   .38   .42    .47   .45   .48
v5     .71   .70   .72    .23   .21   .26    .45   .44   .46
v7     .71   .70   .72    −.13  −.16  −.09   .48   .45   .50
v9     .56   .54   .57    .33   .31   .36    .58   .57   .59
v11    .48   .46   .49    .12   .10   .15    .76   .75   .77
v13    .63   .62   .64    .45   .42   .47    .40   .39   .42

Depression items
       Λ General          Λ Depression       Θ
       est.  lb    ub     est.  lb    ub     est.  lb    ub
v2     .47   .46   .48    .47   .45   .48    .56   .55   .57
v4     .50   .48   .51    .44   .42   .45    .56   .55   .58
v6     .61   .60   .63    .29   .28   .31    .54   .52   .55
v8     .50   .49   .52    .21   .19   .23    .70   .69   .71
v10    .37   .35   .38    .27   .25   .29    .79   .78   .80
v12    .50   .48   .51    .53   .51   .55    .47   .46   .49
v14    .43   .42   .44    .23   .21   .25    .76   .75   .77

Note: est. = parameter estimate, lb = lower bound, ub = upper bound; Λ General, Λ Anxiety and Λ Depression refer to the factor loadings associated with these factors; Θ refers to residual variance

Subgroup Stage 1: Testing homogeneity and pooling correlation matrices

In the patient group, homogeneity was rejected by the chi-square test ($\chi^2$(1,547) = 5,756.84, p < .05). Homogeneity could be considered to hold approximately, based on the RMSEA (.071, 95% CI: [.070 ; .073]), but not based on the CFI (.923) and SRMR (.111). In the non-patient group, homogeneity was also rejected by the chi-square test, $\chi^2$(819) = 3,254.60, p < .05, but approximate fit could be considered acceptable based on the RMSEA and SRMR (RMSEA = .049, 95% CI RMSEA: [.048 ; .051], CFI = .941, SRMR = .062). Although the model with a common correlation matrix does not have acceptable fit in the patient group, indicating that not all heterogeneity is explained by differentiating patient and non-patient samples, we continue with Stage 2 analysis as an illustration of the procedure when the interest is Case 3 (see Table 1).

Subgroup Stage 2: Testing equality of factor loadings

The fit of the models with freely estimated factor loadings and with equality constraints on particular sets of factor loadings can be found in Table 4. The RMSEAs of all models indicated close approximate fit. However, the $\chi^2$-difference tests show that the factor loadings cannot be considered equal for any of the three factors. Figure 2 shows a plot of the standardized factor loadings in the two groups. For the majority of the items, the factor loadings are higher in the patient group than in the non-patient group.

Discussion

We found that the factor loadings of the bi-factor model on the HADS differed across the studies involving patients versus studies involving non-patients. [...] standard deviations were not available for most of the included studies.

We used fixed-effects overall and subgroup analysis, although homogeneity of correlation matrices did not hold. Therefore, it would have been more appropriate to apply random-effects analysis. However, due to the relatively large number of variables and the small number of studies, a random-effects model did not converge to a solution. Even the most restrictive model, with only a diagonal $T$ that was set to be equal across subgroups, did not solve this problem. The results that were obtained should thus be interpreted with caution, as the Type 1 errors may be inflated. The next example shows random-effects subgroup analysis, which may be the appropriate framework in most cases.

Example 2 – Testing moderation of the effect of teacher-student relations on engagement and achievement

Introduction
The items were In this example we use random-effects subgroup analysis generally found to be more indicative of general distress to test moderation by SES in a path model linking teacher- in the studies with patient samples than in the studies with child relations to engagement and achievement. Children non-patient samples. A possible reason for this finding is with low SES are often found to be at risk of failing that the HADS was developed for use in hospital settings, in school and dropping out (Becker & Luthar, 2002). and thus was designed for use with patients. In practice, According to Hamre and Pianta (2001), children at risk of failing in school may have more to gain from an ability researchers may continue with the analysis by testing the equality of individual factor loadings across subgroups. For to adapt to the social environment of the classroom than children who are doing very well at school. Therefore, it can example, the factor loading of Item 2 from the Depression factor seems to differ more across groups than the other be expected that the effects of teacher-child relations may be stronger for children with lower SES. factor loadings for this factor. Such follow-up analyses may give more insight into specific differences across subgroups. Roorda, Koomen, Spilt, and Oort (2011) performed a However, it is advisable to apply some correction on the meta-analysis on correlation coefficients between measures significance level, such as a Bonferroni correction, when of positive and negative teacher-student relations, engage- testing the equality of several parameters individually. ment and achievement. They used univariate moderator anal- A problem with these data is that the HADS is scored on ysis, and found that all correlations were larger in absolute a 4-point scale, but the analysis was performed on Pearson value for studies with relatively more students with low SES. 
In the current analysis, we will test the moderation of product moment correlations, assuming continuous vari- ables. This may have led to underestimated correlation coef- the specific effects in a path model. We will use 45 studies reported by Roorda et al. (2011) and Jak, Oort, Roorda, and ficients. Moreover, it would have been informative to ana- lyze covariance matrices rather than correlation matrices, Koomen (2013), which include information about SES of the samples. enabling a test on weak factorial invariance. However, the Table 4 Overall fit and difference in fit of the factor model with different equality constraints across groups 2 2 df χ p RMSEA [95% CI] CFI SRMR df χ p 1. No constraints 126 2249.21 <.05 .039 [.038 ; .041] .955 .035 2.  General equal 140 3125.51 <.05 .044 [.043 ; .046] .936 .061 14 876.30 <.05 3.  Anxiety equal 133 2266.14 <.05 .038 [.037 ; .040] .955 .036 7 16.93 <.05 4.  Depression equal 133 2300.62 <.05 .039 [.037 ; .040] .954 .037 7 51.41 <.05 2 2 Note: df and χ refer to the difference in df and χ in comparison with Model 1 1368 Behav Res (2018) 50:1359–1373 Fig. 2 A plot of the estimated factor loadings and 95% confidence intervals for the patient group (red) and non-patient group (grey) Note: We show the absolute value of the factor loading of Item 7 on the Anxiety factor Analysis Results First we will perform a random-effects Stage 1 and Stage 2 Overall Stage 1: Random-effects analysis The pooled corre- analysis on the total sample of studies (representing Case 2 lations based on the random-effects analysis can be found from Table 1). Next, we split the studies into two subgroups in Table 5. When a random-effects model is used, an I based on SES (representing Case 4 from Table 1). We will value may be calculated. It can be interpreted as the pro- fit the hypothesized path model (see Fig. 
3) to a group portion of study-level variance in the effect size (Higgins & of studies in which the majority of the respondents were Thompson, 2002). The I values (above the diagonal) show indicated to have low SES (24 studies), and a group of that there is substantial between-studies variability in the studies for which the majority of the sample was indicated correlation coefficients, ranging from .79 to .94. with high SES (21 studies). Note that SES is a continuous moderator variable in this case (percentages). We split the Overall Stage 2: Fitting a path model We fitted a path model studies in two groups based on the criterion of 50% of to the pooled Stage 1 correlation matrix, in which positive and the sample having low SES. Then, we test the equivalence negative relations predicted achievement indirectly, through of the direct effects across groups by constraining the engagement. Exact fit of this model was rejected (χ (2) = effects to be equal across subgroups. Using a significance 11.16, p <.05). However, the RMSEA of .013 (95% CI = level of .05, if the χ statistic increased significantly given [.006 ; .020]) indicated close approximate fit, as well as the the increased degrees of freedom when adding equality CFI (.966) and SRMR (.045). Table 6 shows the parameter constraints across groups, one or more of the parameters estimates and the associated 95% confidence intervals. All would be considered significantly different across groups. parameter estimates were considered significantly different Note that dichotomizing a continuous variable is generally from zero, as zero is not included in the 95% confidence not advised. In this example we dichotomize the moderator intervals. The indirect effects of positive and negative rela- in order to illustrate subgroup analyses. Moreover, in tions on achievement were small, but significant. 
Although TSSEM, the analysis of continuous moderator variables is the model shows good fit on the averaged correlation not yet well developed. matrix, this analysis provides no information about whether Behav Res (2018) 50:1359–1373 1369 1,1 3,1 4,3 Student engagement Student achievement 2,1 3,2 3,3 ψ 4,4 2,2 Fig. 3 The hypothesized path model for Example 2 SES might explain the between-study heterogeneity. Sub- the three direct effects in the path model to be equal across group analysis is used to test whether the parameters differ subgroups did not lead to a significant increase in misfit, across studies with different levels of average SES. χ (3) = 5.18, p = .16. Therefore, the null hypothesis of equal direct effects across subgroups is not rejected. Subgroup Stage 1: Random-effects analysis Different Discussion In this example we tested whether the direct pooled correlation matrices were estimated in the group of effects in a path model linking teacher-child relations to studies with low SES and the group of studies with high engagement and achievement were moderated by SES. The SES (see Tables 7 and 8). The proportions of between- 2 subgroup analysis showed that the null-hypothesis stating studies variance (I ) within the subgroups are smaller than that the effects are equal in the low SES and high SES they were in the total sample, indicating that SES explains populations cannot be rejected. Note that non-rejection of part of the between-study heterogeneity. a null-hypothesis does not imply that the null-hypothesis is true. It could also mean that our design did not have Subgroup Stage 2: Testing moderation of effects by SES The enough statistical power to detect an existing difference in hypothesized path model showed acceptable approximate the population. 
fit, but no exact fit, in the low-SES group, χ (2) = 6.28, p <.05, RMSEA = .013 (95% CI = [.002 ; .026]), CFI = .978, SRMR = .041 as well as in the high-SES group, χ (2) Simulation study = 9.50, p <.05, RMSEA = .015 (95% CI = [.006 ; .025]), CFI = .936, SRMR = .0549. The fit of the unconstrained It is often necessary to create subgroups of studies, because baseline model, with which the fit of the models with an overall analysis will mask differences in parameters across equality constraints will be compared, is equal to the sum of the fit of the models in the two subgroups. Therefore, the χ Table 6 Parameter estimates and 95% confidence intervals of the and df against which the constrained models will be tested hypothesized path model is df = 2+2 = 4 and χ = 6.28 + 9.50 = 15.78. Constraining Parameter est lb ub β .27 .20 .35 Table 5 Pooled correlations (under the diagonal) and I (above the 31 diagonal) based on the random effects Stage 1 analysis β −.30 −.38 −.22 β .35 .29 .41 v1 v2 v3 v4 β * β .10 .07 .12 31 43 v1. Positive relations 1 .92 .94 .79 β * β −.10 −.14 −.07 32 43 v2. Negative relations −.24 1 .88 .80 ψ −.24 −.32 −.16 v3. Engagement .32 −.31 1 .90 ψ .80 .73 .85 v4. Achievement .14 −.18 .28 1 ψ .88 .83 .92 44 1370 Behav Res (2018) 50:1359–1373 The data-generating model was based on the results from Table 7 Pooled correlations (under the diagonal) and I (above the diagonal) based on the random effects Stage 1 analysis in studies with Example 2. The population values for the direct effects in low SES Subgroup 1 were: β = .265, β = -.307, β = .288, and 31 32 43 ψ = -.329. The between-studies variance used to generate v1 v2 v3 v4 random correlation matrices was based on Example 2. In v1. Positive relations 1 .85 .94 .71 Subgroup 2, all population values were identical to the v2. Negative relations −.33 1 .83 .73 values in Subgroup 1, except for β , which was .388 (.10 v3. Engagement .35 −.35 1 .86 larger than in Subgroup 1). 
We generated data with k = 22, v4. Achievement .12 −.18 .23 1 k = 44, k = 66 or k = 88 studies per subgroup, with sample sizes of n=200 for each study. For each condition we generated 2000 meta-analytic datasets. In each condition we fitted the correct model to the two subgroups. For example, if the population regression coeffi- subgroups separately, as well as to the subgroups combined. cient is 0.20 for Subgroup 1, and 0.30 for Subgroup 2, an We restricted the between-studies covariance matrices to be analysis of all of the studies together will result in an esti- diagonal, in order to reduce the number of parameters to mated regression coefficient of between 0.20 and 0.30. This be estimated. In practice, this restriction is often applied means that the effect will be overestimated for Subgroup 1 and (Becker, 2009). We evaluated the percentage of converged underestimated for Subgroup 2. Subgroup analysis will lead solutions, the relative bias in the estimate of β ,andthe to better parameter estimates in the subgroups. However, cre- relative bias in the standard error of β across methods and ating subgroups may lead to small numbers of studies within conditions. The relative percentage of estimation bias for each subgroup. In combination with having twice as many β was calculated as parameters to be estimated as with an overall analysis, small numbers of studies will likely result in estimation problems β − β 43 43 100 ∗ .(4) such as non-convergence. Convergence is an important issue, because researchers will be unable to present any meaningful results of the MASEM analysis without having a converged We regarded estimation bias of less than 5% as acceptable (Hoogland & Boomsma, 1998). The relative percentage of solution. In order to evaluate the effect of the number of studies within each subgroup on the frequency of estimation bias in the standard error of β was calculated as: problems, we conducted a small simulation study. 
¯ ˆ ˆ SE(β ) − SD(β ) 43 43 100 ∗ , (5) SD(β ) Data generation and conditions 43 ¯ ˆ ˆ where SE(β ) is the average standard error of β across 43 43 We generated data from two subgroups, in which one regres- replications, and SD(β ) is the standard deviation of the sion coefficient differed by .10 points across subgroups in parameter estimates across replications. We considered the the population. Next, we fitted the correct model to the standard errors to be unbiased if the relative bias was smaller two subgroups separately, as well as to the combined data. than 10% (Hoogland & Boomsma, 1998). We expected that, due to the larger number of studies, the percentage of converged solutions would be larger for the Results overall analysis than for the subgroup analyses and that the estimation bias in the manipulated effect would be smaller Convergence in the subgroup analysis (because the regression coefficient is allowed to be different in each subgroup). Figure 4a shows the convergence rates for all conditions. As expected, the analysis of the total dataset resulted in more converged solutions than the subgroup analysis in all Table 8 Pooled correlations (under the diagonal) and I (above the conditions. In addition, convergence rates increased with diagonal) based on the random effects Stage 1 analysis in studies with the number of studies. However, the convergence rates were high SES generally low. For example, with 22 studies per subgroup v1 v2 v3 v4 (the condition similar to that of our Example 2), only 43% of the datasets led to a converged solution with the v1. Positive relations 1 .90 .84 .79 overall analysis, while only around 30% converged with v2. Negative relations −.17 1 .66 .80 the subgroup analysis. With small numbers of studies per v3. Engagement .23 −.23 1 .87 subgroups (smaller than 44), most analyses are expected to v4. Achievement .16 −.18 .34 1 not result in a converged solution. 
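The two bias measures in Eqs. 4 and 5 are simple to compute once the estimates and standard errors from the converged replications are collected. The following is a minimal Python sketch, not the authors' R code. It assumes that the estimate in Eq. 4 is averaged over replications and that the sample standard deviation (n − 1 denominator) is used in Eq. 5; neither detail is stated explicitly in the text. The function names and the five replications are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev


def relative_bias_estimate(estimates, population_value):
    """Relative percentage bias of the estimate (cf. Eq. 4).

    Assumption: the estimate is averaged over replications.
    """
    return 100 * (mean(estimates) - population_value) / population_value


def relative_bias_se(estimates, standard_errors):
    """Relative percentage bias of the standard error (cf. Eq. 5).

    Assumption: SD is the sample standard deviation (n - 1 denominator)
    of the parameter estimates across replications.
    """
    empirical_sd = stdev(estimates)
    return 100 * (mean(standard_errors) - empirical_sd) / empirical_sd


# Hypothetical replications (made-up numbers, not results from the paper).
beta43_population = 0.288
estimates = [0.30, 0.27, 0.31, 0.28, 0.29]
standard_errors = [0.016, 0.015, 0.017, 0.016, 0.016]

bias_est = relative_bias_estimate(estimates, beta43_population)
bias_se = relative_bias_se(estimates, standard_errors)

# Cut-offs follow the text: 5% for estimates, 10% for standard errors.
print(f"estimate bias: {bias_est:.2f}% (acceptable: {abs(bias_est) < 5})")
print(f"SE bias:       {bias_se:.2f}% (acceptable: {abs(bias_se) < 10})")
```

In a real simulation the two lists would hold one entry per converged replication, so the number of entries would differ across conditions.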
Fig. 4 Convergence, parameter bias and standard error bias for overall and subgroup analysis with a group difference of 0.10 in $\beta_{43}$. Note: The results in panels B and C are based on only those replications that led to a converged solution for all three analyses. The numbers of replications used are 141, 188, 246, and 300 for k = 22, k = 44, k = 66, and k = 88, respectively.

Bias in parameter estimates

We evaluated the parameter bias in $\beta_{43}$ only for replications in which all three analyses converged. Consequently, the numbers of replications used to calculate the bias were 141, 188, 246, and 300 of the 2000 replications for k = 22, k = 44, k = 66, and k = 88, respectively. We have also calculated the bias using all converged solutions per method (resulting in larger, but different numbers of replications being used for different analyses). This approach leads to very similar results, and identical conclusions. The results are presented in Fig. 4b. The percentage of estimation bias was not related to the number of studies or to sample size. As expected, the overall analysis resulted in overestimation for Subgroup 1 and underestimation for Subgroup 2, while the subgroup analysis led to unbiased parameter estimates. Although the difference in the population value was only 0.10, the percentages of the relative bias exceeded the cut-off of 5% in all conditions for the overall analysis. For parameters that did not differ across subgroups, all analyses yielded unbiased estimates.

Bias in standard errors

The relative bias in standard errors was around 10% in all conditions for the overall analysis. With the subgroup analysis, the standard error estimates were more accurate, with a bias of between roughly −5% and 5% in all conditions. The results are presented in Fig. 4c. The standard errors of the parameters that did not differ across subgroups were unbiased for all analyses.

Conclusion on the simulation study

The simulation study showed that convergence is a serious potential problem when applying random-effects MASEM. Moreover, the likelihood of non-convergence increases with smaller numbers of studies, such as with a subgroup analysis. However, if the model converges, the subgroup analysis will lead to better parameter estimates and standard error estimates in cases where a difference in the population coefficient is present, even if the population difference is small. In order to increase the likelihood of obtaining a converged solution, it is recommended that as many studies as possible be included.

General discussion

We proposed subgroup analysis to test moderation hypotheses on specific parameters in MASEM. We illustrated the approach using TSSEM. The subgroup analysis method that was presented is not restricted to TSSEM. One could just as easily apply the subgroup analysis on pooled correlation matrices obtained with univariate approaches (Hunter & Schmidt, 2015; Hedges & Olkin, 1985) or the multivariate GLS-approach (Becker, 1992; 1995). However, based on earlier research comparing these approaches (Cheung & Chan, 2005b; Jak & Cheung, 2017), univariate approaches are not recommended for MASEM.

Creating subgroups of studies to test the equality of parameters across groups is a useful approach, but may also lead to relatively small numbers of studies within each subgroup. Given the large number of parameters involved in random-effects modeling, the number of studies may become too small for a converged solution to be obtained, as was the case in our Example 1. One way to reduce the number of parameters is to estimate the between-study heterogeneity variances but not the covariances among the random effects, i.e., restricting T to be diagonal. In practice, this restriction is often needed (Becker, 2009).
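To see how much a diagonal T saves, one can count the heterogeneity parameters directly. The sketch below is plain Python and does not use the metaSEM API. It assumes that T is the covariance matrix of the random effects for the q = p(p − 1)/2 distinct correlations among p variables, so an unrestricted T has q(q + 1)/2 free elements and a diagonal T has q. The function names are hypothetical.

```python
def n_correlations(p):
    """Number of distinct correlations among p observed variables."""
    return p * (p - 1) // 2


def n_heterogeneity_params(p, diagonal):
    """Free elements of T, the between-studies covariance matrix of the
    q correlations: q variances if diagonal, otherwise q(q + 1)/2."""
    q = n_correlations(p)
    return q if diagonal else q * (q + 1) // 2


# p = 14 HADS items (Example 1) and p = 4 variables (Example 2).
for label, p in [("Example 1", 14), ("Example 2", 4)]:
    q = n_correlations(p)
    full = n_heterogeneity_params(p, diagonal=False)
    diag = n_heterogeneity_params(p, diagonal=True)
    # Two subgroups with separate T matrices double both counts.
    print(f"{label}: {q} correlations, full T = {full}, diagonal T = {diag}, "
          f"two subgroups = {2 * full} vs {2 * diag}")
```

With 14 items the unrestricted T alone has several thousand free elements, far more than the 28 available studies can support, which is in line with the convergence failure reported for Example 1.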
We applied this constraint to the two subgroups in the second example and in the simulation study.

In the simulation study, we found that even with a diagonal heterogeneity matrix, random-effects subgroup modeling is often not feasible due to convergence problems. In practice, researchers may therefore have no other option than to apply fixed-effects modeling instead of random-effects modeling. However, ignoring between-study heterogeneity is known to lead to inflated false positive rates for significance tests (Hafdahl, 2008; Zhang, 2011). Researchers should therefore be careful when interpreting the results of significance tests in cases where heterogeneity exists but a fixed-effects model is applied. Collecting more studies to be included in the meta-analysis is preferable over switching to a fixed-effects model.

A limitation of the subgroup analysis to test moderation is that the moderator variables have to be categorical. In the second example, we split the studies into two groups based on the percentage of respondents with high SES in the study. By dichotomizing this variable we throw away information and lose statistical power. Indeed, contrary to our findings, the univariate meta-regression analyses reported by Roorda et al. showed significant moderation by SES. However, these analyses did not take into account the multivariate nature of the data, and tested the moderation of the correlation coefficients and not of the regression coefficients. Future research is needed to develop methods to include study-level variables as continuous covariates in TSSEM.

Concluding remarks

In the current paper we presented a framework to test hypotheses about subgroup differences in meta-analytic structural equation modeling. The metaSEM and OpenMx code and R functions used in the illustrations are provided online, so that researchers may easily adopt the proposed procedures to test moderator hypotheses in their MASEM analyses. The simulation study showed that increasing the number of studies in a random-effects subgroup analysis increases the likelihood of obtaining a converged solution.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Becker, B. E., & Luthar, S. S. (2002). Social-emotional factors affecting achievement outcomes among disadvantaged students: Closing the achievement gap. Educational Psychologist, 37(4), 197–214.
Becker, B. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17(4), 341–362. 10.2307/1165128
Becker, B. (1995). Corrections to using results from replicated studies to estimate linear models. Journal of Educational and Behavioral Statistics, 20(1), 100–102. 10.2307/1165390
Becker, B. J. (2009). Model-based meta-analysis. In Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) The handbook of research synthesis and meta-analysis (2nd edn., pp. 377–395). New York: Russell Sage Foundation.
Bentler, P. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246.
Bentler, P. (2007). Can scientifically useful hypotheses be tested with correlations? The American Psychologist, 62(8), 769–782. https://doi.org/10.1037/0003-066X.62.8.772
Bentler, P. M., & Savalei, V. (2010). Analysis of correlation structures: Current status and open problems. In Kolenikov, S., Steinley, D., & Thombs L. (Eds.) Statistics in the Social Sciences (pp. 1–36). New Jersey: Wiley.
Boker, S. M., Neale, M. C., Maes, H. H., Wilde, M. J., Spiegel, M., Brick, T. R., & BDBL OpenMx, T. (2014). Openmx 2.0 user guide [Computer software manual].
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.
Browne, M. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83. https://doi.org/10.1111/j.2044-8317.1984.tb00789.x
Cheung, M. W.-L., & Chan, W. (2005a). Classifying correlation matrices into relatively homogeneous subgroups: a cluster analytic approach. Educational and Psychological Measurement, 65(6), 954–979. https://doi.org/10.1177/0013164404273946
Cheung, M. W.-L., & Chan, W. (2005b). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10(1), 40–64. https://doi.org/10.1037/1082-989X.10.1.40
Cheung, M. (2009). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41(2), 425–438. https://doi.org/10.3758/BRM.41.2.425
Cheung, M. W.-L., & Chan, W. (2009). A two-stage approach to synthesizing covariance matrices in meta-analytic structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 16(1), 28–53. https://doi.org/10.1080/
Cheung, M. (2014). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods, 46(1), 29–40. https://doi.org/10.3758/s13428-013-0361-y
Cheung, M. (2015). Meta-analysis: A structural equation modeling approach. Chichester: Wiley.
Cheung, M. W.-L. (2015). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5(1521). https://doi.org/10.3389/fpsyg.2014.01521
Cheung, M. W.-L., & Cheung, S. (2016). Random-effects models for meta-analytic structural equation modeling: Review, issues, and illustrations. Research synthesis methods, 7(2), 140–155.
Drees, J. M., & Heugens, P. P. M. A. (2013). Synthesizing and extending resource dependence theory a meta-analysis. Journal of Management, 39(6), 1666–1698. https://doi.org/10.1177/
Earnest, D. R., Allen, D. G., & Landis, R. (2011). Mechanisms linking realistic job previews with turnover: A meta-analytic path analysis. Personnel Psychology, 64(4), 865–897. https://doi.org/10.1111/j.1744-6570.2011.01230.x
Gerow, J. E., Ayyagari, R., Thatcher, J. B., & Roth, P. L. (2013). Can we have fun @ work? the role of intrinsic motivation for utilitarian systems. European Journal of Information Systems, 22(3), 360–380. https://doi.org/10.1057/ejis.2012.25
Hafdahl, A. (2008). Combining heterogeneous correlation matrices: Simulation analysis of fixed-effects methods. Journal of Educational and Behavioral Statistics, 33(4), 507–533.
Hamre, B. K., & Pianta, R. (2001). Early teacher–child relationships and the trajectory of children’s school outcomes through eighth grade. Child Development, 72(2), 625–638. https://doi.org/10.1111/1467-8624.00301
Haus, I., Steinmetz, H., Isidor, R., & Kabst, R. (2013). Gender effects on entrepreneurial intention: A meta-analytical structural equation model. International Journal of Gender and Entrepreneurship, 5(2), 130–156. https://doi.org/10.1108/17566261311328828
Hedges, L., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Hedges, L., & Vevea, J. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504.
Higgins, J. P. T., & Thompson, S. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558. https://doi.org/10.1002/sim.1186
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling an overview and a meta-analysis. Sociological Methods & Research, 26(3), 329–367.
Hu, L.-t., & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
Hunter, J. E., & Hamilton, M. (2002). The advantages of using standardized scores in causal analysis. Human Communication Research, 28(4), 552–561. https://doi.org/10.1111/j.1468-2958.2002.tb00823.x
Hunter, J. E., & Schmidt, F. (2015). Methods of meta-analysis: correcting error and bias in research findings (3rd ed.). Thousand Oaks: Sage Publications.
Jak, S., Oort, F. J., Roorda, D. L., & Koomen, H. (2013). Meta-analytic structural equation modelling with missing correlations. Netherlands Journal of Psychology, 67(4), 132–139.
Jak, S. (2015). Meta-analytic structural equation modeling. Switzerland: Springer International Publishing.
Jak, S., & Cheung, M. W.-L. (2017). Accounting for missing correlation coefficients in fixed-effects meta-analytic structural equation modeling. Multivariate Behavioral Research, in press.
Jiang, K., Liu, D., Mckay, P. F., Lee, T. W., & Mitchell, T. R. (2012). When and how is job embeddedness predictive of turnover? A meta-analytic investigation. Journal of Applied Psychology, 97(5), 1077–1096. https://doi.org/10.1037/a0028610
Kwan, J. L. Y., & Chan, W. (2011). Comparing standardized coefficients in structural equation modeling: a model reparameterization approach. Behavior Research Methods, 43(3), 730–745. https://doi.org/10.3758/s13428-011-0088-6
Lipsey, M., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks: Sage Publications.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. https://doi.org/10.1037/1082-989X.1.2.130
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.
Mitchell, A. J., Meader, N., & Symonds, P. (2010). Diagnostic validity of the hospital anxiety and depression scale (HADS) in cancer and palliative settings: A meta-analysis. Journal of Affective Disorders, 126(3), 335–348. https://doi.org/10.1016/j.jad.2010.01.067
Norton, S., Cosco, T., Doyle, F., Done, J., & Sacker, A. (2013). The hospital anxiety and depression scale: A meta confirmatory factor analysis. Journal of Psychosomatic Research, 74(1), 74–81. https://doi.org/10.1016/j.jpsychores.2012.10.010
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from http://www.R-project.org
Roorda, D. L., Koomen, H. M. Y., Spilt, J. L., & Oort, F. (2011). The influence of affective teacher-student relationships on students’ school engagement and achievement: a meta-analytic approach. Review of Educational Research, 81(4), 493–529.
Rosenbusch, N., Rauch, A., & Bausch, A. (2013). The mediating role of entrepreneurial orientation in the task environment–performance relationship a meta-analysis. Journal of Management, 39(3), 633–659. https://doi.org/10.1177/0149206311425612
Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods of psychological research online, 8(2), 23–74.
Steiger, J. (2002). When constraints interact: A caution about reference variables, identification constraints and scale dependencies in structural equation modeling. Psychological methods, 7(2), 210–
van den Boer, M., van Bergen, E., & de Jong, P. (2014). Underlying skills of oral and silent reading. Journal of experimental child psychology, 128, 138–151.
Viswesvaran, C., & Ones, D. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48(4), 865–885. https://doi.org/10.1111/j.1744-6570.1995.tb01784.x
Zakrzewska, J. (2012). Should we still use the hospital anxiety and depression scale? Pain, 153(6), 1332–1333. https://doi.org/10.1016/j.pain.2012.03.016
Zhang, Y. (2011). Meta-analytic Structural Equation Modeling (MASEM): Comparison of the multivariate methods (PhD thesis). FL: The Florida State University.
Zigmond, A. S., & Snaith, R. (1983). The hospital anxiety and depression scale. Acta Psychiatrica Scandinavica, 67(6), 361–370. https://doi.org/10.1111/j.1600-0447.1983.tb09716.x

Testing moderator hypotheses in meta-analytic structural equation modeling using subgroup analysis

Publisher: Springer US
Copyright: © 2018 by The Author(s)
Subject: Psychology; Cognitive Psychology
eISSN: 1554-3528
DOI: 10.3758/s13428-018-1046-3

Abstract

Meta-analytic structural equation modeling (MASEM) is a statistical technique to pool correlation matrices and test structural equation models on the pooled correlation matrix. In Stage 1 of MASEM, correlation matrices from independent studies are combined to obtain a pooled correlation matrix, using fixed- or random-effects analysis. In Stage 2, a structural model is fitted to the pooled correlation matrix. Researchers applying MASEM may have hypotheses about how certain model parameters will differ across subgroups of studies. These moderator hypotheses are often addressed using suboptimal methods. The aim of the current article is to provide guidance and examples on how to test hypotheses about group differences in specific model parameters in MASEM. We illustrate the procedure using both fixed- and random-effects subgroup analysis with two real datasets. In addition, we present a small simulation study to evaluate the effect of the number of studies per subgroup on convergence problems. All data and the R-scripts for the examples are provided online.

Keywords: Meta-analytic structural equation modeling · Two-stage structural equation modeling · Meta-analysis · Random-effects model · Subgroup analysis

Suzanne Jak was supported by Rubicon grant 446-14-003 from the Netherlands Organization for Scientific Research (NWO). Mike W.-L. Cheung was supported by the Academic Research Fund Tier 1 (FY2013-FRC5-002) from the Ministry of Education, Singapore.

Suzanne Jak, S.Jak@uva.nl
1 Methods and Statistics, Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands
2 National University of Singapore, Singapore, Singapore

The combination of meta-analysis and structural equation modeling (SEM) for the purpose of testing hypothesized models is called meta-analytic structural equation modeling (MASEM). Using MASEM, correlation matrices from independent studies can be used to test a hypothesized model that explains the relationships between a set of variables or to compare several alternative models that may be supported by different studies or theories (Viswesvaran & Ones, 1995). The state-of-the-art approach to conducting MASEM is the two-stage SEM (TSSEM) approach (Cheung, 2014; Cheung & Chan, 2005b). In the first stage of the analysis, correlation matrices are combined to form a pooled correlation matrix with a random- or fixed-effects model. In the second stage of the analysis, a structural equation model is fitted to this pooled correlation matrix. Several alternative models may be tested and compared in this stage. If all variables were measured on a common scale across studies, analysis of covariance matrices would also be possible (Cheung & Chan, 2009). This would allow researchers to study measurement invariance across studies. In this paper we focus on correlation matrices although the techniques that are discussed are directly applicable to covariance matrices.

Researchers often have hypotheses about how certain parameters might differ across subgroups of studies (e.g., Rosenbusch, Rauch, and Bausch (2013)). However, there are currently no straightforward procedures to test these hypotheses in MASEM. The aims of the current article are therefore: 1) to provide guidance and examples on how to test hypotheses about group differences in specific model parameters in MASEM; 2) to discuss issues with regard to testing differences between subgroups based on pooled correlation matrices; and 3) to show how the subgroup models with equality constraints on some parameters can be fitted using the metaSEM (Cheung, 2015b) and OpenMx packages (Boker et al., 2014) in R (R Core Team, 2017).
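As a rough intuition for what "pooling" means in the first stage under a fixed-effects model, the following minimal Python sketch averages study correlation matrices with sample-size weights. This is a deliberate simplification and an assumption made for illustration only: TSSEM itself pools the matrices with a multigroup maximum likelihood model, not with a weighted average. The function name `pool_correlations` and the two toy studies are hypothetical.

```python
def pool_correlations(matrices, sample_sizes):
    """Sample-size-weighted average of equally sized correlation matrices.

    Illustrative stand-in for Stage 1 pooling; NOT the TSSEM estimator.
    """
    if not matrices or len(matrices) != len(sample_sizes):
        raise ValueError("need one sample size per correlation matrix")
    total_n = sum(sample_sizes)
    size = len(matrices[0])
    pooled = [[0.0] * size for _ in range(size)]
    for matrix, n in zip(matrices, sample_sizes):
        for i in range(size):
            for j in range(size):
                # Each study contributes in proportion to its sample size.
                pooled[i][j] += n * matrix[i][j] / total_n
    return pooled


# Hypothetical toy data: two studies reporting one correlation each.
study_a = [[1.0, 0.30], [0.30, 1.0]]  # n = 100
study_b = [[1.0, 0.50], [0.50, 1.0]]  # n = 300
pooled = pool_correlations([study_a, study_b], [100, 300])
print(round(pooled[0][1], 2))
```

In this toy case the larger study dominates, so the pooled correlation lies closer to .50 than to .30. A structural model would then be fitted to the pooled matrix in the second stage.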
1360 Behav Res (2018) 50:1359–1373

Specifically, we propose a follow-up analysis in which the equality of structural parameters across studies can be tested. Assuming that there are hypotheses on categorical study-level variables, the equality of specific parameters can be tested across subgroups of studies. In this way, it is possible to find a model in which some parameters are equal across subgroups of studies and others are not. More importantly, it helps researchers to identify how study-level characteristics can be used to explain differences in parameter estimates.

Methods to model heterogeneity in meta-analysis

With regard to how to handle heterogeneity in a meta-analysis, two dimensions (or approaches) can be distinguished (e.g., Borenstein, Hedges, Higgins, and Rothstein (2009)). The first dimension concerns whether to apply a fixed- or a random-effects model, while the second dimension is about whether or not to include study-level moderators. Two classes of models can be differentiated: the fixed-effects model and the random-effects model. The fixed-effects model allows conditional inference, meaning that the results are only relevant to the studies included in the meta-analysis. The random-effects model allows for unconditional inference to studies that could have been included in the meta-analysis by assuming that the included studies are samples of a larger population of studies (Hedges & Vevea, 1998).

The fixed-effects model (without moderators) usually assumes that all studies share the same population effect size, while the fixed-effects model with moderators assumes that the effects are homogeneous after taking into account the influence of moderators. The random-effects model assumes that the differences across studies are random. The random-effects model with moderators, known as a mixed-effects model, assumes that there will still be random effects after the moderators are taken into account.

Methods to model heterogeneity in MASEM

The above framework from general meta-analysis is also applicable to MASEM. Table 1 gives an overview of the suitability, and the advantages and disadvantages, of using different combinations of fixed- versus random-effects MASEM, with or without subgroups. Case 1 represents overall analysis with a fixed-effects model. Fixed-effects models are very restrictive (i.e., the number of parameters to be estimated is relatively small), which makes them easy to apply. However, homogeneity of correlation matrices across studies may not be realistic, leading to biased significance tests (Hafdahl, 2008; Zhang, 2011).

One way to account for heterogeneity is by estimating between-study heterogeneity across all studies in the random-effects approach (Case 2 in Table 1). By using a random-effects model, the between-study heterogeneity is accounted for at Stage 1 of the analysis (pooling correlations), and the Stage 2 model (the actual structural model of interest) is fitted on the averaged correlation matrix. Under the random-effects model, study-level variability is considered a nuisance. An overall random-effects analysis may be the preferred choice when moderation of the effects by study-level characteristics is not of substantive interest (Cheung & Cheung, 2016).

Subgroup analysis is more appropriate than overall random-effects analysis in cases where it is of interest to determine how the structural models differ across levels of a categorical study-level variable (Cases 3 and 4 in Table 1). In a subgroup analysis, the structural model is fitted separately to groups of studies. Within the subgroups, one may use random- or fixed-effects modeling (Jak, 2015). Fixed-effects subgroup analysis is suitable if homogeneity of correlations within the subgroups is realistic. Most often, however, heterogeneity within subgroups of studies is still expected, and fixed-effects modeling may be unrealistic. In such cases, random-effects subgroup analysis may be the best choice. A possible problem with a random-effects subgroup analysis is that the number of studies within each subgroup may become too small for reliable results to be obtained.

We focus on the situation in which researchers have an a priori idea of which study-level characteristics may moderate effects in the Stage 2 model. That is, we do not consider exploratory approaches, such as using cluster analysis to find homogeneous subgroups of studies (Cheung & Chan, 2005a).

Besides the random-effects model and subgroup analysis, Cheung and Cheung (2016) discuss an alternative approach to addressing heterogeneity in MASEM, called “parameter-based MASEM”. Since this approach also has its limitations, and discussing them is beyond the scope of the current work, we refer readers to their study for more details. We focus on TSSEM, in which subgroup analysis is the only option to evaluate moderator effects.

Currently used methods to test hypotheses about heterogeneity in MASEM

A disadvantage of the way subgroup analysis is commonly applied is that all Stage 2 parameters are allowed to be different across subgroups, regardless of expectations about differences in specific parameters. That is, differences in parameter estimates across groups are seldom tested in the structural model. For example, Rosenbusch et al. (2013)
performed a MASEM analysis on data from 83 studies, testing a model in which the influence of the external environment of firms on performance levels is mediated by the entrepreneurial orientation of the firm. They split the data into a group of studies based on small sized firms and a group based on medium-to-large sized firms, to investigate whether the regression parameters in the path model are moderated by firm size. However, after fitting the path model to the pooled correlation matrices in the two subgroups, they compared the results without using any statistical tests.

Table 1 Overview of advantages (+) and disadvantages (–) of subgroup versus overall analysis and fixed-effects (FEM) versus random-effects (REM) models

Overall analysis

Case 1 (FEM). Use if: There is no hypothesis about moderation, and homogeneity is realistic
  + 1) Small number of parameters
    2) Sometimes the only option (e.g., with a small number of studies)
  – 1) Only allows for conditional inference
    2) Biased significance tests if homogeneity does not hold
    3) Masks subgroup differences in parameters

Case 2 (REM). Use if: There is no hypothesis about moderation, and homogeneity is not realistic
  + 1) Accounts for heterogeneity
    2) Allows for unconditional inference
  – 1) Large number of parameters (but smaller than with subgroups)
    2) No information about specific effects of moderators
    3) Masks subgroup differences in parameters

Subgroup analysis

Case 3 (FEM). Use if: There is a specific hypothesis about subgroups, and homogeneity within subgroups is realistic
  + 1) Small number of parameters
    2) Sometimes the only option (e.g., with a small number of studies)
    3) Possibility to test subgroup differences in parameters
  – 1) Only allows for conditional inference
    2) Need to dichotomize continuous moderator
    3) Biased parameter estimates if homogeneity does not hold

Case 4 (REM). Use if: There is a specific hypothesis about subgroups, and homogeneity within subgroups is not realistic
  + 1) Accounts for additional heterogeneity within subgroups
    2) Allows for unconditional inference
    3) Possibility to test subgroup differences in parameters
  – 1) Large number of parameters (larger than without subgroups)
    2) Need to dichotomize continuous moderator
    3) Number of studies per subgroup might get too small

Gerow et al. (2013) hypothesized that the influence of intrinsic motivation on individuals’ interaction with information technology was greater when the technology was to be used for hedonistic applications than for practical applications. They fitted the structural model to a subgroup of studies with hedonistic applications, a subgroup of studies with practical applications, and a subgroup of studies with a mix of applications. However, to test for differences between the subgroups, they performed t-tests on the four pooled Stage 1 correlation coefficients in the subgroups, ignoring the estimates in the actual path models altogether. These approaches are not ideal because researchers cannot test whether some of the parameters, those that may be of theoretical interest, are significantly different across groups.

More often than using subgroup analysis, researchers address the moderation of effect sizes using standard meta-analysis techniques on individual effect sizes, before they conduct the MASEM analysis. They use techniques such as meta-regression or ANOVA-type analyses (Lipsey & Wilson, 2001). Independent of the moderation effects, the MASEM is then performed using the full set of studies. Examples of this practice can be found in Drees and Heugens (2013), Earnest, Allen, and Landis (2011) and Jiang, Liu, Mckay, Lee, and Mitchell (2012). A disadvantage of this approach is that moderation is tested on the correlation coefficients, and not on specific parameters in a structural equation model. Most often, this is not in line with the hypothesis of interest. For example, the moderator hypotheses of Gerow et al. (2013) were about the direct effects in the path model but not about covariances and variances. Although subgroup analysis to test heterogeneity has previously been conducted (see Haus et al., 2013), we think that instructions regarding the procedures are needed because most researchers who apply MASEM still choose to address issues of moderation outside the context of MASEM.

Overview of this article

In the next sections, we briefly introduce fixed- and random-effects TSSEM and propose a follow-up analysis to address heterogeneity using subgroup analysis. We discuss some issues related to testing the equality of parameters using pooled correlation matrices. Next, we illustrate the procedure using an example of testing the equality of factor loadings across study-level variables of the Hospital Anxiety and Depression Scale (HADS) with data from Norton, Cosco, Doyle, Done, and Sacker (2013) as well as with an example of testing moderation by socio-economic status (SES) in a path model linking teacher-child relations to engagement and achievement (Roorda, Koomen, Spilt, & Oort, 2011). To facilitate the use of the proposed procedure, detailed reports of the analyses, including data and R-scripts, are provided online at www.suzannejak.nl/masem_code. Finally, we present a small simulation study to evaluate the effect of the number of studies included in a MASEM analysis on the frequency of estimation problems.

TSSEM

In the next two sections we briefly describe fixed-effects TSSEM and random-effects TSSEM. For a more elaborate explanation see Cheung and Chan (2005b), Cheung (2014), Cheung (2015a), and Jak (2015).

Fixed-effects TSSEM

The fixed-effects TSSEM approach was proposed by Cheung and Chan (2005b). They performed a simulation study, comparing the fixed-effects TSSEM approach to two univariate approaches (Hunter & Schmidt, 2015; Hedges & Olkin, 1985) and the multivariate GLS-approach (Becker, 1992, 1995). They found that the TSSEM approach showed the best results with respect to parameter accuracy and false positive rates of rejecting homogeneity.

Stage 1

In fixed-effects TSSEM, the correlation matrices in the individual studies are assumed to be homogeneous across studies, all being estimates of one common population correlation matrix. Differences between the correlation matrices in different studies are assumed to be solely the result of sampling error. The model that is fitted at Stage 1 is a multigroup model in which all correlation coefficients are assumed to be equal across studies. Fitting this model to the observed correlation matrices in the studies leads to an estimate of the population correlation matrix $P_F$, which is correctly estimated if homogeneity indeed holds.

Stage 2

In Stage 2 of the analysis, weighted least squares (WLS) estimation (Browne, 1984) is used to fit a structural equation model to the estimated common correlation matrix from Stage 1. The proposed weight matrix in WLS-estimation is the inverse asymptotic variance covariance matrix of the Stage 1 estimates of $P_F$, i.e., $W_F = V_F^{-1}$ (Cheung & Chan, 2005b). These weights ensure that correlation coefficients that are based on more information (on more studies and/or studies with larger sample sizes) get more weight in the estimation of the Stage 2 parameters. The Stage 2 analysis leads to estimates of the model parameters and a $\chi^2$ measure of fit.

Random-effects TSSEM

Stage 1

In random-effects TSSEM, the population effect sizes are allowed to differ across studies. The between-study variability is taken into account in the Stage 1 analysis. Estimates of the means and the covariance matrices in random-effects TSSEM are obtained by fixing the sampling covariance matrices to the known values (through definition variables, see Cheung (2015a)), and using full information maximum likelihood to estimate the vector of means, $P_R$, and the between-studies covariances, $T_R$ (Cheung, 2014).

Stage 2

Fitting the Stage 2 model in the random-effects approach is not very different from fitting the Stage 2 model in the fixed-effects approach. The values in $W_R$ from a random-effects analysis are usually larger than those obtained from a fixed-effects analysis, because the between-studies covariance is added to the construction of the weight matrix. This results in relatively more weight being given to smaller studies, and larger standard errors and confidence intervals, than with the fixed-effects approach.

Using subgroup analysis to test parameter heterogeneity

The basic procedure for subgroup analysis comprises separate Stage 1 analyses for the subgroups. The Stage 1 analyses may be in the fixed-effects framework, hypothesizing homogeneity within subgroups, or in the random-effects framework, assuming that there is still substantive between-study heterogeneity within the subgroups. In a subgroup MASEM analysis, it is straightforward to equate certain parameters across groups at Stage 1 or Stage 2 of the analysis. The differences in the parameters across groups can be tested using a likelihood ratio test by comparing the fit of a model with across-groups equality constraints on certain parameters with a model in which the parameters are freely estimated across groups.

Testing heterogeneity in Stage 1 parameters

Although we focus on testing differences in Stage 2 parameters, in some situations it may be interesting to test the equality of the pooled correlation matrices across subgroups. In order to test the hypothesis that the correlation matrices from a fixed-effects subgroup analysis, $P_g$, are equal across subgroups $g$, one could fit a model with the constraint $P_{g_1} = P_{g_2}$. Under the null hypothesis of equal correlation matrices across groups, the difference in the −2 log-likelihoods of the models with and without this constraint asymptotically follows a chi-square distribution with degrees of freedom equal to the number of constrained correlation coefficients. Similarly, one could perform this test on the averaged correlation matrices from a random-effects Stage 1 analysis. With random-effects analysis, it may additionally be tested if the subgroups differ in their heterogeneity covariance matrices $T_g$. When the researcher’s hypotheses are directly about Stage 2 parameters, one may skip testing the equality of correlation matrices across subgroups. Constraining the between-studies covariance matrices to be equal may still be useful to reduce the number of parameters to be estimated in a random-effects analysis. This issue is discussed further in the general discussion.

Testing heterogeneity in Stage 2 parameters

For ease of discussion, we suppose that there are two subgroups. Given the two Stage 1 pooled correlation matrices in the subgroups $g$, say, $P_g$, a structural model can be fitted to the two matrices. For example, one could fit a factor model in both groups:

$$P_g = \Lambda_g \Phi_g \Lambda_g^{\mathsf{T}} + \Theta_g, \tag{1}$$

where, with $p$ observed variables and $k$ common factors, $\Lambda_g$ is a full $p$ by $k$ matrix with factor loadings in group $g$, $\Phi_g$ is a $k$ by $k$ symmetrical matrix with factor variances and covariances in group $g$, and $\Theta_g$ is a $p$ by $p$ symmetrical matrix with residual (co)variances in group $g$. The covariance structure is identified by setting $\mathrm{diag}(\Phi_g) = I$. Since the input is a correlation matrix, the constraint $\mathrm{diag}(P_g) = I$ is required to ensure that the diagonals of $P_g$ are always ones during estimation.

In order to test the equality of factor loadings across groups, a model can be fitted in which $\Lambda_{g_1} = \Lambda_{g_2}$. Under the null hypothesis of equal factor loadings, the difference in chi-squares of the models with $\Lambda_{g_1} = \Lambda_{g_2}$ and $\Lambda_{g_1} \neq \Lambda_{g_2}$ asymptotically follows a chi-square distribution with degrees of freedom equal to the difference in the number of freely estimated parameters. If the difference in chi-squares is considered significant, the null hypothesis of equal factor loadings is rejected.

The approach of creating subgroups with similar study characteristics and equating parameters across groups is suitable for any structural equation model. For example, in a path model, it may be hypothesized that some or all direct effects are different across subgroups of studies, but variances and residual variances are not. One could then compare a model with equal regression coefficients with a model with freely estimated regression coefficients to test the hypothesis. Also, the subgroups approach can be applied using fixed-effects or random-effects analyses.

Issues related to testing equality constraints based on correlation matrices in TSSEM

Structural equation models are ideally fitted on covariance matrices. In MASEM, and meta-analysis in general, it is very common to synthesize correlation coefficients. One reason for the synthesis of standardized effect sizes is that different studies may use different instruments with different scales to operationalize the variables of interest. The analysis of correlation matrices does not pose problems when the necessary constraints are included (Bentler & Savalei, 2010; Cheung, 2015a). However, it should be taken into account that fitting models to correlation matrices with TSSEM implies that all parameter estimates are in a standardized metric (assuming that all latent variables are scaled to have unit variances, which is recommended in TSSEM (Cheung, 2015a)).

When we compare models across subgroups in TSSEM, we are thus comparing parameter estimates that are standardized with respect to the observed and latent variables within the subgroups (Cheung, 2015a; Steiger, 2002). This may not necessarily be a problem; sometimes it is even desirable to compare standardized coefficients (see Kwan and Chan (2011)). For example, van den Boer, van Bergen, and de Jong (2014) tested the equality of correlations between three reading tasks across an oral and a silent reading group. However, it is important to be aware of this issue and to interpret the results correctly. Suppose that a standardized regression coefficient for the effect of variable $x$ on variable $y$, $\beta^*_{yx}$, is compared across two subgroups of studies, $g_1$ and $g_2$. The standardized direct effects in the subgroups are given by:

$$\beta^*_{yx,g_1} = \beta_{yx,g_1} \frac{\sigma_{x,g_1}}{\sigma_{y,g_1}} \tag{2}$$

and

$$\beta^*_{yx,g_2} = \beta_{yx,g_2} \frac{\sigma_{x,g_2}}{\sigma_{y,g_2}}, \tag{3}$$

where $\beta$ represents an unstandardized regression coefficient, $\beta^*$ represents a standardized regression coefficient, and $\sigma$ represents a standard deviation. In the special case that the standard deviations of $x$ and $y$ are equal within subgroups, in each subgroup the standardized coefficient is equal to the unstandardized coefficient, and the test of $H_0: \beta^*_{yx,g_1} = \beta^*_{yx,g_2}$ is equal to the test of $H_0: \beta_{yx,g_1} = \beta_{yx,g_2}$. In fact, this not only holds when the standard deviations of the variables are equal in the subgroups, but in general when the ratio of $\sigma_x$ over $\sigma_y$ is equal across subgroups. For example, when $\sigma_x$ and $\sigma_y$ in group 1 are respectively 2 and 4, and $\sigma_x$ and $\sigma_y$ in group 2 are respectively 1 and 2, the standardized regression coefficient equals the unstandardized coefficient times .5 in both groups. In this case, a test of the equality of the standardized regression coefficients will lead to the same conclusion as a test of the unstandardized regression coefficients.

However, in most cases the ratio of standard deviations will not be exactly equal across groups. Therefore, when testing the equality of regression coefficients in a path model, one has to realize that all parameters are in a standardized metric. The conclusions may not be generalizable to unstandardized coefficients. Whether the standardized or the unstandardized regression coefficients are more relevant depends on the research questions (Bentler, 2007). In the context of meta-analysis, standardized coefficients are generally preferred (Cheung, 2009; Hunter & Hamilton, 2002).

In a factor analytic model, several methods of standardization exist. Parameter estimates may be standardized with respect to the observed variables only, or with respect to the observed variables and common factors. In MASEM, it is recommended that the common factors be identified by fixing their variances to 1 (Cheung, 2015a). All results obtained from a MASEM-analysis on correlation matrices are thus standardized with respect to the observed variables and the common factor. As a consequence of this standardization, the residual variances in $\Theta$ are effectively not free parameters, but the remainder of $\mathrm{diag}(I) - \mathrm{diag}(\Lambda \Phi \Lambda^{\mathsf{T}})$ (Cheung, 2015a).

Similar to path analysis, when testing the equality of factor loadings across subgroups in MASEM, the results may not be generalizable to unstandardized factor loadings, due to across-group differences in the (unknown) variances of the indicators and common factors. Moreover, if all standardized factor loadings are set to be equal across groups, this implies that all standardized residual variances are equal across groups. Note that although one may be inclined to denote a test of the equality of factor loadings a test of weak factorial invariance (Meredith, 1993), this would strictly be incorrect, as weak factorial invariance pertains to the equality of unstandardized factor loadings.

Examples

In this section, we present two examples of the testing of moderator hypotheses in MASEM using subgroup analysis. Example 1 illustrates the testing of the equality of factor loadings using factor analysis under the fixed-effects model (Cases 1 and 3 from Table 1). Example 2 illustrates the testing of the moderation of direct effects using path analysis under the random-effects model (Cases 2 and 4 from Table 1). The R-syntax for the examples can be found online (http://www.suzannejak.nl/masem_code).

Example 1 – Testing equality of factor loadings of the Hospital Anxiety and Depression Scale

Introduction

The HADS was designed to measure psychological distress in non-psychiatric patient populations (Zigmond & Snaith, 1983), and is widely used in research on distress in patients. The instrument consists of 14 items: the odd numbered items are designed to measure anxiety and the even numbered items are designed to measure depression. The items are scored on a 4-point scale. Some controversy exists regarding the validity of the HADS (Zakrzewska, 2012). The HADS has generally been found to be a useful instrument for screening purposes, but not for diagnostic purposes (Mitchell, Meader, & Symonds, 2010). Ambiguous results regarding the factor structure of the HADS led to a meta-analytic study by Norton et al. (2013), who gathered correlation matrices of the 14 HADS items from 28 published studies. Using meta-analytic confirmatory factor analysis, they found that a bi-factor model that included all items loading onto a general distress factor and two orthogonal anxiety and depression factors provided the best fit to the pooled data. Of the 28 studies evaluated by Norton et al., 10 considered non-patient samples and 18 were based on patient samples. As an illustration we will test the equality of factor loadings across studies based on patient and non-patient samples.

Analysis

All of the models were fitted using the metaSEM and OpenMx packages in the R statistical platform. First we fit the Stage 1 and Stage 2 models with a fixed-effects model to the total set of studies (illustrating Case 1 from Table 1). The Stage 1 analysis using the fixed-effects model involved fitting a model to the 28 correlation matrices in which all correlation coefficients were restricted to be equal across studies. Misfit of this model would indicate inequality of the correlation coefficients across studies. Stage 2 involved fitting the bi-factor model that Norton et al. (2013) found to have the best fit to the data (see Fig. 1).

[Fig. 1 The bi-factor model on the HADS-items]

Next, two subgroups of studies were created, one group with the 10 non-patient samples and the other with the 18 patient samples (illustrating Case 3 from Table 1). First, the Stage 1 analyses were performed in the two groups separately, leading to two pooled correlation matrices. Then, the factor model without equality constraints across subgroups was fitted to the data. Next, three models in which the factor loadings of the general distress factor, anxiety factor and depression factor respectively were constrained to be equal across patient and non-patient samples were tested. If the equality constraints on the factor loadings led to a significantly higher chi-square statistic, the (standardized) factor loadings would be considered to differ across groups.

Exact fit of a proposed model is rejected if the $\chi^2$ statistic is found to be significant. Exact fit will rarely hold in MASEM, due to the large total sample size. Therefore, as in standard SEM, it is common to use approximate fit to assess the fit of models. Approximate close fit is associated with RMSEA-values under .05, satisfactory approximate fit with RMSEA-values under .08, and bad approximate fit is associated with RMSEA-values larger than .10 (MacCallum, Browne, & Sugawara, 1996). In addition to the RMSEA, we will evaluate the CFI (Bentler, 1990) and the standardized root mean squared residual (SRMR). CFI-values above .95 and SRMR-values under .08 are considered satisfactory (Hu & Bentler, 1999). For more information about the calculation and use of fit-indices in SEM we refer to Schermelleh-Engel et al. (2003).

Results

Overall Stage 1: Testing homogeneity and pooling correlation matrices

The Stage 1 model did not have exact fit to the data, $\chi^2(2{,}457) = 10{,}400.04$, $p < .01$. Approximate fit was acceptable according to the RMSEA (.064, 95% CI: [.063 ; .066]), but not according to the CFI (.914) and SRMR (.098). Based on the CFI and SRMR, one should either not continue to fit the structural model or switch to random-effects modeling. However, in order to illustrate the modeling involved in Case 1, we will continue with Stage 2 using overall fixed-effects analysis. Table 2 shows the pooled correlation matrix based on the fixed-effects Stage 1 analysis.

Table 2 Pooled correlation matrix based on the fixed effects Stage 1 analysis of the HADS data

      v1    v3    v5    v7    v9    v11   v13   v2    v4    v6    v8    v10   v12   v14
v1    1
v3    .48   1
v5    .55   .52   1
v7    .42   .36   .41   1
v9    .42   .46   .42   .35   1
v11   .33   .29   .33   .32   .28   1
v13   .49   .54   .50   .36   .50   .37   1
v2    .29   .24   .30   .34   .25   .18   .26   1
v4    .29   .28   .32   .36   .27   .18   .28   .42   1
v6    .40   .36   .43   .40   .31   .22   .36   .38   .45   1
v8    .35   .30   .34   .28   .27   .23   .33   .36   .25   .33   1
v10   .23   .21   .25   .22   .18   .17   .22   .25   .26   .30   .26   1
v12   .30   .27   .32   .36   .28   .19   .29   .47   .46   .42   .32   .33   1
v14   .24   .22   .25   .34   .22   .21   .25   .28   .31   .31   .19   .21   .33   1

Overall Stage 2: Fitting a factor model to the pooled correlation matrix

Norton et al. (2013) concluded that a bi-factor model showed the best fit to the data. We replicated the analyses and found that, indeed, the model fit is acceptable according to the RMSEA ($\chi^2(63) = 2{,}101.48$, RMSEA = .039, 95% CI RMSEA: [.037 ; .040], CFI = .953, SRMR = .033). The parameter estimates from this model can be found in Table 3. All items loaded substantially on the general factor, and most items had smaller loadings on the specific factor. Contrary to expectations, Item 7 has a negative loading on the anxiety factor.

Table 3 Parameter estimates and 95% confidence intervals from the bi-factor model on the total HADS data

      Λ General          Λ Anxiety          Λ Depression       Θ
      est.  lb    ub     est.  lb    ub     est.  lb    ub     est.  lb    ub
v1    .69   .68   .70    .19   .17   .22                       .48   .47   .50
v3    .61   .60   .62    .40   .38   .42                       .47   .45   .48
v5    .71   .70   .72    .23   .21   .26                       .45   .44   .46
v7    .71   .70   .72    −.13  −.16  −.09                      .48   .45   .50
v9    .56   .54   .57    .33   .31   .36                       .58   .57   .59
v11   .48   .46   .49    .12   .10   .15                       .76   .75   .77
v13   .63   .62   .64    .45   .42   .47                       .40   .39   .42
v2    .47   .46   .48                       .47   .45   .48    .56   .55   .57
v4    .50   .48   .51                       .44   .42   .45    .56   .55   .58
v6    .61   .60   .63                       .29   .28   .31    .54   .52   .55
v8    .50   .49   .52                       .21   .19   .23    .70   .69   .71
v10   .37   .35   .38                       .27   .25   .29    .79   .78   .80
v12   .50   .48   .51                       .53   .51   .55    .47   .46   .49
v14   .43   .42   .44                       .23   .21   .25    .76   .75   .77

Note: est. = parameter estimate, lb = lower bound, ub = upper bound; Λ General, Λ Anxiety and Λ Depression refer to the factor loadings associated with these factors; Θ refers to residual variance

Subgroup Stage 1: Testing homogeneity and pooling correlation matrices

In the patient group, homogeneity was rejected by the chi-square test ($\chi^2(1{,}547) = 5{,}756.84$, $p < .05$). Homogeneity could be considered to hold approximately, based on the RMSEA (.071, 95% CI: [.070 ; .073]), but not based on the CFI (.923) and SRMR (.111). In the non-patient group, homogeneity was also rejected by the chi-square test, $\chi^2(819) = 3{,}254.60$, $p < .05$, but approximate fit could be considered acceptable based on the RMSEA and SRMR (RMSEA = .049, 95% CI RMSEA: [.048 ; .051], CFI = .941, SRMR = .062). Although the model with a common correlation matrix does not have acceptable fit in the patient group, indicating that not all heterogeneity is explained by differentiating patient and non-patient samples, we continue with Stage 2 analysis as an illustration of the procedure when the interest is Case 3 (see Table 1).

Subgroup Stage 2: Testing equality of factor loadings

The fit of the models with freely estimated factor loadings and with equality constraints on particular sets of factor loadings can be found in Table 4. The RMSEAs of all models indicated close approximate fit. However, the $\chi^2$-difference tests show that the factor loadings cannot be considered equal for any of the three factors. Figure 2 shows a plot of the standardized factor loadings in the two groups. For the majority of the items, the factor loadings are higher in the patient group than in the non-patient group.

Discussion

We found that the factor loadings of the bi-factor model on the HADS differed across the studies involving patients versus studies involving non-patients.

standard deviations were not available for most of the included studies.

We used fixed-effects overall and subgroup analysis, although homogeneity of correlation matrices did not hold. Therefore, it would have been more appropriate to apply random-effects analysis. However, due to the relatively large number of variables and the small number of studies, a random-effects model did not converge to a solution. Even the most restrictive model with only a diagonal $T$ that was set to be equal across subgroups did not solve this problem. The results that were obtained should thus be interpreted with caution, as the Type I error rates may be inflated. The next example shows random-effects subgroup analysis, which may be the appropriate framework in most cases.

Example 2 – Testing moderation of the effect of teacher-student relations on engagement and achievement

Introduction
The items were In this example we use random-effects subgroup analysis generally found to be more indicative of general distress to test moderation by SES in a path model linking teacher- in the studies with patient samples than in the studies with child relations to engagement and achievement. Children non-patient samples. A possible reason for this finding is with low SES are often found to be at risk of failing that the HADS was developed for use in hospital settings, in school and dropping out (Becker & Luthar, 2002). and thus was designed for use with patients. In practice, According to Hamre and Pianta (2001), children at risk of failing in school may have more to gain from an ability researchers may continue with the analysis by testing the equality of individual factor loadings across subgroups. For to adapt to the social environment of the classroom than children who are doing very well at school. Therefore, it can example, the factor loading of Item 2 from the Depression factor seems to differ more across groups than the other be expected that the effects of teacher-child relations may be stronger for children with lower SES. factor loadings for this factor. Such follow-up analyses may give more insight into specific differences across subgroups. Roorda, Koomen, Spilt, and Oort (2011) performed a However, it is advisable to apply some correction on the meta-analysis on correlation coefficients between measures significance level, such as a Bonferroni correction, when of positive and negative teacher-student relations, engage- testing the equality of several parameters individually. ment and achievement. They used univariate moderator anal- A problem with these data is that the HADS is scored on ysis, and found that all correlations were larger in absolute a 4-point scale, but the analysis was performed on Pearson value for studies with relatively more students with low SES. 
In the current analysis, we will test the moderation of product moment correlations, assuming continuous vari- ables. This may have led to underestimated correlation coef- the specific effects in a path model. We will use 45 studies reported by Roorda et al. (2011) and Jak, Oort, Roorda, and ficients. Moreover, it would have been informative to ana- lyze covariance matrices rather than correlation matrices, Koomen (2013), which include information about SES of the samples. enabling a test on weak factorial invariance. However, the Table 4 Overall fit and difference in fit of the factor model with different equality constraints across groups 2 2 df χ p RMSEA [95% CI] CFI SRMR df χ p 1. No constraints 126 2249.21 <.05 .039 [.038 ; .041] .955 .035 2.  General equal 140 3125.51 <.05 .044 [.043 ; .046] .936 .061 14 876.30 <.05 3.  Anxiety equal 133 2266.14 <.05 .038 [.037 ; .040] .955 .036 7 16.93 <.05 4.  Depression equal 133 2300.62 <.05 .039 [.037 ; .040] .954 .037 7 51.41 <.05 2 2 Note: df and χ refer to the difference in df and χ in comparison with Model 1 1368 Behav Res (2018) 50:1359–1373 Fig. 2 A plot of the estimated factor loadings and 95% confidence intervals for the patient group (red) and non-patient group (grey) Note: We show the absolute value of the factor loading of Item 7 on the Anxiety factor Analysis Results First we will perform a random-effects Stage 1 and Stage 2 Overall Stage 1: Random-effects analysis The pooled corre- analysis on the total sample of studies (representing Case 2 lations based on the random-effects analysis can be found from Table 1). Next, we split the studies into two subgroups in Table 5. When a random-effects model is used, an I based on SES (representing Case 4 from Table 1). We will value may be calculated. It can be interpreted as the pro- fit the hypothesized path model (see Fig. 
3) to a group portion of study-level variance in the effect size (Higgins & of studies in which the majority of the respondents were Thompson, 2002). The I values (above the diagonal) show indicated to have low SES (24 studies), and a group of that there is substantial between-studies variability in the studies for which the majority of the sample was indicated correlation coefficients, ranging from .79 to .94. with high SES (21 studies). Note that SES is a continuous moderator variable in this case (percentages). We split the Overall Stage 2: Fitting a path model We fitted a path model studies in two groups based on the criterion of 50% of to the pooled Stage 1 correlation matrix, in which positive and the sample having low SES. Then, we test the equivalence negative relations predicted achievement indirectly, through of the direct effects across groups by constraining the engagement. Exact fit of this model was rejected (χ (2) = effects to be equal across subgroups. Using a significance 11.16, p <.05). However, the RMSEA of .013 (95% CI = level of .05, if the χ statistic increased significantly given [.006 ; .020]) indicated close approximate fit, as well as the the increased degrees of freedom when adding equality CFI (.966) and SRMR (.045). Table 6 shows the parameter constraints across groups, one or more of the parameters estimates and the associated 95% confidence intervals. All would be considered significantly different across groups. parameter estimates were considered significantly different Note that dichotomizing a continuous variable is generally from zero, as zero is not included in the 95% confidence not advised. In this example we dichotomize the moderator intervals. The indirect effects of positive and negative rela- in order to illustrate subgroup analyses. Moreover, in tions on achievement were small, but significant. 
Although TSSEM, the analysis of continuous moderator variables is the model shows good fit on the averaged correlation not yet well developed. matrix, this analysis provides no information about whether Behav Res (2018) 50:1359–1373 1369 1,1 3,1 4,3 Student engagement Student achievement 2,1 3,2 3,3 ψ 4,4 2,2 Fig. 3 The hypothesized path model for Example 2 SES might explain the between-study heterogeneity. Sub- the three direct effects in the path model to be equal across group analysis is used to test whether the parameters differ subgroups did not lead to a significant increase in misfit, across studies with different levels of average SES. χ (3) = 5.18, p = .16. Therefore, the null hypothesis of equal direct effects across subgroups is not rejected. Subgroup Stage 1: Random-effects analysis Different Discussion In this example we tested whether the direct pooled correlation matrices were estimated in the group of effects in a path model linking teacher-child relations to studies with low SES and the group of studies with high engagement and achievement were moderated by SES. The SES (see Tables 7 and 8). The proportions of between- 2 subgroup analysis showed that the null-hypothesis stating studies variance (I ) within the subgroups are smaller than that the effects are equal in the low SES and high SES they were in the total sample, indicating that SES explains populations cannot be rejected. Note that non-rejection of part of the between-study heterogeneity. a null-hypothesis does not imply that the null-hypothesis is true. It could also mean that our design did not have Subgroup Stage 2: Testing moderation of effects by SES The enough statistical power to detect an existing difference in hypothesized path model showed acceptable approximate the population. 
fit, but no exact fit, in the low-SES group, χ (2) = 6.28, p <.05, RMSEA = .013 (95% CI = [.002 ; .026]), CFI = .978, SRMR = .041 as well as in the high-SES group, χ (2) Simulation study = 9.50, p <.05, RMSEA = .015 (95% CI = [.006 ; .025]), CFI = .936, SRMR = .0549. The fit of the unconstrained It is often necessary to create subgroups of studies, because baseline model, with which the fit of the models with an overall analysis will mask differences in parameters across equality constraints will be compared, is equal to the sum of the fit of the models in the two subgroups. Therefore, the χ Table 6 Parameter estimates and 95% confidence intervals of the and df against which the constrained models will be tested hypothesized path model is df = 2+2 = 4 and χ = 6.28 + 9.50 = 15.78. Constraining Parameter est lb ub β .27 .20 .35 Table 5 Pooled correlations (under the diagonal) and I (above the 31 diagonal) based on the random effects Stage 1 analysis β −.30 −.38 −.22 β .35 .29 .41 v1 v2 v3 v4 β * β .10 .07 .12 31 43 v1. Positive relations 1 .92 .94 .79 β * β −.10 −.14 −.07 32 43 v2. Negative relations −.24 1 .88 .80 ψ −.24 −.32 −.16 v3. Engagement .32 −.31 1 .90 ψ .80 .73 .85 v4. Achievement .14 −.18 .28 1 ψ .88 .83 .92 44 1370 Behav Res (2018) 50:1359–1373 The data-generating model was based on the results from Table 7 Pooled correlations (under the diagonal) and I (above the diagonal) based on the random effects Stage 1 analysis in studies with Example 2. The population values for the direct effects in low SES Subgroup 1 were: β = .265, β = -.307, β = .288, and 31 32 43 ψ = -.329. The between-studies variance used to generate v1 v2 v3 v4 random correlation matrices was based on Example 2. In v1. Positive relations 1 .85 .94 .71 Subgroup 2, all population values were identical to the v2. Negative relations −.33 1 .83 .73 values in Subgroup 1, except for β , which was .388 (.10 v3. Engagement .35 −.35 1 .86 larger than in Subgroup 1). 
We generated data with k = 22, v4. Achievement .12 −.18 .23 1 k = 44, k = 66 or k = 88 studies per subgroup, with sample sizes of n=200 for each study. For each condition we generated 2000 meta-analytic datasets. In each condition we fitted the correct model to the two subgroups. For example, if the population regression coeffi- subgroups separately, as well as to the subgroups combined. cient is 0.20 for Subgroup 1, and 0.30 for Subgroup 2, an We restricted the between-studies covariance matrices to be analysis of all of the studies together will result in an esti- diagonal, in order to reduce the number of parameters to mated regression coefficient of between 0.20 and 0.30. This be estimated. In practice, this restriction is often applied means that the effect will be overestimated for Subgroup 1 and (Becker, 2009). We evaluated the percentage of converged underestimated for Subgroup 2. Subgroup analysis will lead solutions, the relative bias in the estimate of β ,andthe to better parameter estimates in the subgroups. However, cre- relative bias in the standard error of β across methods and ating subgroups may lead to small numbers of studies within conditions. The relative percentage of estimation bias for each subgroup. In combination with having twice as many β was calculated as parameters to be estimated as with an overall analysis, small numbers of studies will likely result in estimation problems β − β 43 43 100 ∗ .(4) such as non-convergence. Convergence is an important issue, because researchers will be unable to present any meaningful results of the MASEM analysis without having a converged We regarded estimation bias of less than 5% as acceptable (Hoogland & Boomsma, 1998). The relative percentage of solution. In order to evaluate the effect of the number of studies within each subgroup on the frequency of estimation bias in the standard error of β was calculated as: problems, we conducted a small simulation study. 
¯ ˆ ˆ SE(β ) − SD(β ) 43 43 100 ∗ , (5) SD(β ) Data generation and conditions 43 ¯ ˆ ˆ where SE(β ) is the average standard error of β across 43 43 We generated data from two subgroups, in which one regres- replications, and SD(β ) is the standard deviation of the sion coefficient differed by .10 points across subgroups in parameter estimates across replications. We considered the the population. Next, we fitted the correct model to the standard errors to be unbiased if the relative bias was smaller two subgroups separately, as well as to the combined data. than 10% (Hoogland & Boomsma, 1998). We expected that, due to the larger number of studies, the percentage of converged solutions would be larger for the Results overall analysis than for the subgroup analyses and that the estimation bias in the manipulated effect would be smaller Convergence in the subgroup analysis (because the regression coefficient is allowed to be different in each subgroup). Figure 4a shows the convergence rates for all conditions. As expected, the analysis of the total dataset resulted in more converged solutions than the subgroup analysis in all Table 8 Pooled correlations (under the diagonal) and I (above the conditions. In addition, convergence rates increased with diagonal) based on the random effects Stage 1 analysis in studies with the number of studies. However, the convergence rates were high SES generally low. For example, with 22 studies per subgroup v1 v2 v3 v4 (the condition similar to that of our Example 2), only 43% of the datasets led to a converged solution with the v1. Positive relations 1 .90 .84 .79 overall analysis, while only around 30% converged with v2. Negative relations −.17 1 .66 .80 the subgroup analysis. With small numbers of studies per v3. Engagement .23 −.23 1 .87 subgroups (smaller than 44), most analyses are expected to v4. Achievement .16 −.18 .34 1 not result in a converged solution. 
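The two bias measures defined in Eqs. 4 and 5 amount to simple summaries over the replications of one simulation condition. The following is a minimal sketch of that arithmetic, not the authors' simulation code: the function names and the toy numbers are hypothetical, and it assumes the standard deviation in Eq. 5 is the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev


def relative_bias_estimate(estimates, population_value):
    """Eq. 4: percentage bias of the mean estimate relative to the population value."""
    return 100 * (mean(estimates) - population_value) / population_value


def relative_bias_se(standard_errors, estimates):
    """Eq. 5: percentage bias of the mean standard error relative to the
    empirical standard deviation of the estimates across replications."""
    empirical_sd = stdev(estimates)  # assumption: sample SD (n - 1 denominator)
    return 100 * (mean(standard_errors) - empirical_sd) / empirical_sd


# Hypothetical toy replications, for illustration only (not results from the study).
estimates = [0.30, 0.34, 0.36, 0.40]
standard_errors = [0.04, 0.05, 0.04, 0.05]

print(round(relative_bias_estimate(estimates, 0.288), 2))
print(round(relative_bias_se(standard_errors, estimates), 2))
```

With the cut-offs used in the text, an absolute value from the first function below 5 and from the second below 10 would count as acceptable.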
Bias in parameter estimates

We evaluated the parameter bias in β₄₃ only for replications in which all three analyses led to a converged solution. Consequently, the numbers of replications used to calculate the bias were 141, 188, 246, and 300 of the 2000 replications for k = 22, k = 44, k = 66, and k = 88, respectively. We also calculated the bias using all converged solutions per method (resulting in larger, but different numbers of replications being used for different analyses). This approach led to very similar results, and identical conclusions. The results are presented in Fig. 4b. The percentage of estimation bias was not related to the number of studies or to sample size. As expected, the overall analysis resulted in overestimation for Subgroup 1 and underestimation for Subgroup 2, because the pooled estimate lies between the two population values, while the subgroup analysis led to unbiased parameter estimates. Although the difference in the population value was only 0.10, the percentages of the relative bias exceeded the cut-off of 5% in all conditions for the overall analysis. For parameters that did not differ across subgroups, all analyses yielded unbiased estimates.

Bias in standard errors

The relative bias in standard errors was around 10% in all conditions for the overall analysis. With the subgroup analysis, the standard error estimates were more accurate, with a bias of between roughly −5% and 5% in all conditions. The results are presented in Fig. 4c. The standard errors of the parameters that did not differ across subgroups were unbiased for all analyses.

Fig. 4 Convergence, parameter bias and standard error bias for overall and subgroup analysis with a group difference of 0.10 in β₄₃. Note: The results in panels B and C are based on only those replications that led to a converged solution for all three analyses. The numbers of replications used are 141, 188, 246, and 300 for k = 22, k = 44, k = 66, and k = 88, respectively

Conclusion on the simulation study

The simulation study showed that convergence is a serious potential problem when applying random-effects MASEM. Moreover, the likelihood of non-convergence increases with smaller numbers of studies, such as with a subgroup analysis. However, if the model converges, the subgroup analysis will lead to better parameter estimates and standard error estimates in cases where a difference in the population coefficient is present, even if the population difference is small. In order to increase the likelihood of obtaining a converged solution, it is recommended that as many studies as possible be included.

General discussion

We proposed subgroup analysis to test moderation hypotheses on specific parameters in MASEM. We illustrated the approach using TSSEM. The subgroup analysis method that was presented is not restricted to TSSEM. One could just as easily apply the subgroup analysis to pooled correlation matrices obtained with univariate approaches (Hunter & Schmidt, 2015; Hedges & Olkin, 1985) or the multivariate GLS approach (Becker, 1992; 1995). However, based on earlier research comparing these approaches (Cheung & Chan, 2005b; Jak & Cheung, 2017), univariate approaches are not recommended for MASEM.

Creating subgroups of studies to test the equality of parameters across groups is a useful approach, but may also lead to relatively small numbers of studies within each subgroup. Given the large number of parameters involved in random-effects modeling, the number of studies may become too small for a converged solution to be obtained, as was the case in our Example 1. One way to reduce the number of parameters is to estimate the between-study heterogeneity variances but not the covariances among the random effects, i.e., restricting T to be diagonal. In practice, this restriction is often needed (Becker, 2009).
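To make the size of this reduction concrete, the number of heterogeneity parameters can be counted for the two examples. The sketch below is illustrative only: the function names are hypothetical, and it assumes one random effect per pooled correlation, so that T has as many rows as there are distinct correlations among the observed variables.

```python
def n_correlations(n_variables: int) -> int:
    """Number of distinct correlations among n_variables observed variables."""
    return n_variables * (n_variables - 1) // 2


def n_heterogeneity_parameters(n_variables: int, diagonal: bool) -> int:
    """Free parameters in the between-studies covariance matrix T."""
    q = n_correlations(n_variables)
    # Diagonal T: one variance per correlation.
    # Full symmetric T: q variances plus q * (q - 1) / 2 covariances.
    return q if diagonal else q * (q + 1) // 2


for label, p in [("Example 1 (14 HADS items)", 14), ("Example 2 (4 variables)", 4)]:
    print(label,
          "| correlations:", n_correlations(p),
          "| full T:", n_heterogeneity_parameters(p, diagonal=False),
          "| diagonal T:", n_heterogeneity_parameters(p, diagonal=True))
```

Under these assumptions, the 14 items of Example 1 give 91 correlations, so a full T would have 4,186 free parameters against 91 for a diagonal T, which is far more than 28 studies can support.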
We applied this constraint to the two subgroups in the second example and in the simulation study.

In the simulation study, we found that even with a diagonal heterogeneity matrix, random-effects subgroup modeling is often not feasible due to convergence problems. In practice, researchers may therefore have no other option than to apply fixed-effects modeling instead of random-effects modeling. However, ignoring between-study heterogeneity is known to lead to inflated false positive rates for significance tests (Hafdahl, 2008; Zhang, 2011). Researchers should therefore be careful when interpreting the results of significance tests in cases where heterogeneity exists but a fixed-effects model is applied. Collecting more studies to be included in the meta-analysis is preferable over switching to a fixed-effects model.

A limitation of the subgroup analysis to test moderation is that the moderator variables have to be categorical. In the second example, we split the studies into two groups based on the percentage of respondents with high SES in the study. By dichotomizing this variable we throw away information and lose statistical power. Indeed, contrary to our findings, the univariate meta-regression analyses reported by Roorda et al. showed significant moderation by SES. However, these analyses did not take into account the multivariate nature of the data, and tested the moderation of the correlation coefficients and not of the regression coefficients. Future research is needed to develop methods to include study-level variables as continuous covariates in TSSEM.

Concluding remarks

In the current paper we presented a framework to test hypotheses about subgroup differences in meta-analytic structural equation modeling. The metaSEM and OpenMx code and R functions used in the illustrations are provided online, so that researchers may easily adopt the proposed procedures to test moderator hypotheses in their MASEM analyses. The simulation study showed that increasing the number of studies in a random-effects subgroup analysis increases the likelihood of obtaining a converged solution.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Becker, B. E., & Luthar, S. S. (2002). Social-emotional factors affecting achievement outcomes among disadvantaged students: Closing the achievement gap. Educational Psychologist, 37(4), 197–214.

Becker, B. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17(4), 341–362. https://doi.org/10.2307/1165128

Becker, B. (1995). Corrections to using results from replicated studies to estimate linear models. Journal of Educational and Behavioral Statistics, 20(1), 100–102. https://doi.org/10.2307/1165390

Becker, B. J. (2009). Model-based meta-analysis. In Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) The handbook of research synthesis and meta-analysis (2nd edn., pp. 377–395). New York: Russell Sage Foundation.

Bentler, P. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246.

Bentler, P. (2007). Can scientifically useful hypotheses be tested with correlations? The American Psychologist, 62(8), 769–782. https://doi.org/10.1037/0003-066X.62.8.772

Bentler, P. M., & Savalei, V. (2010). Analysis of correlation structures: Current status and open problems. In Kolenikov, S., Steinley, D., & Thombs L. (Eds.) Statistics in the Social Sciences (pp. 1–36). New Jersey: Wiley.

Boker, S. M., Neale, M. C., Maes, H. H., Wilde, M. J., Spiegel, M., Brick, T. R., & BDBL OpenMx, T. (2014). Openmx 2.0 user guide [Computer software manual].

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.

Browne, M. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83. https://doi.org/10.1111/j.2044-8317.1984.tb00789.x

Cheung, M. W.-L., & Chan, W. (2005a). Classifying correlation matrices into relatively homogeneous subgroups: a cluster analytic approach. Educational and Psychological Measurement, 65(6), 954–979. https://doi.org/10.1177/0013164404273946

Cheung, M. W.-L., & Chan, W. (2005b). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10(1), 40–64. https://doi.org/10.1037/1082-989X.10.1.40

Cheung, M. (2009). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41(2), 425–438. https://doi.org/10.3758/BRM.41.2.425

Cheung, M. W.-L., & Chan, W. (2009). A two-stage approach to synthesizing covariance matrices in meta-analytic structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 16(1), 28–53. https://doi.org/10.1080/

Cheung, M. (2014). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods, 46(1), 29–40. https://doi.org/10.3758/s13428-013-0361-y

Cheung, M. (2015). Meta-analysis: A structural equation modeling approach. Chichester: Wiley.

Cheung, M. W.-L. (2015). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5(1521). https://doi.org/10.3389/fpsyg.2014.01521

Cheung, M. W.-L., & Cheung, S. (2016). Random-effects models for meta-analytic structural equation modeling: Review, issues, and illustrations. Research synthesis methods, 7(2), 140–155.

Drees, J. M., & Heugens, P. P. M. A. (2013). Synthesizing and extending resource dependence theory a meta-analysis. Journal of Management, 39(6), 1666–1698. https://doi.org/10.1177/

Earnest, D. R., Allen, D. G., & Landis, R. (2011). Mechanisms linking realistic job previews with turnover: A meta-analytic path analysis. Personnel Psychology, 64(4), 865–897. https://doi.org/10.1111/j.1744-6570.2011.01230.x

Gerow, J. E., Ayyagari, R., Thatcher, J. B., & Roth, P. L. (2013). Can we have fun @ work? the role of intrinsic motivation for utilitarian systems. European Journal of Information Systems, 22(3), 360–380. https://doi.org/10.1057/ejis.2012.25

Hafdahl, A. (2008). Combining heterogeneous correlation matrices: Simulation analysis of fixed-effects methods. Journal of Educational and Behavioral Statistics, 33(4), 507–533.

Hamre, B. K., & Pianta, R. (2001). Early teacher–child relationships and the trajectory of children’s school outcomes through eighth grade. Child Development, 72(2), 625–638. https://doi.org/10.1111/1467-8624.00301

Haus, I., Steinmetz, H., Isidor, R., & Kabst, R. (2013). Gender effects on entrepreneurial intention: A meta-analytical structural equation model. International Journal of Gender and Entrepreneurship, 5(2), 130–156. https://doi.org/10.1108/17566261311328828

Hedges, L., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.

Hedges, L., & Vevea, J. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504.

Higgins, J. P. T., & Thompson, S. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558. https://doi.org/10.1002/sim.1186

Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling an overview and a meta-analysis. Sociological Methods & Research, 26(3), 329–367.

Hu, L.-t., & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.

Hunter, J. E., & Hamilton, M. (2002). The advantages of using standardized scores in causal analysis. Human Communication Research, 28(4), 552–561. https://doi.org/10.1111/j.1468-2958.2002.tb00823.x

Hunter, J. E., & Schmidt, F. (2015). Methods of meta-analysis: correcting error and bias in research findings (3rd ed.). Thousand Oaks: Sage Publications.

Jak, S., Oort, F. J., Roorda, D. L., & Koomen, H. (2013). Meta-analytic structural equation modelling with missing correlations. Netherlands Journal of Psychology, 67(4), 132–139.

Jak, S. (2015). Meta-analytic structural equation modeling. Switzerland: Springer International Publishing.

Jak, S., & Cheung, M. W.-L. (2017). Accounting for missing correlation coefficients in fixed-effects meta-analytic structural equation modeling. Multivariate Behavioral Research, in press.

Jiang, K., Liu, D., Mckay, P. F., Lee, T. W., & Mitchell, T. R. (2012). When and how is job embeddedness predictive of turnover? A meta-analytic investigation. Journal of Applied Psychology, 97(5), 1077–1096. https://doi.org/10.1037/a0028610

Kwan, J. L. Y., & Chan, W. (2011). Comparing standardized coefficients in structural equation modeling: a model reparameterization approach. Behavior Research Methods, 43(3), 730–745. https://doi.org/10.3758/s13428-011-0088-6

Lipsey, M., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks: Sage Publications.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. https://doi.org/10.1037/1082-989X.1.2.130

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.

Mitchell, A. J., Meader, N., & Symonds, P. (2010). Diagnostic validity of the hospital anxiety and depression scale (HADS) in cancer and palliative settings: A meta-analysis. Journal of Affective Disorders, 126(3), 335–348. https://doi.org/10.1016/j.jad.2010.01.067

Norton, S., Cosco, T., Doyle, F., Done, J., & Sacker, A. (2013). The hospital anxiety and depression scale: A meta confirmatory factor analysis. Journal of Psychosomatic Research, 74(1), 74–81. https://doi.org/10.1016/j.jpsychores.2012.10.010

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from http://www.R-project.org

Roorda, D. L., Koomen, H. M. Y., Spilt, J. L., & Oort, F. (2011). The influence of affective teacher-student relationships on students’ school engagement and achievement: a meta-analytic approach. Review of Educational Research, 81(4), 493–529.

Rosenbusch, N., Rauch, A., & Bausch, A. (2013). The mediating role of entrepreneurial orientation in the task environment–performance relationship a meta-analysis. Journal of Management, 39(3), 633–659. https://doi.org/10.1177/0149206311425612

Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods of psychological research online, 8(2), 23–74.

Steiger, J. (2002). When constraints interact: A caution about reference variables, identification constraints and scale dependencies in structural equation modeling. Psychological methods, 7(2), 210–

van den Boer, M., van Bergen, E., & de Jong, P. (2014). Underlying skills of oral and silent reading. Journal of experimental child psychology, 128, 138–151.

Viswesvaran, C., & Ones, D. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48(4), 865–885. https://doi.org/10.1111/j.1744-6570.1995.tb01784.x

Zakrzewska, J. (2012). Should we still use the hospital anxiety and depression scale? Pain, 153(6), 1332–1333. https://doi.org/10.1016/j.pain.2012.03.016

Zhang, Y. (2011). Meta-analytic Structural Equation Modeling (MASEM): Comparison of the multivariate methods (phdthesis). Fl: The Florida State University.

Zigmond, A. S., & Snaith, R. (1983). The hospital anxiety and depression scale. Acta Psychiatrica Scandinavica, 67(6), 361–370. https://doi.org/10.1111/j.1600-0447.1983.tb09716.x

Journal

Behavior Research Methods, Springer Journals

Published: Jun 4, 2018
