Abstract
Digital inequalities undermine the democratizing potential of the Internet. While many people engage in public discourse through participatory media, knowledge gaps limit engagement in the networked public sphere. Participatory web platforms have unique potential to facilitate a more equitable production of knowledge. This paper conceptualizes a pipeline of online participation and models the awareness and behaviors necessary to become a contributor to the networked public sphere. We test the theory with the case of Wikipedia editing, relying on survey data from a diverse, national sample of U.S. adults. Our findings underscore the multidimensionality of digital inequalities and suggest new pathways toward closing knowledge gaps by highlighting the importance of education and Internet skills for online stratification processes.
Influential early accounts of the participatory Internet and networked media systems proclaimed their potential to democratize knowledge sharing (e.g., Benkler, 2006; Jenkins, 2006) and close long-standing “knowledge gaps” (Tichenor, Donohue, & Olien, 1970). These claims appear partially prescient, as online platforms have become ever more ubiquitous spaces of social interaction that shape critical life-course trajectories and outcomes (DiMaggio & Bonikowski, 2008; Hampton, Lee, & Her, 2011; Hargittai & Shaw, 2013). However, scholars have also suggested that the Internet may expand knowledge gaps (Bonfadelli, 2002). Pervasive and durable digital inequalities mean that online environments often exacerbate existing patterns of social exclusion (Robinson et al., 2015). Inequalities in Internet experiences, awareness, and skills compound and reinforce each other, contributing to a “Matthew effect” despite the increasing availability of new online activities and interfaces (van Deursen, Helsper, Eynon, & van Dijk, 2017; Pearce & Rice, 2013; Zillien & Hargittai, 2009).
As a result, participatory digital spaces present critical domains for understanding whether communication in the digitally networked information environment perpetuates knowledge gaps. Prior research has provided neither theories nor empirical analyses of the specific processes by which participation divides in the public communication of knowledge emerge, whether online or off. From writing newspaper editorials (Perrin & Vaisey, 2008) to posting photos and videos online (Schradie, 2011), scholarship has described how some people share their ideas publicly while others do not, but it has not addressed why these divergent patterns exist. Widespread participation in public discourse underpins the effective functioning of the “networked public sphere” and democratic civil society (Benkler, 2006; Friedland, Hove, & Rojas, 2006). However, the exceptional scale and participatory affordances of digitally networked interactive communication platforms have not eliminated participation gaps in engagement with knowledge in society (Blank, 2013; Cacciatore, Scheufele, & Corley, 2014; Hargittai & Walejko, 2008; Robinson et al., 2015; Schradie, 2011). Understanding the mechanisms of inequalities in online participation around knowledge production is an important task for communication scholarship. Explaining these phenomena requires new approaches to digital inequality. Many earlier studies focus either on inequalities of Internet access or on the relationships between demographic backgrounds and what people do online in the form of information seeking and interpersonal communication (see Robinson et al., 2015 for a review). While some work has explored inequalities in active engagement (see Brake  and Hargittai & Jennrich  for reviews of this research), the mechanisms whereby Internet users go from having access to the web to becoming more active participants in public online spaces have remained unexplored.
Our paper contributes a more rigorous investigation of the factors and processes that shape online knowledge-production activities. Expanding prior approaches to digital inequality and knowledge gaps, we theorize and model a pipeline of online participation. This approach incorporates distinct steps, heretofore unaddressed in the literature, that precede and are necessary for becoming a contributor to the participatory web. We use a unique survey dataset collected in summer 2016 from a diverse, national sample of 1,512 U.S. adults to test our model empirically on the case of Wikipedia, arguably the most widely accessed and impactful example of participatory knowledge production online (comScore, 2016). While our empirical focus is on Wikipedia and the U.S., the theoretical approach can extend to other online participatory activities and contexts, such as the comment threads of major news media sites like the BBC or the Washington Post, social media platforms like Facebook or Twitter, as well as sites dedicated to particular engagement activities like online health communities or political volunteering. Overall, the results illustrate that knowledge gaps help explain inequalities in knowledge production activities online. We find that the pipeline metaphor characterizes the data accurately and that different factors explain engagement at different stages of the pipeline in ways that previous research had not considered. Participation on Wikipedia reflects inequalities along background attributes, including (but not limited to) education, Internet experiences and skills, and gender. The paper illuminates digital stratification processes and the knowledge gap in the digital age both by elaborating the metaphor of a pipeline of participation and by operationalizing it in relation to an important site of knowledge production in the networked public sphere. Our diverse, national sample supports more generalizable inferences about participation gaps.
We conclude that future research and interventions to overcome digital participation gaps should not focus exclusively on gender or class differences in content creation, but should expand their scope to multiple aspects of digital inequality across pipelines of participation. In particular, our findings support broader efforts to address knowledge- and skill-based barriers to entry among potential contributors to the networked public sphere, from online health communities to political discussions, from cultural conversations to job-search forums.
Knowledge gaps, digital inequalities, and online participation
According to the “knowledge gap hypothesis” (Tichenor et al., 1970), differential media access and skills compound differential levels of education, expanding inequalities of knowledge and engagement with important matters of public concern. Scholars have elaborated, tested, and debated this claim around print and broadcast media for decades, drawing mixed conclusions (e.g., Ettema & Kline, 1977; Hwang & Jeong, 2009). Contemporary discussions of inequalities and stratification processes in digital media environments build on the knowledge gap debates. Inequalities in Internet use and online media engagement demonstrate a knowledge gap dynamic at work (e.g., Bonfadelli, 2002; Cacciatore et al., 2014; Pearce & Rice, 2013). Key determinants of digital inequalities also include autonomy of use, experiences, and skills (Chen & Wellman, 2004; van Deursen et al., 2017; Dutton & Blank, 2014; Haight, Quan-Haase, & Corbett, 2014; Hargittai, 2002; Livingstone & Helsper, 2007; Martínez-Cantos, 2017; Robinson et al., 2015; Warschauer, 2003). Knowledge- and content-production activities represent an arena where digital inequalities play an especially large role in shaping individual activities and attainment (Brake, 2014).
Emancipatory accounts of networked communication emphasize the potential of the participatory Internet to empower individuals to become active producers of knowledge and cultural resources (rather than simply consumers; Benkler, 2006; Jenkins, 2006). However, digital inequality research has challenged such one-sided enthusiasm with evidence that participation in online content production remains highly differentiated across multiple sites, activities, and places (e.g., Blank, 2013; Correa, 2010; Hargittai & Walejko, 2008; Schradie, 2011, 2015). Internet-use skills stand out among the most consistent predictors of differentiated online content production activities. For example, women and low-socioeconomic-status individuals tend to produce less content online than men and high-socioeconomic-status individuals (Schradie, 2011, 2015), but differences in Internet skills help explain the gendered variations in multivariate models (Correa, 2010; Hargittai & Shaw, 2015; Hargittai & Walejko, 2008; Martínez-Cantos, 2017). We seek to expand the theoretical and empirical literature on digital inequality and participation gaps by better understanding how factors such as skills may explain differentiated online content production.
Theorizing online content production gaps as pipelines
Stratification processes in the online media systems where knowledge and content production happen remain less understood than general patterns of Internet use. Prior theoretical work posits that even once people bridge the first-order digital divide and connect to the Internet, inequalities in their online behaviors will persist (DiMaggio, Hargittai, Celeste, & Schafer, 2004; Selwyn, 2004) and that these inequalities reinforce each other in feedback loops (van Deursen et al., 2017).
Differentiated Internet uses appear in a wide array of online activities, ranging from the types of content people seek and consume to material they produce and share (Blank, 2013; Hargittai & Walejko, 2008; Hargittai & Shaw, 2015; Schradie, 2011, 2015). Overcoming participation divides and knowledge gaps requires understanding mechanisms of stratification within these specific online activities more precisely. To fill this gap in the literature, we adopt a pipeline metaphor to describe the multiple stages of Internet uses and skills necessary to engage in more active forms of knowledge production online. The pipeline metaphor builds on research into gender-based inequalities in science, technology, engineering, and math fields (e.g., Blank, 2013; Correa, 2010; van Deursen et al., 2017; Hargittai & Walejko, 2008; National Academy of Sciences, National Academy of Engineering, & Institute of Medicine, 2007; Schradie, 2011, 2015). We use the pipeline to represent a sequence of stages through which an individual must pass in order to become more actively engaged in knowledge consumption and production activities through digital media. First, a user must have heard of a site to be able to contribute to it. Second, a user must have visited the site to participate on it. Third, a user must understand that it is possible to make contributions to the site in order to add content. Only once these conditions are met can a user participate on a site. This model describes a general sequence applicable to diverse contexts of online engagement, such as discussion threads on major news media sites like The Guardian or Slate, social media platforms like Facebook or Instagram, as well as sites dedicated to particular engagement activities like support groups or learning. Figure 1 visualizes the pipeline in the context of online participation.
Figure 1. Schematic drawing of a pipeline of online participation.
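The sequential logic of the pipeline can be sketched in a few lines of Python. This is an illustration only, not the authors' coding: the stage names, the rule that a stage counts only if every earlier stage was also reached, and the ten respondent records are all hypothetical. For each stage, the sketch computes the share of people who reached the previous stage and also reached this one, which makes visible where the pipeline "leaks."

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical ordinal coding of the pipeline; higher values are further along."""
    NONE = 0            # has not heard of the site
    HEARD = 1           # has heard of the site
    VISITED = 2         # has visited the site
    KNOWS_EDITABLE = 3  # knows the site accepts contributions
    CONTRIBUTED = 4     # has contributed content


def furthest_stage(heard, visited, knows_editable, contributed):
    """Return the last stage reached without skipping any earlier stage."""
    stage = Stage.NONE
    for level, reached in enumerate((heard, visited, knows_editable, contributed), start=1):
        if not reached:
            break  # the person leaves the pipeline here; later flags are ignored
        stage = Stage(level)
    return stage


def retention_rates(stages):
    """For each stage, the share of people at the previous stage (or beyond) who also reach it."""
    rates = {}
    for level in range(1, len(Stage)):
        at_previous = sum(1 for s in stages if s >= level - 1)
        at_this = sum(1 for s in stages if s >= level)
        rates[Stage(level).name] = at_this / at_previous if at_previous else float("nan")
    return rates


# Hypothetical respondents: (heard, visited, knows_editable, contributed)
respondents = [
    (False, False, False, False),
    (True, False, False, False),
    (True, False, True, False),   # knows it is editable but never visited: leaks at VISITED
    (True, True, False, False),
    (True, True, False, False),
    (True, True, True, False),
    (True, True, True, False),
    (True, True, True, False),
    (True, True, True, False),
    (True, True, True, True),
]

stages = [furthest_stage(*r) for r in respondents]
for name, rate in retention_rates(stages).items():
    print(f"{name}: {rate:.2f}")
```

One design choice worth flagging: a strictly sequential coding like this one is a simplification, since a person can know that a site accepts contributions without ever having visited it (as the third hypothetical record shows).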
Research on online participation gaps tends to skip the first three stages and collects data only on who contributes online, rather than establishing at which points people may have dropped out of the pool of potential contributors in the first place. That is, if a user does not know of a participatory site, she will not visit that site and will not interact on it. Even if a user has heard of a participatory site, if she does not visit it, she will not participate on it. If a user visits a site but does not know what types of behavior (i.e., active contributions) are possible on the site, then she will not engage in such behaviors. Conceptualizing online participation as a pipeline extends the digital inequality literature to consider precursors to content creation and participation. Prior work that examines broad categories of online participatory behavior cannot determine at which point in the process of using the Internet people drop out of the potential pool of online contributors. Approaching online participation as a pipeline examines these stages in their own right. We anticipate that distinct factors explain variations and inequalities at the different stages of this pipeline, and we explain our expectations below after providing some context on participation gaps in Wikipedia. Metaphorically speaking, we anticipate evidence of “leaks” at different points along the pipeline of participation, explained by underlying knowledge gaps. Our data set and methods allow us to test these claims in a way that prior data could not.
Participation gaps in Wikipedia
The setting for our study is Wikipedia, the most widely visited participatory web site, with unique potential to overcome knowledge gaps.
Inequalities in contributions to Wikipedia are a significant issue, given that the site is one of the most utilized knowledge resources on the Internet (Reagle & Rhue, 2011; Wagner, Graells-Garrido, Garcia, & Menczer, 2016). Accordingly, who edits the site may contribute to content biases in the coverage of the encyclopedia and exacerbate knowledge gaps. Wikipedia thus provides an ideal context in which to understand how stratification processes in online participation relate to knowledge gaps within the networked public sphere (Benkler, 2006; Friedland et al., 2006). Gender gaps attract the bulk of scholarly and press attention about unequal contributions to Wikipedia. The population of U.S. adult contributors to the English-language edition of Wikipedia appears to be at least 75% male (Hill & Shaw, 2013). Explanations of the Wikipedia gender gap tend to focus on existing contributors to the site and the culture of the community. While differences in Internet skills have been shown to matter for how people participate online, little of the work that looks at who contributes to Wikipedia has considered skills (for an exception, see Hargittai & Shaw, 2015). Recent studies investigate two main aspects of digital inequality in Wikipedia. One group of projects looks at inequalities in content coverage, with an emphasis on gendered coverage gaps (e.g., Adams & Brückner, 2015; Graells-Garrido, Lalmas, & Menczer, 2015; Johnson et al., 2016; Wagner et al., 2016). Generally speaking, this line of work emphasizes unequal participation rates across sub-populations (e.g., male vs. female editors; rural vs. urban editors). The authors point to structures of privilege and bias in knowledge production (e.g., men are considered more notable in news media and scholarly sources overall) as the forces shaping unequal coverage patterns. 
A second group of studies focuses on inequalities in Wikipedia participation more directly, also with an emphasis on gender divides (Bear & Collier, 2016; Ford & Wajcman, 2017; Hargittai & Shaw, 2015; Hill & Shaw, 2013; Shane-Simpson & Gillespie-Lynch, 2017). Some emphasize the culture of the existing Wikipedia community (e.g., Bear & Collier, 2016), while others emphasize skill-based barriers to entry (e.g., Hargittai & Shaw, 2015) as explanatory factors. We build on this research on participation gaps by expanding prior models of how individuals become Wikipedia editors. We contribute a more precise conceptual model of a pipeline of participation. We also analyze a more diverse national dataset that includes many individuals who have never edited the site. Thinking of engagement with Wikipedia in terms of a pipeline goes beyond earlier research. Studies of Wikipedia editing tend to take for granted that people know about Wikipedia, visit Wikipedia, and understand that Wikipedia can be edited. For example, most research on the Wikipedia gender gap focuses almost exclusively on current contributors or on the social and technical infrastructures of the Wikipedia community in order to determine why people edit Wikipedia and why fewer women edit than men (e.g., Bear & Collier, 2016; Ford & Wajcman, 2017; Lam et al., 2011). Conceptually and empirically, this work overlooks the processes by which such stratification happens. Previous research also risks selecting on key dependent variables by analyzing data only on people who have already become Wikipedia readers and contributors. Without collecting similar data on non-readers and non-contributors, it is not possible to determine what factors influence becoming an editor (Hargittai & Shaw, 2015). What processes explain persistent participation gaps in content production on Wikipedia? We address this question by testing the pipeline model described above. Our expectations specific to the pipeline for Wikipedia derive from earlier related findings.
We anticipate that both individuals’ backgrounds and their Internet-specific experiences and skills shape whether they understand that Wikipedia can be edited and whether they contribute to it, and that gender and Internet skills have particularly strong relationships with these outcomes (Correa, 2010; Hargittai & Shaw, 2015; Schradie, 2015). We also expect that Internet experiences and background attributes explain who knows of the site and who has visited it, although we expect gender to play a less important role at these earlier points in the pipeline of digital participation, given that the gender divide in general Internet use has all but disappeared over time (Martínez-Cantos, 2017; Ono & Zavodny, 2016). We test our model on a diverse sample of adult Internet users in the U.S. (including many who had never visited or edited Wikipedia) so that we can draw credible inferences about the factors associated with Wikipedia editing and how those relate to prior work on the knowledge gap hypothesis. Our analysis expands on prior studies both theoretically and empirically. Theoretically, we test the pipeline model of participation to see how having heard of Wikipedia, having visited it, and knowing that it can be edited might help explain who edits Wikipedia. Empirically, we contribute to the literature by examining this question on a national sample that is diverse in terms of age, socioeconomic status, and geography, factors that prior research has not considered in this domain.
Data and methods
To answer our research questions, we collected survey data from a national sample of U.S. adults (18 years old or over) in summer 2016. The survey instrument incorporated detailed measures of individuals’ background attributes and Internet experiences consistent with the theories and empirical findings of prior studies.
Following the approach applied by Hargittai and Shaw (2015), we used logistic regression to model the likelihood that respondents had ever made an edit to Wikipedia. We also tested for differential outcomes at other points in the pipeline of participation by modeling who had heard of Wikipedia, who had visited Wikipedia, and who knew that Wikipedia can be edited. Details of our data collection and methods follow below.
Data collection
We contracted with an independent research organization, NORC at the University of Chicago, to administer questions to its AmeriSpeak panel online. The panel is representative of the U.S. population; it uses “area probability sampling and includes additional coverage of hard-to-survey population segments such as rural and low-income households that are underrepresented in surveys relying on address-based sampling” (NORC, n.d.). After pretesting the survey with 23 respondents and updating it based on the results in early May 2016, we ran the survey from May 25 to July 5, 2016. We included an attention-check question and analyzed only responses from participants who passed it. In total, we have valid responses from 1,512 American adults 18 and over, which constitutes a 37.8% survey response rate.1
Measures: independent variables
Demographic and socioeconomic factors
Background variables about respondents, such as their age, gender, education, income, and race/ethnicity, were supplied by NORC from its earlier data collection for the AmeriSpeak panel. Here we describe how we coded these measures. We report age as a continuous variable. We created three education categories: high school or less, some college, and college degree or more. Income was reported in 18 categories, which we recoded to their midpoint values to create a continuous variable. In the regression analyses, we used the square root of income, as this transformation produces a distribution much closer to normal.
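The income recoding described above can be sketched as follows. The text does not list the 18 income brackets, so the bracket bounds below (in $1,000s) and the value assigned to the open-ended top bracket are hypothetical placeholders chosen only to match the 2.5–225 range reported in Table 1. The sketch maps a category to its bracket midpoint, applies the square-root transform, and compares sample skewness before and after the transform on a made-up list of incomes.

```python
import math
from statistics import mean

# HYPOTHETICAL bracket bounds in $1,000s as (lower, upper); the survey's actual
# 18 categories are not listed in the text, and only a few are shown here.
BRACKETS = {
    1: (0.0, 5.0),
    2: (5.0, 10.0),
    3: (10.0, 15.0),
    18: (200.0, None),  # open-ended top bracket
}
TOP_BRACKET_VALUE = 225.0  # assumption: value assigned to the open-ended bracket


def income_midpoint(category):
    """Recode an income category to the midpoint of its bracket (in $1,000s)."""
    low, high = BRACKETS[category]
    return TOP_BRACKET_VALUE if high is None else (low + high) / 2


def sqrt_income(category):
    """Square-root transform, which pulls in the long right tail of income."""
    return math.sqrt(income_midpoint(category))


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


# Made-up, right-skewed incomes in $1,000s, for illustration only.
incomes = [2.5, 7.5, 12.5, 27.5, 45.0, 55.0, 67.5, 87.5, 125.0, 225.0]
print(f"skewness raw:  {skewness(incomes):.2f}")
print(f"skewness sqrt: {skewness([math.sqrt(x) for x in incomes]):.2f}")
```

On these invented values the square root roughly cuts the skewness by two-thirds while keeping the ordering of respondents intact, which is the usual motivation for this kind of transform.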
Race and ethnicity are coded as dummy variables for White, Hispanic, African American, Asian American, Native American, and “Other.” We created a dichotomous “coupled” measure for those either married or living with a partner. We have a dummy variable for those employed either full time or part time and a dummy variable signaling rural residence. Finally, we have a continuous measure of household size.
Internet experiences and skills
Following prior literature, we include measures of how long people have been Internet users, how much autonomy they have in accessing the Internet when and where they want, how much time they spend online, and their Internet skills. We asked respondents when they first started using the Internet, offering the following answer options (recoded values in parentheses): “within the past year” (1), “1 to 5 years ago” (2.5), “more than 5, but less than 10 years ago” (7.5), and “10 or more years ago” (12.5). To measure autonomy of use, we asked “At which of these locations do you have access to the Internet, that is, if you wanted to you could use the Internet at which of these locations?” followed by nine options, including home, workplace, and friend’s home; the autonomy measure is the number of locations selected (0–9). To assess frequency of use, we asked “On an average weekday, not counting time spent on email, chat and phone calls, about how many hours do you spend visiting Web sites?” and then asked the same question about an “average Saturday or Sunday.” The answer options ranged from “None” to “6 hours or more,” with six options in between. We calculated weekly hours spent on the Web by multiplying the answer to the first question by five and the answer to the second question by two, and summing the two figures. For measuring Internet skills, we used a validated, established index (Hargittai & Hsieh, 2012; Wasserman & Richmond-Abbott, 2005).
Respondents were presented with 13 Internet-related terms (such as tagging, PDF, and spyware) and were asked to rate their level of understanding of each on a five-point scale ranging from “no understanding” to “full understanding.” We then calculated the mean across all items as the Internet skills measure (Cronbach’s alpha = .94). For our regression models, we centered this measure on its mean so that all coefficients reflect relationships for an individual of average skill.
Measures: dependent variables
Early in the survey, we asked respondents “Have you ever heard of the following sites and services?” with Wikipedia among the sites on the list. The vast majority (96.5%) reported having heard of Wikipedia. We then asked those who had heard of the site: “Have you ever visited the following sites and services? For each site, indicate if no, you have never visited it; yes, you have visited it in the past, but do not visit it nowadays; yes, you currently visit it sometimes; yes, you currently visit it often.” Of those who had heard of Wikipedia, 82.5% said that they had visited the site at some point (i.e., in the past or currently). Later in the survey, we presented the statement “Wikipedia is a site that:” with multiple options from which respondents were asked to “select all that apply.” Those who did not select the “can be edited by you” option were coded as not knowing that they could edit Wikipedia. We use these responses to construct a dichotomous measure of knowing that Wikipedia can be edited, for which 68.3% of the sample is coded as affirmative. To assess who had ever contributed to Wikipedia, we asked those who reported ever having visited the site: “Have you ever edited a Wikipedia page by fixing a mistake or adding new material?” with “no” and “yes” as the possible answers. A small minority of the full sample, 8.2%, responded in the affirmative.
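The Internet-experience and skills measures described above rest on simple arithmetic, sketched below in Python as an illustration rather than the authors' code. It assumes the frequency answers have already been converted to hours on a 0–6 scale (the text does not give the exact mapping of the answer options), and the three-respondent, three-item skill ratings are made up (the actual index has 13 items). The sketch computes weekly Web hours, each respondent's mean skill score, Cronbach's alpha for the item set, and the mean-centered skill measure.

```python
from statistics import mean, variance


def weekly_web_hours(weekday_hours, weekend_day_hours):
    """5 x average weekday + 2 x average weekend day (0-42 if answers are capped at 6)."""
    return 5 * weekday_hours + 2 * weekend_day_hours


def skill_scores(ratings):
    """Mean of each respondent's item ratings (1 = no understanding, 5 = full understanding)."""
    return [mean(row) for row in ratings]


def cronbach_alpha(ratings):
    """k/(k-1) * (1 - sum of item variances / variance of respondents' total scores)."""
    k = len(ratings[0])  # number of items
    item_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def center(values):
    """Subtract the mean so that 0 corresponds to a respondent of average skill."""
    m = mean(values)
    return [v - m for v in values]


# Made-up ratings: rows are respondents, columns are items (the real index uses 13 items).
ratings = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
scores = skill_scores(ratings)
print("weekly hours:", weekly_web_hours(2, 3))
print("skill scores:", scores)
print("alpha:", round(cronbach_alpha(ratings), 3))
print("centered:", center(scores))
```

Centering shifts only where zero sits on the skills scale: it changes the intercept and, in a model with a skills interaction, the interpretation of the lower-order terms, but not the model's fit.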
The sample
Table 1 reports summary statistics for the independent variables. We have close to equal representation of women (51%) and men. The average age was 48.7 years. The majority were White (71%), followed by Hispanics (12%), African Americans (11%), Asian Americans (3%), Native Americans (2%), and people who reported “Other” races (1%). The median income was $55,000; the mean income was $71,478. Just over a quarter of the group (26%) had no more than a high school education, 32% had some college education, and 43% had a college degree or more. Less than two-thirds (61%) lived with a partner (married or not), 62% were employed either full time or part time, and 13% lived in a rural area. The median household size was two people; the mean was 2.6. In sum, while diverse, the sample was more educated and had a higher income than the average American, as is the case with Internet users generally (Pew Research Center, 2017).
Table 1. Descriptive Statistics for Independent Variables
Measure                              Percent    Mean      SD      N
Background
  Age (18–94)                                  48.74   16.87   1512
  Income in U.S. $1,000s (2.5–225)             71.48   54.40   1512
  Household size (1–6)                          2.61    1.32   1512
  Female                                  51                   1512
  Coupled                                 61                   1512
  Employed                                62                   1512
  Rural resident                          13                   1512
Education
  High school or less                     26                   1512
  Some college                            32                   1512
  Bachelor’s or higher                    43                   1512
Race & Ethnicity
  White                                   71                   1511
  Hispanic                                12                   1511
  Black                                   11                   1511
  Asian                                    3                   1511
  Native American                          2                   1511
  Other                                    1                   1511
Internet Experiences
  Internet autonomy (0–9)                       4.80    2.28   1512
  Internet use frequency (0–42)                14.75   10.75   1491
  Years of Internet use (1–12.5)               11.11    2.78   1512
  Internet skills (1–5)                         3.37    1.08   1511
Note: For interval measures the range (min.–max.) appears in parentheses.
Regarding online experiences, very few of our respondents were new to the Internet, which is not surprising given that Internet adoption in the United States has plateaued in recent years (Pew Research Center, 2017). The average participant had been using the Internet for over ten years. The median number of access locations was five; the mean was 4.8. The median time participants spent online was 12 hours per week; the mean was 14.8 hours. Our skill measure ranged from 1 to 5, with a mean of 3.4 (standard deviation: 1.1), showing that respondents varied considerably in their online know-how.
Analyses
First, we present bivariate statistics to show how background characteristics and online skills relate to Wikipedia editing. Then we present results from logistic regression analyses to estimate which factors are associated with each dependent variable when all independent variables are considered together. To aid interpretation, we also present marginal effects plots visualizing key relationships in our data, based on model predictions at prototypical values of the independent variables (King, Tomz, & Wittenberg, 2000). For all models, we present findings for the full sample. That is, we include respondents who have never heard of Wikipedia and those who have never visited the site.
Such people, by definition, have never made edits to the site and are centrally relevant to questions of inequality when it comes to understanding whose voices are represented on Wikipedia. Consistent with prior work, we also explored alternative model specifications incorporating multiplicative interaction terms (Hargittai & Shaw, 2015). With one exception, these specifications did not improve overall model fit, based on the results of likelihood ratio tests. The only exception was an interaction between higher education and Internet skills in explaining who has edited Wikipedia. As a result, we report the model of editing with this term included and do not include interaction terms in any of the other models.
Results
Descriptive analysis: participation gaps
Table 2 summarizes the four dependent variables in our analysis and supports the intuition behind the pipeline model in the context of Wikipedia editing. The share of respondents declines at each successive stage, most steeply at the last one. At the extremes, nearly all (97%) of our respondents have heard of Wikipedia, whereas only 8% have ever edited it. To the best of our knowledge, this is the first such estimate ever collected in a national survey of U.S. adults. The figure on Wikipedia readership is consistent with other estimates (Hill & Shaw, 2013).
Table 2. Descriptive Statistics for Dependent Variables
Measure                          Percent      N
Has heard of Wikipedia                97   1503
Has visited Wikipedia                 83   1486
Knows Wikipedia is editable           68   1512
Has edited Wikipedia                   8   1483
Note: All are dichotomous measures, and percentages are calculated as a proportion of the number (N) of valid responses to each question.
Table 3 presents bivariate analyses of background characteristics and Internet experiences in relation to whether respondents have edited Wikipedia.
Here, we report relationships that are significant at p < .001 (the table presents all relationships). Among women, 5.7% reported having ever changed something on a Wikipedia page, compared to 10.9% of men. Only 4.1% of those in the highest quartile of age (62 and over) had edited the online encyclopedia, compared to 16.3% of those in the lowest quartile (34 and under). Among those with no more than a high school education, 3.7% had edited Wikipedia, compared to 11.8% of those with a college degree or more.
Table 3. Wikipedia Contribution by Respondent Attribute and Prior Internet Experiences
Measure                         Percentage  χ² statistic
Background
  Age LQ                             16.25    39.62***
  Age HQ                              4.09    11.29***
  Income in U.S. $1,000s LQ           9.41      .48
  Income in U.S. $1,000s HQ           7.96      .03
  Household size LQ                   8.96      .14
  Household size HQ                   8.60      .12
  Female                              5.69    12.49***
  Male                               10.87    12.49***
  Coupled                             7.75      .54
  Uncoupled                           8.97      .54
  Employed                            9.30     3.37*
  Unemployed                          6.45     3.37*
  Rural resident                      7.96      .00
  Urban resident                      8.27      .00
Education
  High school or less                 3.65    13.59***
  Some college                        7.10      .94
  Bachelor’s or higher               11.83    18.22***
Race & Ethnicity
  White                               8.58      .44
  Hispanic                            7.60      .03
  Black                               7.83      .00
  Asian                               4.26      .55
  Native American                     8.33      .00
  Other                               7.69      .00
Internet experiences
  Internet autonomy LQ                3.72     8.13***
  Internet autonomy HQ               14.18    23.57***
  Internet use frequency LQ           3.02    14.68***
  Internet use frequency HQ          11.89     8.40**
  Years of Internet use LQ            3.33    12.64***
  Years of Internet use HQ            9.63    12.64***
  Internet skills LQ                   .81    34.76***
  Internet skills HQ                 18.51    72.01***
Note: For continuous measures, we report the prevalence of the outcome within the lowest quartile (LQ) and the highest quartile (HQ) of the distribution. *p < .05, **p < .01, ***p < .001
Regarding the relationship with Internet experiences, 3.7% of those in the bottom quartile of number of access locations had edited the site, compared to 14.2% of those in the top quartile. Three percent of the least frequent Internet users reported editing, compared to 11.9% of the most frequent users. The skill differences are especially notable: among the least skilled quartile, less than one percent had edited Wikipedia, while among the most skilled quartile 18.5% had.
Modeling the pipeline of Wikipedia contribution
Table 4 provides a correlation matrix for all the measures in our study. All non-zero correlations meet conventional thresholds for statistical significance, but none exceeds 0.4 in absolute value (except for some correlations between categories of the same measure, such as the education and race categories), so we do not consider multicollinearity a concern. Table 5 reports the results of our regression analyses.
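Because the regression results are reported as log-odds and the model comparisons mentioned earlier rely on likelihood ratio tests, two small computations may help readers interpret such output. The Python sketch below is illustrative only: the coefficient dictionary, the covariate profiles, and the log-likelihood values are hypothetical, not the estimates in Table 5, and the chi-square tail probability is computed with a standard closed-form recurrence rather than a statistics library. It converts a linear predictor to a predicted probability for a chosen profile (the idea behind predictions at prototypical values) and computes a likelihood ratio statistic and p-value for two nested models.

```python
import math


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(coefs, profile):
    """Predicted probability for one covariate profile; terms missing from the profile are 0."""
    eta = coefs["intercept"] + sum(
        b * profile.get(term, 0.0) for term, b in coefs.items() if term != "intercept"
    )
    return inv_logit(eta)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, for a positive integer df.

    Uses Q(1, x) = erfc(sqrt(x/2)), Q(2, x) = exp(-x/2), and the recurrence
    Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """LR statistic 2 * (logLik_full - logLik_restricted) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, df)


# HYPOTHETICAL coefficients (log-odds) for illustration; not the estimates in Table 5.
coefs = {"intercept": -2.5, "female": -0.5, "skills_centered": 0.9}
for label, profile in [
    ("man, average skills", {"female": 0, "skills_centered": 0.0}),
    ("woman, average skills", {"female": 1, "skills_centered": 0.0}),
    ("woman, skills +1", {"female": 1, "skills_centered": 1.0}),
]:
    print(f"{label}: {predicted_probability(coefs, profile):.3f}")

# HYPOTHETICAL log-likelihoods of a model without and with two added interaction terms.
stat, p = likelihood_ratio_test(-420.0, -415.5, df=2)
print(f"LR = {stat:.1f}, p = {p:.4f}")
```

For a single coefficient b, math.exp(b) gives the corresponding odds ratio; the Model 8 age coefficient of −0.03 in Table 5, for example, corresponds to an odds ratio of about 0.97 per additional year of age, holding the other covariates constant.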
For each dependent variable, we estimate two logit models, regressing the same dichotomous outcome (e.g., whether a respondent has ever edited Wikipedia) first on the block of background measures and then on the background measures together with the Internet experience and skill measures. We report coefficients as raw log-odds with standard errors in parentheses. Asterisks indicate which coefficients meet conventional thresholds for statistical significance against a null hypothesis of no association with the outcome.

Table 4 Matrix of Pearson Correlation Coefficients for All Measures
(Lower triangle: each row lists, in order, that measure's correlations with measures 1 through the one immediately preceding it.)
1. Age
2. Income in U.S. $1,000s: .1
3. Household size: −.3, .1
4. Female: −.1, −.1, .1
5. Coupled: .1, .3, .3, −.1
6. Employed: −.4, .1, .1, 0, 0
7. Rural resident: 0, −.1, 0, 0, .1, −.1
8. High school or less: −.1, −.2, .1, .1, 0, −.1, .1
9. Some college: 0, −.1, 0, 0, −.1, 0, 0, −.4
10. Bachelor's or higher: .1, .3, −.1, −.1, .1, .1, −.1, −.5, −.6
11. White: .2, .1, −.2, −.1, .1, −.1, .1, 0, −.1, .1
12. Hispanic: −.2, −.1, .2, .1, 0, .1, −.1, 0, .1, −.1, −.6
13. Black: −.1, −.1, 0, .1, −.2, 0, −.1, 0, 0, 0, −.6, −.1
14. Asian: −.1, .1, 0, 0, 0, .1, 0, −.1, 0, .1, −.3, −.1, −.1
15. Native American: 0, 0, 0, 0, 0, 0, 0, 0, .1, −.1, −.2, 0, 0, 0
16. Other: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, −.2, 0, 0, 0, 0
17. Internet autonomy: −.3, .1, .1, 0, 0, .3, −.1, −.2, 0, .2, 0, 0, 0, 0, 0, 0
18. Internet use frequency: −.3, −.1, .1, 0, −.1, 0, 0, 0, .1, −.1, −.2, .1, .1, 0, 0, 0, .1
19. Years of Internet use: 0, .1, −.1, 0, 0, .1, 0, −.3, 0, .2, .1, −.1, 0, .1, −.1, 0, .1, .1
20. Internet skills: −.4, .1, .1, −.1, 0, .3, −.1, −.2, 0, .2, 0, 0, 0, 0, 0, 0, .4, .2, .3
21. Has heard of Wikipedia: −.1, .1, 0, 0, 0, .1, 0, −.1, 0, .1, 0, 0, −.1, 0, 0, 0, .1, 0, .1, .2
22. Has visited Wikipedia: −.2, .2, 0, 0, 0, .2, −.1, −.2, 0, .2, 0, 0, 0, .1, 0, 0, .3, .1, .3, .4, .4
23. Knows Wikipedia is editable: −.3, .1, .1, −.1, 0, .2, 0, −.2, 0, .2, .1, 0, −.1, .1, 0, −.1, .3, .1, .2, .4, .2, .4
24. Has edited Wikipedia: −.2, 0, 0, −.1, 0, .1, 0, −.1, 0, .1, 0, 0, 0, 0, 0, 0, .1, .1, .1, .2, .1, .1, .2
Note: Non-zero values are statistically significant at the p ≤ .05 level.

Table 5 Regression Models 1–8
| | Has heard of WP (1) | Has heard of WP (2) | Has visited WP (3) | Has visited WP (4) | Knows WP editable (5) | Knows WP editable (6) | Has edited WP (7) | Has edited WP (8) |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 3.91*** (0.93) | 2.27* (1.14) | 1.92*** (0.46) | 0.11 (0.63) | 2.65*** (0.40) | 1.47** (0.52) | −0.30 (0.54) | −2.51** (0.91) |
| Background | | | | | | | | |
| Age | −0.04** (0.01) | −0.02 (0.01) | −0.04*** (0.01) | −0.01 (0.01) | −0.06*** (0.01) | −0.04*** (0.01) | −0.05*** (0.01) | −0.03*** (0.01) |
| Income in U.S. $1,000s | 0.15* (0.06) | 0.13* (0.07) | 0.10*** (0.03) | 0.07* (0.03) | 0.04 (0.02) | 0.01 (0.02) | −0.02 (0.04) | −0.05 (0.04) |
| Household size | −0.08 (0.13) | −0.10 (0.13) | −0.04 (0.07) | −0.04 (0.08) | 0.03 (0.06) | 0.03 (0.06) | −0.08 (0.09) | −0.12 (0.09) |
| Female | −0.33 (0.31) | −0.34 (0.32) | 0.17 (0.15) | 0.24 (0.17) | −0.32* (0.13) | −0.27* (0.13) | −0.76*** (0.21) | −0.55** (0.21) |
| Coupled | −0.19 (0.35) | −0.03 (0.36) | −0.19 (0.18) | −0.10 (0.20) | −0.09 (0.15) | −0.03 (0.16) | 0.03 (0.23) | 0.17 (0.24) |
| Employed | 0.87* (0.36) | 0.74* (0.37) | 0.60*** (0.17) | 0.44* (0.18) | 0.11 (0.14) | −0.05 (0.15) | −0.27 (0.24) | −0.38 (0.25) |
| Rural resident | −0.32 (0.37) | −0.36 (0.38) | −0.11 (0.20) | 0.01 (0.22) | 0.17 (0.18) | 0.28 (0.19) | 0.22 (0.30) | 0.38 (0.31) |
| Education (base = HS or less) | | | | | | | | |
| Some college | 0.74* (0.33) | 0.25 (0.35) | 0.90*** (0.18) | 0.33 (0.20) | 0.84*** (0.16) | 0.52** (0.17) | 0.70* (0.34) | 0.25 (0.36) |
| Bachelor's or higher | 2.08*** (0.51) | 1.48** (0.53) | 1.82*** (0.21) | 1.33*** (0.23) | 1.75*** (0.17) | 1.37*** (0.18) | 1.51*** (0.33) | 1.46*** (0.41) |
| Race & Ethnicity (base = White) | | | | | | | | |
| Hispanic | −0.11 (0.58) | 0.26 (0.60) | −0.21 (0.26) | 0.27 (0.29) | −0.66** (0.21) | −0.43* (0.22) | −0.37 (0.33) | −0.17 (0.35) |
| Black | −0.83* (0.39) | −0.71 (0.41) | −0.38 (0.24) | −0.26 (0.26) | −1.07* (0.20) | −1.03*** (0.21) | −0.22 (0.33) | −0.20 (0.34) |
| Asian | – | – | 0.48 (0.66) | 0.54 (0.70) | −0.01 (0.44) | −0.03 (0.45) | −1.23 (0.74) | −1.06 (0.75) |
| Native American | −0.25 (1.08) | −0.05 (1.10) | −0.83 (0.51) | −0.59 (0.54) | −0.11 (0.48) | 0.13 (0.49) | 0.29 (0.79) | 0.57 (0.86) |
| Other | – | – | 0.68 (0.83) | 1.14 (0.90) | −1.19 (0.64) | −1.38* (0.70) | 0.09 (1.08) | −0.13 (1.10) |
| Internet Experiences | | | | | | | | |
| Internet autonomy | – | 0.09 (0.08) | – | 0.10* (0.04) | – | 0.10** (0.03) | – | 0.08 (0.05) |
| Internet use frequency | – | 0.01 (0.02) | – | 0.00 (0.01) | – | −0.00 (0.01) | – | 0.02 (0.01) |
| Years of Internet use | – | 0.07 (0.04) | – | 0.07* (0.03) | – | 0.05* (0.02) | – | 0.04 (0.06) |
| Internet skills | – | 0.58** (0.18) | – | 0.86* (0.10) | – | 0.51*** (0.07) | – | 1.22*** (0.24) |
| Edu. BA or higher × Internet skills | – | – | – | – | – | – | – | −0.58* (0.29) |
| AIC | 398.20 | 380.37 | 1169.72 | 1010.52 | 1574.07 | 1471.32 | 777.06 | 715.11 |
| BIC | 467.29 | 470.49 | 1249.27 | 1111.03 | 1653.88 | 1572.14 | 856.58 | 820.86 |
| Log Likelihood | −186.10 | −173.19 | −569.86 | −486.26 | −772.04 | −716.66 | −373.53 | −337.56 |
| Deviance | 372.20 | 346.37 | 1139.72 | 972.52 | 1544.07 | 1433.32 | 747.06 | 675.11 |
| Num. obs. | 1502 | 1482 | 1485 | 1465 | 1511 | 1490 | 1482 | 1462 |
Note: Coefficients are log-odds with standard errors in parentheses; – indicates that the measure is not included in that model. *p < .05, **p < .01, ***p < .001

Models 1 and 2 present the factors that explain whether respondents have heard of Wikipedia. Income, employment status, education at the bachelor's degree level or higher, and Internet skills predict this outcome. Note that these models do not include measures for the Asian or "Other" race categories due to sparse variation in the dependent variable within these categories (nearly all respondents in the two categories had heard of Wikipedia). Models 3 and 4 regress the outcome "has visited Wikipedia" on the same blocks of predictors. Again, income, employment status, higher education, and Internet skills emerge as salient predictors. Internet autonomy (number of access points) and years of use also help explain this outcome. In Models 5 and 6, we consider what explains whether individuals know that Wikipedia can be edited.
Here, age, gender, and several racial/ethnic identity categories (Black, Hispanic, Other) emerge as salient explanatory factors; gender and the Hispanic and Other categories were not significant in any earlier model. Income no longer explains the outcome. Education level is strongly associated with knowing that Wikipedia can be edited, and Internet autonomy, years of use, and skills all remain significant. Finally, Models 7 and 8 regress "has edited Wikipedia" on the same blocks of predictors. The results in Model 7 reveal strong associations between editing Wikipedia and respondents' gender, age, and education level. Consistent with the bivariate statistics described earlier, being female is strongly associated with never having edited Wikipedia, even when controlling for other background factors. In addition, younger respondents are more likely to have edited, as are those with some college or a bachelor's degree or higher, compared to those with a high school degree or less. Incorporating the Internet experience and skill measures in Model 8 alters the picture in several ways. A likelihood ratio test indicates that overall model fit improves (test statistic = 61.9, 4 df, p < .001). The associations with age, gender, and a bachelor's degree or higher persist but are attenuated to some degree, while the coefficient for some college no longer meets conventional thresholds for significance. Notably, we also observe that greater Internet use frequency and Internet skills are positively associated with ever editing Wikipedia. To further illustrate the results from our regression models, we visualize predicted outcomes given prototypical values of the independent variables. We follow the approach developed by King et al. (2000) and used by Hargittai and Shaw (2015) on similar data. As noted earlier, considerable prior work has focused on gender differences in Wikipedia editing. Thus, we present a visualization of our results that maps the relationships between each outcome and gender, education, and Internet skills (the latter two being our strongest predictors).
This allows us to put our visualizations in perspective relative to earlier work (e.g., Hargittai & Shaw, 2015). Figure 2 shows the predicted probabilities for each of our four dependent variables when we allow gender, education level, and Internet skills to vary across their empirically observed values and hold the other measures at typical values. In the top-left plot, we observe only small variations in the predicted probability of having heard of Wikipedia across the distribution of Internet skills and education levels. Differences emerge primarily at the very low end of the Internet skills spectrum and among those who have not completed a bachelor's degree. The top-right plot shows the predicted probability of having visited Wikipedia and reveals how strongly this outcome relates to Internet skills and education level. The gap between the gray (BA or higher) and black (less than a BA) lines is widest at the low end of the Internet skills spectrum. In contrast, at the high end of the skills distribution, educational divides in who visits Wikipedia nearly disappear.

Figure 2 Marginal effects plots. Note: Predicted probabilities of having heard of Wikipedia (top left); having visited Wikipedia (top right); knowing that Wikipedia can be edited (bottom left); and having contributed to Wikipedia (bottom right).

The second row of plots in Figure 2 shows the predicted probabilities for the less prevalent outcomes: knowing that Wikipedia can be edited (bottom left) and having contributed to Wikipedia (bottom right).
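Predicted probabilities of the kind plotted in Figure 2 come from passing a linear predictor (the log-odds) through the inverse logit function while varying a few covariates and holding the rest fixed. The Python sketch below shows that calculation for a hypothetical logit model with a gender dummy, a BA-or-higher dummy, a skills score, and a BA × skills interaction. Every coefficient and the 1 to 5 skills range are invented for illustration and are not the estimates in Table 5. The sketch produces point predictions only; King et al. (2000) additionally simulate from the estimated sampling distribution of the coefficients to convey uncertainty, which is omitted here.

```python
import math

# Hypothetical log-odds coefficients, for illustration only (NOT Table 5 estimates).
# The intercept absorbs the contribution of all other covariates held at typical values.
COEFS = {
    "intercept": -4.0,
    "female": -0.5,
    "ba_or_higher": 1.0,
    "skills": 1.0,        # per point on an assumed 1-5 skills scale
    "ba_x_skills": -0.3,  # education-by-skills interaction
}


def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(female, ba, skills, coefs=COEFS):
    """Predicted probability for one prototypical respondent profile."""
    eta = (
        coefs["intercept"]
        + coefs["female"] * female
        + coefs["ba_or_higher"] * ba
        + coefs["skills"] * skills
        + coefs["ba_x_skills"] * ba * skills
    )
    return inv_logit(eta)


def gender_gap(ba, skills):
    """Male minus female predicted probability, in probability points."""
    return predicted_probability(0, ba, skills) - predicted_probability(1, ba, skills)


for skills in (1, 3, 5):
    print(f"skills={skills}: gender gap without BA = {gender_gap(0, skills):.3f}, "
          f"with BA = {gender_gap(1, skills):.3f}")
```

Although the gender coefficient is the same constant on the log-odds scale for every profile, the resulting gap in probability points is small where predicted probabilities sit near zero and larger toward the middle of the logistic curve. This nonlinearity is one way a gap can appear mainly among high-skill respondents in a plot of predicted probabilities.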
Across all three of our key predictors (gender, education level, and Internet skills), we observe substantial differences in who knows that Wikipedia can be edited. At the extremes, an immense gap separates the predicted probabilities that a low-skill, less-educated female respondent and a high-skill, highly educated male respondent know that Wikipedia can be edited (about 28% versus 94%, respectively). Such vast inequalities underscore that this step in the pipeline of participation accounts for a large decrease in the population of potential Wikipedia editors. Finally, we visualize variations in the probability of editing Wikipedia in the bottom-right plot of Figure 2. In comparison to the other three plots, the probability of this outcome is low across the board. The predicted probability of editing Wikipedia remains very close to zero for anyone at the lower end of the Internet skills distribution. At the upper end of the skills distribution, the predicted values again diverge along all three predictors. That is, a gender gap in Wikipedia contribution does not emerge at low or average levels of Internet skills. Instead, the gap exists almost exclusively among the most highly skilled and highly educated Internet users, all of whom remain much more likely to edit Wikipedia than individuals with lower levels of Internet skills or education. The model predicts that less-educated, highly skilled male respondents are just as likely to edit Wikipedia as highly educated, highly skilled female respondents.

Discussion
The results provide empirical support for the idea that knowledge gaps contribute to a pipeline of online participation in the networked public sphere. Individuals do not automatically engage in content production on even the most participatory sites; they must first have the necessary knowledge to do so. Participation divides emerge through a sequence of necessary awareness and behaviors.
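One way to see why a gap at an intermediate stage matters is to treat the pipeline as a chain of conditional steps: if each stage requires the one before it, the share of a population that reaches the final stage is the product of its stage-to-stage conversion rates. This multiplicative framing is an illustrative assumption (the regression models above estimate each outcome for the full sample rather than conditionally), and the rates in the Python sketch below are hypothetical, not survey estimates.

```python
# Hypothetical stage-to-stage conversion rates (NOT survey estimates).
# Each rate is the share of people at the previous stage who reach this one.
GROUP_A = [
    ("has heard of Wikipedia", 0.95),
    ("has visited Wikipedia", 0.90),
    ("knows Wikipedia is editable", 0.60),
    ("has edited Wikipedia", 0.15),
]
# Identical to GROUP_A except at the intermediate "knows editable" stage.
GROUP_B = [
    ("has heard of Wikipedia", 0.95),
    ("has visited Wikipedia", 0.90),
    ("knows Wikipedia is editable", 0.30),
    ("has edited Wikipedia", 0.15),
]


def pipeline_shares(stages):
    """Cumulative share of the whole population reaching each stage,
    assuming every stage requires having passed the previous one."""
    shares, running = [], 1.0
    for name, rate in stages:
        running *= rate
        shares.append((name, running))
    return shares


for label, stages in (("A", GROUP_A), ("B", GROUP_B)):
    final_name, final_share = pipeline_shares(stages)[-1]
    print(f"group {label}: share who {final_name} = {final_share:.4f}")
```

Because the final share is a product, group B's halved rate at the intermediate stage halves its share of editors even though its final conversion rate is identical to group A's.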
Education, Internet skills, and age have robust associations with outcomes at every step in the pipeline. Other factors, such as income, employment status, and racial/ethnic background, help explain earlier stages in the pipeline even though they are not associated with who contributes content. Gender matters only at later stages in the pipeline, despite the important and valid emphasis of prior research and public debate on the Wikipedia gender gap. Distinct from prior studies, we provide evidence that participation divides in who creates content online likely stem in part from variations in who has visited a site and who has the requisite knowledge that such contributions are possible (see Figure 2). This contributes to a clearer understanding of stratification processes in knowledge production in the networked public sphere. These findings make several other contributions to prior research. First, bivariate comparisons in our data reveal participation gaps across many dimensions of background and Internet experiences that had not been documented previously. In multivariate regression models, age, education level, and Internet skills are the only measures that consistently explain outcomes throughout the pipeline of participation. Individuals who are younger, more highly educated, and more skilled Internet users are more likely to have heard of Wikipedia, visited it, known that it can be edited, and edited it. Other background attributes and Internet experiences explain variations at different stages in the pipeline and to different degrees. Some characteristics (income, employment status), as well as Internet experiences and autonomy, help explain variations in having heard of or visited Wikipedia. Other attributes (racial/ethnic background), as well as Internet experiences and autonomy, help explain intermediate stages (visiting Wikipedia and/or knowing Wikipedia can be edited).
A gender gap emerges only at the later stages in the pipeline, in terms of who knows the site can be edited and who has edited the site. The results appear consistent with our expectations, but several patterns surprised us. Participation divides in the middle stages of the pipeline are vast compared to those at the pipeline's endpoints. In particular, large differences emerge in our models of who has visited Wikipedia and who knows Wikipedia can be edited (see Figure 2). These gaps along dimensions of education, Internet skills, and (in the case of knowing Wikipedia can be edited) gender all dwarf inequalities in the predicted probability of ever having edited Wikipedia. Prior research has neither identified nor considered these gaps. Our findings suggest that the intermediate stages matter a great deal and deserve much closer scrutiny in the future. We speculate that reducing gaps at these intermediate stages of the participation pipeline would make more equitable participation patterns possible; future research should investigate the effects of interventions to test this claim. Limitations of this study include standard concerns about the validity of self-reported behavioral data, as well as unknown biases in our estimation related to the sampling techniques, measurement errors, or omitted variables. For example, the study includes no measures of underlying interests or attitudes, which play a role in other kinds of online behavior (Brake, 2014). Furthermore, the study only includes U.S. adult Internet users, and the results may diverge from patterns in other populations. Engagement patterns with Wikipedia may also differ from engagement with other sites. We have no way of empirically testing or bounding the impact of these issues, but encourage readers to approach our results with appropriate care.
Wherever possible, we have documented alternative estimation procedures and decisions in this paper, none of which eliminated the patterns of results we report. We hope to see others pursue follow-up studies to evaluate any concerns empirically. We also hope that future studies will evaluate similar questions using panel data in order to assess whether the patterns we report here change over time. Future work may also elaborate alternative conceptual frameworks that emphasize different dimensions of the selection and stratification processes involved. For all of these findings, we draw confidence from the fact that our analyses involved an extremely diverse national sample that includes respondents who had never heard of or visited Wikipedia, one of the most popular participatory sites on the Internet. We also incorporated a measure of rural/urban location into our analysis based on prior work (Johnson et al., 2016) that earlier participation gap research had not used, but found no evidence of an association with who edits the online encyclopedia.

Conclusion
We observed a sequence of awareness and actions necessary to engage in online participation and showed how knowledge gaps manifest at every step. These participation divides may contribute to the expansion of knowledge gaps along the lines proposed by the original knowledge gap hypothesis (Tichenor et al., 1970). Online platforms may offer more egalitarian and inclusive practices of knowledge production than prior media ecosystems (Benkler, 2006; Reagle & Rhue, 2011). At the same time, any biases encoded into Wikipedia may also propagate widely, as intelligent and algorithmic systems automatically incorporate user-generated content, such as Wikipedia's, into search results and other basic infrastructures of the networked public sphere (e.g., Johnson et al., 2016).
The proliferation of downstream systems using content generated by participatory sites means that digital inequalities and skills gaps may have far-reaching and unexpected consequences. These aspects of knowledge gaps in the networked public sphere also deserve deeper investigation. Our results illuminate important dimensions of participation gaps that prior research and policy debates have either not emphasized or not addressed at all. The findings about education levels and Internet skills deserve particular emphasis. Research on Wikipedia participation has neither identified nor estimated an education divide, a concept particularly relevant to the digital inequality literature. In terms of the skills divide, our findings are consistent with Hargittai and Shaw (2015), and we show further evidence that Wikipedia contribution remains vanishingly rare at the low end of the Internet skills spectrum. Transforming the culture of participation among existing Wikipedians, an area of intervention that receives considerable attention, will not on its own overcome participation gaps. The fact that less educated, lower skilled, and older individuals are less likely to engage at every point in the participation pipeline suggests that interventions aimed at addressing these factors will be necessary to produce a more equitable and representative online encyclopedia. We believe these findings lend support to interventions that reduce technical and knowledge-based barriers to entry. We also suggest interventions aimed at including lower income, unemployed, and underrepresented racial and ethnic groups at earlier stages in the participation pipeline, where gaps in Internet experience and autonomy may present distinct obstacles to subsequent engagement. When it comes to overcoming gender gaps in the case of Wikipedia, our results suggest that continued emphasis on recruiting female editors must include efforts to disseminate the knowledge that Wikipedia can be edited.
Future work should pursue causal identification and longitudinal tests of the patterns of results we have reported here. Evaluations of interventions aimed at redressing participation divides and pipelines of contribution in Wikipedia and beyond also offer a promising avenue of inquiry. Studies that focus on pathways of participation (e.g., Antin, Cheshire, & Nov, 2012) should seek to understand the role of factors such as Internet skills and education. We also encourage future studies to analyze stratification processes and pipelines of participation in online knowledge-production activities beyond Wikipedia. Such comparative analyses may reveal pathways toward greater equity and engagement among different demographic groups or people at different levels of Internet experience and skills. Our focus on a single, highly impactful site like Wikipedia has allowed us to elaborate an initial statement and evaluation of the pipeline framework. This extends prior work on digital inequality in the context of knowledge gaps. Better understanding these phenomena can inform design research and policy interventions aimed at realizing a more equitable and inclusive digital media environment.

Authors' note
The order of author names does not reflect differences in contribution to the research. Aaron Shaw is Assistant Professor of Communication Studies at Northwestern University and a 2017–2018 Lenore Annenberg and Wallis Annenberg Fellow in Communication at the Center for Advanced Study in the Behavioral Sciences at Stanford University. Eszter Hargittai is Professor and Chair of Internet Use and Society at the Institute of Communication and Media Research at the University of Zurich. Shaw is Faculty Associate and Hargittai is on the Fellow Advisory Board of the Berkman Klein Center for Internet and Society at Harvard University.
Acknowledgements
The authors are grateful to Merck (Merck is known as MSD outside the United States and Canada) and the Robert and Kaye Hiatt Fund at Northwestern University for support. Hargittai also appreciates the time made available through the April McClain-Delaney and John Delaney Professorship at Northwestern University to conduct this work. The authors appreciate input from the 2015–2016 cohort of research assistants of the Web Use Project at Northwestern University, especially Sam Mandlsohn, and acknowledge staff at NORC at the University of Chicago for administering the survey discussed here. Audiences in the Research Seminar of the Institute for Communication and Media Research at the University of Zurich, the Bring Your Own Research Workshop at Northwestern, the Department of Media and Communications at the London School of Economics and Political Science, the Annual Conference of the International Communication Association, and the Annual Meeting of the American Sociological Association, as well as Darren Gergle, Ellen Helsper, and the editor and reviewers of the Journal of Communication, provided valuable feedback and direction on earlier versions of this work.

Note
1 NORC provides raked survey weights for data from the AmeriSpeak panel which, in theory, enable an approximation of a representative sample of U.S. adults. We do not use the weights in our analyses here. Our rationale stems from standard concerns about sparsity and survey weighting, as well as empirical testing of our models with and without the weights (Bollen, Biemer, Karr, Tueller, & Berzofsky, 2016). Given that we are modeling several rare outcomes, sparsity poses a real threat.
Following three widely used and analytically distinct procedures discussed in Bollen and colleagues' review, we find (1) weak evidence that the model residuals are correlated with the weights, (2) no evidence that adding the weights improves overall model fit, and (3) some evidence of meaningful overall differences in the coefficients with and without weights applied (see Bollen et al., 2016, for details of testing procedures). The overall pattern of findings does not change in weighted versus unweighted models.

References
Adams, J., & Brückner, H. (2015). Wikipedia, sociology, and the promise and pitfalls of Big Data. Big Data & Society, 2(2), 1–5. doi:10.1177/2053951715614332
Antin, J., Cheshire, C., & Nov, O. (2012). Technology-mediated contributions: Editing behaviors among new wikipedians. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (pp. 373–382). New York, NY: ACM. doi:10.1145/2145204.2145264
Bear, J. B., & Collier, B. (2016). Where are the women in Wikipedia? Understanding the different psychological experiences of men and women in Wikipedia. Sex Roles, 74(5), 254–265. doi:10.1007/s11199-015-0573-y
Benkler, Y. (2006). The wealth of networks: How social production transforms markets and freedom. New Haven, CT: Yale University Press.
Bollen, K. A., Biemer, P. P., Karr, A. F., Tueller, S., & Berzofsky, M. E. (2016). Are survey weights needed? A review of diagnostic tests in regression analysis. Annual Review of Statistics and Its Application, 3(1), 375–392. doi:10.1146/annurev-statistics-011516-012958
Bonfadelli, H. (2002). The Internet and knowledge gaps: A theoretical and empirical investigation. European Journal of Communication, 17(1), 65–84. doi:10.1177/0267323102017001607
Brake, D. R. (2014). Are we all online content creators now? Web 2.0 and digital divides. Journal of Computer-Mediated Communication, 19(3), 591–609. doi:10.1111/jcc4.12042
Cacciatore, M. A., Scheufele, D. A., & Corley, E. A. (2014). Another (methodological) look at knowledge gaps and the Internet's potential for closing them. Public Understanding of Science, 23(4), 376–394. doi:10.1177/0963662512447606
Chen, W., & Wellman, B. (2004). Charting digital divides: Comparing socioeconomic, gender, life stage, and rural-urban internet access and use in five countries. In W. Dutton, B. Kahin, R. O'Callaghan & A. Wyckoff (Eds.), Transforming Enterprise (pp. 467–497). Cambridge, MA: MIT Press.
comScore, Inc. (2016). comScore ranks the top 50 U.S. digital media properties for January 2016. comScore. Retrieved June 6, 2017 from http://www.comscore.com/Insights/Rankings/comScore-Ranks-the-Top-50-US-Digital-Media-Properties-for-January-2016 and archived at https://perma.cc/KRD4-YVZ9
Correa, T. (2010). The participation divide among "online experts": Experience, skills and psychological factors as predictors of college students' web content creation. Journal of Computer-Mediated Communication, 16(1), 71–92. doi:10.1111/j.1083-6101.2010.01532.x
van Deursen, A., Helsper, E., Eynon, R., & van Dijk, J. (2017). The compoundness and sequentiality of digital inequality. International Journal of Communication, 11, 22.
DiMaggio, P., & Bonikowski, B. (2008). Make money surfing the web? The impact of Internet use on the earnings of U.S. workers. American Sociological Review, 73(2), 227–250. doi:10.1177/000312240807300203
DiMaggio, P., Hargittai, E., Celeste, C., & Schafer, S. (2004). Digital inequality: From unequal access to differentiated use. In K. Neckerman (Ed.), Social Inequality (pp. 355–400). New York, NY: Russell Sage Foundation. https://www.russellsage.org/research/reports/dimaggio
Dutton, W. H., & Blank, G. (2014). The emergence of next generation Internet users. International Economics and Economic Policy, 11(1–2), 29–47. doi:10.1007/s10368-013-0245-8
Ettema, J. S., & Kline, F. G. (1977). Deficits, differences, and ceilings: Contingent conditions for understanding the knowledge gap. Communication Research, 4(2), 179–202. doi:10.1177/009365027700400204
Ford, H., & Wajcman, J. (2017). 'Anyone can edit', not everyone does: Wikipedia's infrastructure and the gender gap. Social Studies of Science, 47(4), 511–527. doi:10.1177/0306312717692172
Friedland, L. A., Hove, T., & Rojas, H. (2006). The networked public sphere. Javnost - The Public, 13(4), 5–26. doi:10.1080/13183222.2006.11008922
Graells-Garrido, E., Lalmas, M., & Menczer, F. (2015). First women, second sex: Gender bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media (pp. 165–174). New York, NY: ACM. doi:10.1145/2700171.2791036
Haight, M., Quan-Haase, A., & Corbett, B. A. (2014). Revisiting the digital divide in Canada: The impact of demographic factors on access to the internet, level of online activity, and social networking site usage. Information, Communication & Society, 17(4), 503–519. doi:10.1080/1369118X.2014.891633
Hampton, K. N., Lee, C., & Her, E. J. (2011). How new media affords network diversity: Direct and mediated access to social capital through participation in local social settings. New Media & Society, 13(7), 1031–1049. doi:10.1177/1461444810390342
Hargittai, E. (2002). Second-level digital divide: Differences in people's online skills. First Monday, 7(4). doi:10.5210/fm.v7i4.942. Retrieved January 28, 2018 from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/942/864 and archived at https://perma.cc/996D-T6VL
Hargittai, E., & Hsieh, Y. P. (2012). Succinct survey measures of web-use skills. Social Science Computer Review, 30(1), 95–107.
Hargittai, E., & Jennrich, K. (2016). The online participation divide. In L. A. Friedland & M. Lloyd (Eds.), The Communication Crisis in America, and How to Fix It (pp. 199–213). Palgrave Macmillan. doi:10.1057/978-1-349-94925-0_13
Hargittai, E., & Shaw, A. (2013). Digitally savvy citizenship: The role of Internet skills and engagement in young adults' political participation around the 2008 presidential election. Journal of Broadcasting & Electronic Media, 57(2), 115–134. doi:10.1080/08838151.2013.787079
Hargittai, E., & Shaw, A. (2015). Mind the skills gap: The role of Internet know-how and gender in differentiated contributions to Wikipedia. Information, Communication & Society, 18(4), 424–442. doi:10.1080/1369118x.2014.957711
Hargittai, E., & Walejko, G. (2008). The participation divide: Content creation and sharing in the digital age. Information, Communication & Society, 11(2), 239–256. doi:10.1080/13691180801946150
Hill, B. M., & Shaw, A. (2013). The Wikipedia gender gap revisited: Characterizing survey response bias with propensity score estimation. PLoS One, 8(6), e65782. doi:10.1371/journal.pone.0065782
Hwang, Y., & Jeong, S.-H. (2009). Revisiting the knowledge gap hypothesis: A meta-analysis of thirty-five years of research. Journalism & Mass Communication Quarterly, 86(3), 513–532. doi:10.1177/107769900908600304
Jenkins, H. (2006). Convergence culture: Where old and new media collide. New York, NY: New York University Press.
Johnson, I. L., Lin, Y., Li, T. J.-J., Hall, A., Halfaker, A., Schöning, J., & Hecht, B. (2016). Not at home on the range: Peer production and the urban/rural divide. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 13–25). New York, NY: ACM. doi:10.1145/2858036.2858123
King, G., Tomz, M., & Wittenberg, J. (2000). Making the most of statistical analyses: Improving interpretation and presentation. American Journal of Political Science, 44(2), 347–361. doi:10.2307/2669316
Lam, S. K., Uduwage, A., Dong, Z., Sen, S., Musicant, D. R., Terveen, L., & Riedl, J. (2011). WP:clubhouse?: An exploration of Wikipedia's gender imbalance. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (pp. 1–10). New York, NY: ACM. doi:10.1145/2038558.2038560
Livingstone, S., & Helsper, E. (2007). Gradations in digital inclusion: Children, young people and the digital divide. New Media & Society, 9(4), 671–696. doi:10.1177/1461444807080335
Martínez-Cantos, J. L. (2017). Digital skills gaps: A pending subject for gender digital inclusion in the European Union. European Journal of Communication, 32(5), 419–438. doi:10.1177/0267323117718464
National Academy of Sciences, National Academy of Engineering, and Institute of Medicine of the National Academies. (2007). Beyond bias and barriers: Fulfilling the potential of women in academic science and engineering. Washington, DC: National Academies Press.
NORC. (n.d.). Amerispeak: NORC's breakthrough panel-based research platform. Retrieved February 5, 2018 from http://www.norc.org/Research/Capabilities/pages/amerispeak.aspx and archived at https://perma.cc/2TBM-KLUJ
Ono, H., & Zavodny, M. (2016). Internet and gender. In N.
Naples (Ed.), The Wiley Blackwell Encyclopedia of Gender and Sexuality Studies (pp. 1–4). John Wiley & Sons, Ltd. http://onlinelibrary.wiley.com/doi/10.1002/9781118663219 Pearce, K. E., & Rice, R. E. ( 2013). Digital divides from access to activities: Comparing mobile and personal computer Internet users. Journal of Communication , 63( 4), 721– 744. doi:10.1111/jcom.12045 Google Scholar CrossRef Search ADS Perrin, A. J., & Vaisey, S. ( 2008). Parallel public spheres: Distance and discourse in letters to the editor. American Journal of Sociology , 114( 3), 781– 810. doi:10.1086/590647 Google Scholar CrossRef Search ADS Pew Research Center. ( 2017). Internet/broadband fact sheet. The Pew Internet & American Life Project. Retrieved February 5, 2018 from http://www.pewinternet.org/fact-sheet/internet-broadband/ and archived at https://perma.cc/LJ8G-FAVQ Reagle, J., & Rhue, L. ( 2011). Gender bias in Wikipedia and Britannica. International Journal of Communication , 5, 21. Robinson, L., Cotten, S. R., Ono, H., Quan-Haase, A., Mesch, G., Chen, W., … Stern, M. J. ( 2015). Digital inequalities and why they matter. Information, Communication & Society , 18( 5), 569– 582. doi:10.1080/1369118X.2015.1012532 Google Scholar CrossRef Search ADS Schradie, J. ( 2011). The digital production gap: The digital divide and Web 2.0 collide. Poetics , 39( 2), 145– 168. https://doi.org/16/j.poetic.2011.02.003 Google Scholar CrossRef Search ADS Schradie, J. ( 2015). The gendered digital production gap: Inequalities of affluence. In L. Robinson, S. R. Cotten, & J. Schulz (Eds.), Communication and information technologies annual ( Vol. 9, pp. 185– 213). Bingley, UK: Emerald Group Publishing Limited. doi:10.1108/S2050-206020150000009008 Google Scholar CrossRef Search ADS Selwyn, N. ( 2004). Reconsidering political and popular understandings of the digital divide. New Media & Society , 6( 3), 341– 362. 
doi:10.1177/1461444804042519 Google Scholar CrossRef Search ADS Shane-Simpson, C., & Gillespie-Lynch, K. ( 2017). Examining potential mechanisms underlying the Wikipedia gender gap through a collaborative editing task. Computers in Human Behavior , 66, 312– 328. doi:10.1016/j.chb.2016.09.043 Google Scholar CrossRef Search ADS Tichenor, P. J., Donohue, G. A., & Olien, C. N. ( 1970). Mass media flow and differential growth in knowledge. Public Opinion Quarterly , 34( 2), 159– 170. doi:10.1086/267786 Google Scholar CrossRef Search ADS Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. ( 2016). Women through the glass ceiling: Gender asymmetries in Wikipedia. EPJ Data Science , 5( 1). doi:10.1140/epjds/s13688-016-0066-4 Warschauer, M. ( 2003). Technology and social inclusion: Rethinking the digital divide . Cambridge, MA: MIT Press. Wasserman, I. M., & Richmond-Abbott, M. ( 2005). Gender and the Internet: Causes of variation in access, level, and scope of use. Social Science Quarterly , 86( 1), 252– 270. doi:10.1111/j.0038-4941.2005.00301 Google Scholar CrossRef Search ADS Zillien, N., & Hargittai, E. ( 2009). Digital distinction: Status-specific Internet uses. Social Science Quarterly , 90( 2), 274– 291. Google Scholar CrossRef Search ADS © 2018 International Communication Association
Journal of Communication – Oxford University Press
Published: Feb 1, 2018