Geodata in labor market research: trends, potentials and perspectives

This article shows the potentials of georeferenced data for labor market research. We review developments in the literature and highlight areas that can benefit from exploiting georeferenced data. Moreover, we share our experiences in geocoding administrative employment data including wage and socioeconomic information of almost the entire German workforce between 2000 and 2017. To make the data easily accessible for research, we create 1-square-kilometer grid cells aggregating a rich set of labor market characteristics and sociodemographics of unprecedented spatial precision. These unique data provide detailed insights into inner-city distributions for all German cities with more than 100,000 inhabitants. Accordingly, we provide an extensive series of maps in the Additional file 1 and describe Berlin and Munich in greater detail. The small-scale maps reveal substantial differences in various labor market aspects within and across cities.

Keywords: Georeferenced data, Microdata, Register-based data, Urban economics, Regional science, Labor economics, Neighborhood effects, Spatial economics, Segregation

JEL classification: J12, J31, R12, O18

*Correspondence: kerstin.ostermann@iab.de. Friedrich-Alexander University Erlangen-Nürnberg (FAU), Findelgasse 7-9, 90402 Nürnberg, Germany. Full list of author information is available at the end of the article.

© The Author(s) 2022, corrected publication 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

1 Introduction

Today, individual geopositioning is ubiquitous. We use detailed georeferenced data (henceforth: geodata) to navigate driving routes, track after-work runs, and look up directions to a new restaurant. Companies profit from optimized logistics, agriculture and construction due to detailed information from orbital satellite systems. Whereas processing and utilizing detailed position data are common in many fields such as engineering and business administration, these skills have not been a primary subject in economics and sociology yet.

This article examines the potential of geodata in the social sciences. Moreover, the article presents multi-city evidence on how small-scale geodata can reveal inner-city developments and inequalities that have been hidden by administrative borders so far. The essential characteristic of geodata is the assignment of each statistical identity to an exact location on the Earth's surface (Goodchild 2013). Currently, most spatial research in economics and sociology uses city district or county aggregates. However, spatially aggregated data face several limitations restricting the investigation of many research questions. In contrast, geodata allow researchers to flexibly scale spatial information independently of administrative boundaries, resulting in three main advantages: First, greater spatial depth enables the detailed investigation of topics such as segregation (Brakman et al. 2004; Eeckhout et al. 2014; Rosenthal and Strange 2008), neighborhood effects (Schönwälder and Söhn 2009) and mobility (Dauth and Haller 2018, 2020). Second, geodata can serve as a methodological tool. For instance, researchers can use geodata for the sampling of surveys or identifying neighborhood boundaries (Lee et al. 2008; Legewie and Schaeffer 2016), spatial shocks or family relations (Goldschmidt et al. 2017). Third, the potential of enriching existing data with geoinformation opens up possibilities for record linkage, e.g., with smartphone data (Bähr et al. 2018) as well as with genuine spatial data, such as satellite imagery (Henderson et al. 2012) and climate data (Rüttenauer 2018).

This is likely due to the lack of data and the complexity of processing them (Bayer et al. 2014; Vom Berge et al. 2014; Bügelmeyer et al. 2015). However, increasing computational capacities and more suitable statistical tools facilitate research on and with geodata. As a result, the number of published studies using geodata has been rapidly growing and will further increase given the variety of advantages geodata offer.

In this article, we highlight research potentials of geocoded labor market data with descriptive evidence from grid cell data as an example. Moreover, we share our experience in geocoding the employment biographies of almost the entire German workforce between 2000 and 2017. In addition to detailed daily information on employment and unemployment records, the data contain exact coordinates of workplaces and places of residence. This allows us to describe the German labor market with unprecedented spatial precision. Furthermore, this paper illustrates the potential of geodata by visualizing the labor market characteristics of all major German cities, of which two, Berlin and Munich, will be discussed in greater detail. We show that small-scale geodata can reveal substantial differences in fundamental labor market characteristics within and across cities.

This article is organized as follows: In Sect. 2, we review the recent literature, focusing on research that already uses or could benefit from using geodata. Next, in Sect. 3, we share our experiences in geocoding administrative labor market data. In Sect. 4, we provide small-scale descriptions of two large German cities, Berlin and Munich. In the final section, we conclude by identifying potential research areas and questions for the presented data set. Additionally, an extensive online appendix that contains fine-grained maps of labor market characteristics for all German cities with more than 100,000 inhabitants complements this article.

2 Potential research topics and trends in the relevant literature

In the following section, we provide a short overview of potential research fields, starting with questions covering larger regional areas and cities before moving towards research on neighborhoods and individual mobility. Although we present each topic separately, there are various dependencies across these research fields.

One of the most popular approaches to deriving causal inference is the use of "natural experiments" such as political reforms, mass layoffs or sudden economic or natural developments affecting entire regions (Ager et al. 2020; Ahlfeldt et al. 2015; Desmet and Henderson 2015; Gathmann et al. 2020). Natural experiments are of special interest for labor market research because they allow researchers to rule out spatial sorting (Combes et al. 2008; Haller and Heuermann 2020). Geodata enable researchers to precisely evaluate the effect of regional shocks on individuals, subgroups, or entire local labor markets (Desmet and Henderson 2015; Oakes et al. 2015) with much higher precision than regional aggregates. One example of such an exogenous shock in Germany is the refugee inflow in 2015 and 2016. Using geodata, researchers can track refugee residences and workplaces within cities and can evaluate the integration process in a more detailed way than with regional aggregates. Moreover, flexible scaling enhances the selection of appropriate control regions for matching processes.

As a further large-scale topic, geodata contribute to insights for city and infrastructure planning, which is connected to the locational choice for institutions, firms and workers (Duranton et al. 2015; Helsley 2004; Ottaviano and Thisse 2004). To capture metropolitan effects, Lucas and Rossi-Hansberg (2002) propose an equilibrium city model, which operates under the assumption that people live where they work. Using geodata, Dauth and Haller (2018) show that this assumption is, at least for Germany, only partially true. While US cities are mostly monocentric with clear districts for firms, workers and different employment groups, cities in other, e.g., European, countries might be structured differently, which makes it difficult to link them to existing theoretical and empirical models (Ahlfeldt et al. 2015; Dauth and Haller 2020; Duranton and Puga 2015). Tackling this issue, Ahlfeldt et al. (2015) use geodata in a quantitative theoretical model to estimate the dynamics of the internal city structure with heterogeneous centers. They build city "blocks" of 500 square meter grid cells ("grids") to control for variation in the surroundings. In a second step, they combine their theoretical model with the natural experiment of the fall of the Berlin Wall and use inner-city variation across grids to provide causal evidence.

In addition to regional and city-related topics, geodata offer advantages on a smaller scale, enabling the detailed analysis of neighborhood effects. Although the concept of neighborhoods is quite diverse, research generally distinguishes between residential and workplace neighborhoods. Although research on workplace neighborhoods can considerably profit from the usage of geodata, we will focus on the research potentials for the literature on residential neighborhoods in this article. For the choice of residence, contextual factors such as the social context, quality of life, public goods, and housing costs play an important role (Dustmann et al. 2018; Kang et al. 2020; Lee et al. 1994). Highlighting the relevance of social networks, Jahn and Neugart (2020) find significant job referral networks in German neighborhoods using geocoded data.

A prominent strand within neighborhood literature is the rise and development of segregation (Mossay and Picard 2019; Reardon and O'Sullivan 2004). Segregated subgroups can arise if characteristics are homogeneous within neighborhoods but heterogeneous between neighborhoods (Bayer et al. 2014; Cutler and Glaeser 1997; Graham 2018; Legewie and Schaeffer 2016; Schelling 1969). Small-scale geodata like grid cells provide a higher resolution for segregation patterns and their effects than county- or district-level data, enabling not only a more fine-grained investigation on the basis of grid cells but also comparisons between grid cells. For instance, vom Berge et al. (2014) use cross-sectional geocoded German employment data to visualize the distribution of low-income individuals for the German cities Berlin, Hamburg and Munich. Although providing only a snapshot for one year, vom Berge et al. (2014) already highlight that Munich and Berlin differ in their segregation patterns.

To investigate the rise of segregation, research cannot solely focus on a static definition of neighborhoods. Neighborhoods are dynamic environments that change and evolve over time due to exogenous events or selective individual mobility (Feijten and Van Ham 2009; Sharkey and Faber 2014). In general, similar individuals tend to choose neighborhoods with similar characteristics to their own (Durlauf 2004; Feijten and Van Ham 2009; Kremer 1997). Summing this selective residential choice up to a selective subgroup inflow on the aggregate level, neighborhoods might "tip": The emerging subgroup drives minorities out of the neighborhood, causing endogenous mobility and segregation (Durlauf 2004; Schelling 1969, 1971). Such segregated neighborhoods can cause neighborhood conflicts, especially if neighborhood boundaries are contested (Legewie and Schaeffer 2016). For dynamic analyses of segregation developments, trend- or panel-data are necessary. The investigation of dynamic compositional changes is especially relevant for high-density neighborhoods where housing alternatives are rare, particularly under the assumption that land and its users are heterogeneous (Card et al. 2008; Duranton and Puga 2015; Helsley 2004). As tight living conditions are most evident for larger cities, we focus on those in this article.

Footnote: Even though we are going to present trend data in this article, the underlying grid cell data also provide information on neighborhood in- and outflows.

In addition to promoting descriptive research on segregation patterns and processes, geodata also offer new possibilities for the causal estimation of neighborhood effects. As indicated in the beginning of this section, exploiting exogenous events is a popular strategy to account for endogenous neighborhood change (Chetty and Hendren 2018; Rossi-Hansberg et al. 2010). However, such events are rare and often identify local average treatment effects only. The geographically small scale of grid or point data enables other causal estimation techniques based on border distances or grid-cell variation. Exemplifying the potential of small-scale data, Bayer et al. (2008) use block-level variation within a wider neighborhood to estimate the causal effect of neighborhood referrals. Geocoded grid-cell data can easily improve their administrative block approach. Another example is the paper of Breidenbach et al. (2021), who use Berlin grid-cell data to estimate the causal effect of flight noise and proximity to the airport on housing rental prices. In exploiting the unexpected delays of the airport closure of Berlin-Tegel and inner-city variation in the exposure to flight noise, they show that flight noise reduces rental prices of treated neighborhoods by 2 to 5%.

Moreover, geodata measure the effects of geographical distances more precisely than aggregates at higher administrative levels, thereby enhancing the analysis of individual mobility. Although the focus of this article is grid-cell data and inner-city distributions, individual mobility is a field of research with a high potential in the usage of geodata. A broad body of research literature seeks to explain individual (non-)mobility (Arntz 2005; Chetty and Hendren 2018; Kennan and Walker 2011; Lee et al. 1994; Reichelt and Abraham 2017; Sorenson and Dahl 2016) and commuting (Dauth and Haller 2020). However, most of these analyses measure regional mobility as moving from one county or region to another, resulting in a bias for individuals living close to a border or moving within a district (Lee et al. 1994). With geodata, mobility becomes a continuous variable instead of a binary indicator, which facilitates advanced estimation methods in mobility research (Dauth and Haller 2020). Currie et al. (2010), e.g., show that the distance to fast food restaurants in miles correlates with the individual's weight gain. Card (1993) uses college proximity as an instrument when examining the returns to schooling among young males in the US. Additionally, geodata researchers can either consider the initial position within an administrative unit explicitly or can neglect it completely.

Taken together, the review demonstrates that geodata improve a wide range of possible research topics and methods. First, geodata enable a more precise measurement of regional shocks and their effects. Second, geodata supersede the reliance on simplified city or neighborhood models without relying on assumptions about the distribution of productivity, income and socioeconomic characteristics within districts. Third, geodata enhance mobility research, opening up a new scope of social science research.

3 A case study of geocoding

Even though some studies already use grid cell data to investigate city developments, neighborhood composition or individual mobility (Ahlfeldt et al. 2015; Jahn and Neugart 2020; Vom Berge et al. 2014), there is no available data set containing longitudinal and comprehensive labor market information on grid-cell level for a whole country such as Germany. To provide such a data set, we geocoded administrative labor market data from Germany. In the following, we briefly describe the characteristics of the Integrated Employment Biographies (IEB), the basis of the data set used. Moreover, we give insight into the process of geocoding these particular data.

3.1 Introduction to German administrative labor market data

The IEB contain register-based information about individuals who are employed (data available since 1975) or receive benefits according to the German Social Code (SGB). The IEB further include data of individuals searching for a job or receiving vocational guidance (data available since 2000) as clients of the German Federal Employment Agency (BA) or the local job centers. The IEB also contain information on individuals participating in programs of active labor market policies (data available since 2000).

Footnote: For more detailed information, see Jacobebbinghaus and Seth (2007).

The spatial information in the base IEB was limited to separate units of municipalities and areas referring to administrative offices ("Arbeitsagenturen") or local job centers. These units are not constant and are subject to continuous changes due to mergers of political units or new layouts of local labor markets. Since the late 1990s, the IEB include not only the workplace or the agency that delivers benefits but also the residence of the individuals or the benefit units ("Bedarfsgemeinschaften"). Since 2000, this information has been based on mailing addresses. Time stamps are exact to the day when a new address is registered.

3.2 Geocoding

In the following, we describe the process used to transform mail-exact address data from the IEB into geodata. The characteristic feature of geodata is the efficient storage of address information in points, lines or polygons. Each point contains two dimensions: the longitude on the x-axis and the latitude on the y-axis. Various points result in lines, and multiple lines lead to a geometric object called a polygon. The latter can be an administrative unit on which data are spatially aggregated. However, independence from these administrative units is the most striking asset of geodata. Therefore, the final geocoded IEB store point data.

In previous years, the Institute for Employment Research (IAB) gained some experience with geocoding data sets: The first attempt was a sample of three due dates in 2007 to 2009 (Scholz et al. 2012), followed by the processing of the address histories of establishments, employees, and clients of job centers for the years 2000-2014 (Dauth and Haller 2018). The last reviewed version from 2019 contains the years 2000 to 2017 and all available address histories, called IEB GEO. This data set is a supplement to the IEB as well as to all other IAB data sets and samples that are connected to the register data, such as the IAB Establishment Panel (EP), the IAB Job Vacancy Survey (JVS), the Panel Study "Labour Market and Social Security" (PASS) and the IAB-BAMF-SOEP Survey of Refugees.

Footnotes: IAB Establishment Panel: https://www.iab.de/en/erhebungen/iab-betriebspanel.aspx. IAB Job Vacancy Survey: https://www.iab.de/en/befragungen/stellenangebot.aspx. PASS: https://fdz.iab.de/en/FDZ_Individual_Data/PASS.aspx. IAB-BAMF-SOEP Survey of Refugees: https://fdz.iab.de/en/FDZ_Individual_Data/iab-bamf-soep/IAB-BAMF-SOEP-SUF1617v1.aspx.

The IAB met several challenges to improve the future quality of references and shorten production time before the addresses of the IEB can be transformed to geocodes: One main challenge is that some addresses change over time because of new postcodes and new names of municipalities or streets. The geocoding tool used, from infas360, refers to one single timestamp, in this case, to the end of 2017. Therefore, some historical information does not match the new notation, leading to inexact georeferences. In this case, we use technical links provided by the statistical Datawarehouse of the IAB. Usually, the Datawarehouse processes addresses into an identifier of a spatial unit, which is the common area of the postcode, community, Federal Employment Agency, and job center (statistical place identifier). If the units or unit names change, the linking document changes from an address to another statistical place or official name over time. Using this database, we add the new address notations for postcodes and names of municipalities to the pool of all historical addresses. For streets, no links were available until now, so gaps in the exact geocodes remain in this case.

Footnotes: infas360 geocoding tool: https://www.infas360.de/geokodierung/. The statistical department of the Federal Employment Agency provides an overview of the different regional classifications: https://statistik.arbeitsagentur.de/DE/Navigation/Grundlagen/Klassifikationen/RegionaleGliederungen/RegionaleGliederungen-Nav.html; see especially the combinations: https://statistik.arbeitsagentur.de/DE/Statischer-Content/Grundlagen/Klassifikationen/Regionale-Gliederungen/Generische-Publikationen/Zusammenhang-Gebietsgliederungen.xlsx?__blob=publicationFile&v=4.

Another issue is the implementation of address histories at different times with different standards. To solve this issue, we create a unique format that conforms with the geocoder tool and separates the house number from the street name. The geocoding tool is less successful in the case of several house numbers for one address (which is quite common for addresses of establishments), prompting the use of only the first number (e.g., instead of "Hauptstraße 100–104", we refer to "Hauptstraße, 100"). Therefore, the coding quality for these addresses is less exact but without any missing house number information. Especially in the first years of the address histories, the address notation is poor due to shortening, typing or transmission errors. Therefore, we replace common or known notations with new standards. We also detect anonymous addresses such as lock boxes or refuges for battered women and set them to "missing" to protect sensitive personal information.

To georeference the addresses, we use the commercial tool of infas360. Unfortunately, the matching algorithms are business secrets and are therefore not available for scientific documentation or for developing another data preparation process. However, we derived some major principles and adjusted the processing accordingly. For example, the geocode quality is worse in some cases if postcode and municipality name do not match. Therefore, we geocode cases with poor results a second time without the postcode and include the geocode with the best quality. When the tool returns two codes belonging to different municipalities, we exclude these cases from further processing.

3.3 IEB GEO

In total, the address histories used include 420 million data rows with approximately 80 million different address notations. We pool these data as 43 million standardized notations with the geocoder tool returning 19 million geocodes. To keep the processing time manageable, we used two georeferencing processes in parallel. One geocoding pass ultimately lasted three days. The different measures of standardization therefore not only improved the data quality but also shortened the workflow. The quality of georeferences differs among the sources and increases over time. On average, approximately 95% of the geocodes are exact mailing addresses, making a strong base for further analyses.

As a variable of register data, the exact workplace or residence is highly sensitive information in terms of the German General Data Protection Regulation (GDPR). Due to the high sensitivity of the data, the IEB GEO is not publicly available. Address information in connection with any social security information is highly secured and only available to the geocoding team. The legal department of the IAB grants restricted access to IAB staff after a detailed description of the project. The IAB follows strict data protection measures as a matter of course.

To meet the data protection guidelines, we designed the IEB GEO as a system of several data sets with different sensitivity and access modes: The five histories contain only an anonymous Geo-ID along with anonymized identifiers of persons, establishments or SGB-II-benefit units, begin and end dates with some variables describing the quality and two markers of moves between addresses. A second data set contains information on the relation between the point-ID and six available anonymous grid-cell-IDs (100 m², 500 m², and 1000 m² grids in Lambert projection (LAEA) and Universal Transverse Mercator projection Zone 32 (UTM32)). Seven separated data sets contain the official codes and two additional projection systems (Gauß-Krüger-Projection and World Geodetic System 1984), and the last data set links the identifiers of the IEB to those of the IEB GEO.

Footnote: The five histories refer to (a) the place of establishments, the place of residence of (b) employees, (c) clients of the Federal Employment Agency and (d) job center-clients of authorized municipalities that deliver data via the transmission standard XSozial-BA-SGB II, and the place of residence of (e) benefit units following §7 SGB II.

To comply with the GDPR, the design of the IEB GEO is available at different levels of anonymization according to the scientific purpose. For some analyses, anonymous geogrid identifiers are sufficient. In other cases, users can compute distances with remote data access. If necessary, users have to apply for geocodes or grid codes in different granularities to combine the IEB GEO with other geodata or points of interest or, as in the example below, to produce maps of labor market characteristics in 1 × 1 kilometer grid cells illustrating the labor market structure of cities.

4 Results: labor market characteristics of selected cities

Having explained our experiences with geocoding social security data, the following section shows labor market insights and developments on a fine scale, enabling analyses within and irrespective of administrative boundaries. We illustrate the potential of such data by investigating various inner-city labor market characteristics. Based on a series of maps, we describe the spatial distribution of workplaces, residences, wages, employment types, and skills. All maps are based on the full IEB GEO and visualize the distribution of labor market characteristics in 1 × 1 kilometer grid cells.

For data protection reasons, we censored cells with fewer than 20 residents or, in case of the employment density, with fewer than four establishments. We refer readers to the extensive online supplement, which contains more than 2000 maps for all German cities with over 100,000 inhabitants. These maps show that many German cities differ substantially in their shape from a monocentric city structure. The general shape of Düsseldorf, for instance (pp. 53–55), follows the form of a left-facing arc, whereas the shape of Bremen (pp. 29–31) follows the large river Weser from east to west. However, this study focuses on two of the largest cities in Germany: Berlin and Munich. These cities are interesting subjects because they exhibit diametrically different histories and infrastructure.

4.1 Employment and residential density

Figures 1 and 2 illustrate the employment and residential density in Berlin and Munich. To measure employment density, we count all workers in their workplace grid cell. German firms have to register at least one of their establishments per municipality and industry by law, which makes workplace information highly reliable in general. However, firms that operate several establishments in a municipality within the same industry are only obliged to register one of them. In such cases, it cannot be guaranteed that individuals work in the grid in which they are registered. To prevent errors, we follow Dauth and Haller (2020) and exclude the following chain-store industries from the workplace data: construction, financial intermediation, public service, retail trade, temporary agency work and transportation. The exclusion of chain-store industries leads to slightly underestimated employment densities.

Fig. 1 Employment density. The figure shows the number of workers in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate a low number of workers, and dark purple cells indicate a high number. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The maps are based on social security data from the IAB, although we exclude chain-store industries from the workplace data. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads.

The map for Berlin (Fig. 1, upper panel) indicates a loose employment agglomeration towards the city center in 2017. However, some extensions reach out towards the peripheries, highlighting the importance of alternative agglomeration models like the model of Ahlfeldt et al. (2015). Employment density has grown over the years in Berlin and shifted from a slight tendency to the west towards the city center.

In the bottom panel of Fig. 1, the employment density in Munich shows an increasing agglomeration towards the city center. The few extensions in certain regions around the city might be caused by plants of large firms around the belt of Munich.

To measure the residential density, we counted all individuals in their grid of residence. Due to the origin of the data, the data only include individuals in the German social security system, such as employees, registered unemployed individuals, individuals in labor market programs, and recipients of unemployment benefits. Therefore, the data do not provide information about self-employed individuals, civil servants, students, retirees, pure homemakers or children.

Fig. 2 Residential density. The figure illustrates the number of residents in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate a low number of residents, and dark purple cells indicate a high number. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The maps are based on social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads.

Figure 2 shows the residential density in the two cities. The distribution of residents is scattered over the different districts of Berlin, creating a multicentric cityscape. While still appearing slightly more concentrated in the west, the population density shifted, similar to the employment density, towards the geographical center of Berlin over time.

In Munich, the population density is slightly more concentrated in the southern part of the city. It shows steady growth, exceeding the threshold of 3000 inhabitants in most of the grids in 2017. This high density confirms previous findings, which show that Munich is the city with the highest population density in Germany (Statistisches Bundesamt 2019).

In both of the displayed cities, the employment density shows a radiating pattern that is likely to correlate with the main transportation routes of each city. The residential density seems to be more centered in Munich,

Fig. 3 Median daily wage.
The figure presents the median daily wage in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low levels of the median daily wage, and dark purple cells indicate high levels. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads whereas Berlin is more multicentric, showing diversity wage illustrate between-neighborhood inequality and the in districts. Additionally, there seems to be a agglomera- Gini coefficient visualizes within-neighborhood inequal - tion trend over time in employment as well as residential ity. If all wages within a grid cell were equal, the Gini density. coefficient would be zero. If one inhabitant earns all, the Gini would be equal to 1. The wage information in the 4.2 Wages register data is highly reliable in general because employ- Figures 3 and 4 show the median daily wages of residents ers are legally obliged to report wages. However, as typi- and the Gini coefficients in Berlin and Munich. We use cal for social security data, earnings are right-censored at both variables as measures for wage segregation and ine- the social security threshold, which affects approximately quality in neighborhoods. The maps for the median daily 10% of the German workforce. We impute top-coded wages using a two-stage procedure similar to Dustmann et  al. (2009) and Card et  al. (2013) before computing median wages and Gini coefficients. Munich and Berlin are only examples of German cities. Other cities show The concentration of high wages in Berlin (Fig.  
3, upper different unusual patterns. For example, the density of residents in Dresden panel) is even more multicentric than the distribution of (maps on pp. 47–49 in the online appendix) is shaped as two diagonal lines employment and residential density. In 2017, multiple across the River Elbe rather than a clear city center concentration, giving geo- graphical conditions a decisive role. Geodata in labor market research: trends, potentials and perspectives Page 9 of 17 5 Fig. 4 Gini coefficient of daily wage. The figure shows the Gini coefficient of daily wages in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Cells with light purple color indicate low Gini, and dark purple cells indicate high Gini. The color scale is fixed for each feature and approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water, green areas forests, light-yellow areas settlements, solid gray lines roads and dashed gray lines railroads high-wage centers spread across the north, southwest, within neighborhoods in 2010 and 2017, with a Gini of southeast and the center of Berlin. The median wage over 0.45. This was not always the case: the prevalent is the highest and most equally spread in 2000 before segregation occurred sometime between 2000 and 2010, declining and agglomerating over time with no clear vis- with a sharp incline in inequality between the former ually detectable pattern. Adding a dynamic perspective West and the former GDR. This pattern and develop - to the cross-sectional findings of vom Berge et al. 
(2014), ment can have several reasons, ranging from political we do see an increasing income segregation within larger (the major social security reform in 2005) or economic neighborhood clusters across the city since 2010. reasons (global finance crisis in 2008) to segregation Munich (Fig.  3, bottom panel) has a persistently high processes and private infrastructure investments. As the level in the median wages. Slightly smaller median wages maps on low-paid workers of vom Berge et al. (2014) do are only temporarily evident for 2010. However, the only not show such a sharp division along the former border small percentage of lower median income grids on the in 2009, the inner-city distribution of low-paid workers periphery in 2017 indicates that the city had recovered does not solely drive this pattern. In fact, the relation of from this situation. low-paid workers to high-paid workers seems to differ The Gini coefficient draws a completely different pic - systematically between the former West and the former ture (Fig.  4). In the maps of Berlin (upper panel), the GDR. A comparison with other German cities of the for- city is clearly divided along the former border of the mer GDR indicates that the low Gini coefficient in East West and the German Democratic Republic (GDR), with Berlin in 2017 might be a feature of East German cities: the western part showing noticeably higher inequality Although, e.g., Chemnitz (p. 36 in the Additional file  1), 5 Page 10 of 17 K. Ostermann et al. Fig. 5 Share of regular employed among all employed. The figure depicts the share of regularly employed workers among all workers in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares of regular employed, and dark purple cells indicate high shares. 
We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads Dresden (p.48 in the Additional file  1), Leipzig (p. 132 in strongly between the two cities. Berlin has little inequal- the Additional file  1) and Magdeburg (p.144 in the Addi- ity within neighborhoods in a large part of the city and tional file  1) show a slightly higher Gini coefficients than high inequality in the southwestern part, dividing the city East Berlin in 2010, the inequality within neighborhoods into two parts. In contrast, Munich has a high inequality is remarkably low in all  of those cities in 2017. As we across large parts of the city. Additionally, median wages are only providing visual and non-systematic evidence, are steadily high in Munich indicating low inequality future research should examine the potential reasons of between neighborhoods. Conversely, wages in Berlin are this specific pattern in East German cities more precisely distributed heterogeneously across the city, again creat- by using appropriate statistical models and a full observa- ing a multicentric picture of segregated neighborhood tion period of 18 years instead of 3-year snapshots. clusters. The comparison of the two cities stresses that Wage inequality in Munich follows the pattern of the inequality within and between neighborhoods can differ median wages, with increasing inequality from 2000 substantially from each other highlighting the impor- to 2010 and a slight recovery as of 2017 (Fig.  4, bottom tance of different measures and levels of segregation. panel). 
However, inequality within neighborhoods is, in contrast to the median wage distribution, higher in cer-4.3 Employment types tain parts of the city belt. This subsection sheds further light on employment and Although the wage inequality for both cities seems non-employment using the residential information to be highest in 2010 indicating a non-linear trend, the of the IEB GEO. Figure  5 depicts the share of regularly inner-city distribution of the wage inequality differs employed individuals who are subject to social insurance Geodata in labor market research: trends, potentials and perspectives Page 11 of 17 5 Fig. 6 Share of non-employed. The figure illustrates the share of unemployed individuals among all residents in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares of unemployed, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads among all employed individuals in Berlin and Munich. exceptions. This image has not changed substantially in Figure  6 displays the share of non-working individuals recent decades other than a marginal decrease in 2010. (henceforth unemployed individuals) among all indi- The distribution of unemployment draws a differ - viduals in our data. We define unemployed individuals as ent picture (Fig.  6). 
Whereas the share of unemployed individuals who are registered unemployed, recipients of was generally high in 2000, it decreased in Berlin over social security benefits, or those who participate in labor the years. It is equally low across entire Berlin in 2017. market measurements and do not have a parallel employ- The same decrease in unemployed individuals applies ment spell. to Munich but at a different starting level. The share of In Berlin (Fig.  5, upper panel), the distribution of reg- unemployed individuals is overall low to nonexistent ularly employed individuals is relatively even in 2017. across the entire city and peripheries. However, the division between East and West Berlin is Employment development in both cities shows decreas- clearly visible, as the eastern area has a higher share of ing unemployment, which is in agreement with the regular employment. The segregation trend is also trace - nationally declining number of unemployed individuals able in the employment status: the equally distributed in Germany, especially since the social assistance (SGB share of regularly employed individuals in 2000 evolves II) reforms in 2005 (Bundesagentur für Arbeit 2020). The into a more segregated inner-city distribution in 2010 share of unemployed individuals in Berlin is higher than and 2017. that in Munich. In both cities, unemployment is almost In Munich (Fig.  5, bottom panel), regularly employed equally distributed, with a few exceptions of high-unem- individuals are equally distributed with only a few ployment grids. Whereas Berlin is more divided into two 5 Page 12 of 17 K. Ostermann et al. areas, the distribution of regular employment relation- “large cities disproportionately attract both high- and low ships in Munich appears to be more equal. skilled workers, while average skills are constant across city size”. 
The share of low-skilled workers is slightly 4.4 Skills higher and almost evenly distributed over the city, with A final series of maps illustrates the distribution of high-, a slightly higher concentration on the northeastern side. medium- and low-skilled residents in Berlin and Munich. The shares of medium- and low-skilled workers decline In the definition of skill levels, we follow the common over the years and are substituted by the increasing share classification in labor economics: low-skilled residents of high-skilled individuals. are individuals without vocational training, medium- What strikes attention is that in both cites, despite skilled residents are individuals who had completed their distinct differences in structure and centers, high- vocational training, and high-skilled residents are indi- and medium-skilled individuals are segregated. The resi - viduals with a degree from a university or university of dence choice of low-skilled individuals follows a different applied science. Figures 7 and 8 present the geographical pattern. We find a similar pattern of residence segrega - distribution of these three groups in Berlin and Munich tion by skill level for, e.g., Cologne (German “Köln”, pp. in 2000, 2010 and 2017. 125–127 in Additional file  1) and Leipzig (pp. 131–133 in Berlin (Fig.  7) shows a diverse distribution of skills at Additional file 1). first sight. A closer look reveals an agglomeration of high- Overall, Munich and Berlin differ from each other in skilled workers around the center and the southwestern various labor market characteristics. Berlin has a rather side of the city in 2017. In contrast, a lower share of high- multicentric structure, which might be driven by his- skilled workers reside in the northwestern part where torical reasons or sheer size. Furthermore, many char- the flight corridor of Berlin-Tegel is located. 
The lower acteristics show a clear East-West division as the former representation of high-skilled individuals in the north- separation of the city seems to still play a decisive role western part of the city indicates a correlation between in the agglomeration of the workforce. Munich, alterna- airport noise and skill-level. Using our new grid data on tively, appears more centered and shows a less diverse labor market characteristics, researchers can estimate the picture of labor market characteristics. Having already causal effect of airport noise on labor market outcomes detected several inner-city patterns in both cities, we in exploiting the unexpected delays similar to the strategy also stress the necessity to explain and understand these of Breidenbach et al. (2021) for rental prices. patterns in using more years and additional data. In this Strengthening this research potential, areas with a aspect, future research should exploit the possibility of high share of high-skilled residents are the exact areas combining these labor market data with other geodata. in which the share of medium-skilled workers is notice- ably low. The share of low-skilled workers does not match 5 Discussion and conclusions this segregated picture but has a segregation of its own: It Geodata are one of the furthest-reaching developments is clearly divided between the former East-West border, for regional and urban economics. Nevertheless, the lit- but with its highest share in the northwestern part of the erature that uses geodata is still comparatively small. This city where the flight corridor of the Berlin-Tegel airport article provides an overview of research areas that profit is located. While the share and trend of agglomeration from and already use geocoded data. 
Geodata enrich of medium- and high-skilled workers increased over the analyses on the regional scale and further provide insight years, the share of low-skilled workers decreased from into spatial relationships on the city or individual scale. 2000 to 2017, with lasting East-West segregation. To foster the usage of geodata, we share our experi- Munich (Fig.  8), in contrast, again shows less diver- ences in generating and preparing employment and labor sity. In 2017, the skill distribution of the entire city has market data at the IAB. The resulting data set IEB GEO an exhaustive share of at least 35% high-skilled workers. contains georeferenced and register-based information This number increased steadily in size and across the on all individuals who were subject to the German social city from 2000 onward, forming the largest skill share security system from 2000 to 2017. These linkable data in 2017. This trend to a higher share of high-skilled provide 350 million consolidated episodes with 19 mil- individuals might be driven by a German-wide trend of lion different geocodes, of which 95% are on the level of increasing shares of high-skilled workers over the years. exact mailing addresses. The small-scale, rich, and highly Alternatively, a city-specific reason might be the high reliable information make the IEB GEO a worldwide rent and cost of living in the city (Kholodilin and Mense unique and high-potential data set. 2012). The share of medium-skilled workers in Munich To illustrate the potential of the IEB GEO, the Addi- is contrarily small, especially in the city center, match- tional file  1 provides maps of all German cities with ing the findings of Eeckhout et  al. (2014,  p. 555) that more than 100,000 inhabitants. Every map displays the Geodata in labor market research: trends, potentials and perspectives Page 13 of 17 5 Fig. 7 Skills in Berlin. 
The figure shows the share of high-skilled (top layer), medium-skilled (middle layer) and low-skilled individuals (bottom layer) among all residents in our data in 1 × 1 kilometer grid cells in Berlin (759 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads 5 Page 14 of 17 K. Ostermann et al. Fig. 8 Skills in Munich. The figure presents the share of high-skilled (top layer), medium-skilled (middle layer) and low-skilled individuals (bottom layer) among all residents in our data in 1 × 1 kilometer grid cells in Munich (289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads inner-city distribution of one labor market indicator on the employment and resident density, the distribution a 1 × 1 kilometer grid-cell level (e.g., wages, unemploy- of wages, employment status and skills. Whereas Berlin ment and skills). 
This article exemplarily describes the shows a multicentric pattern in the median daily wages, cities Berlin and Munich in greater detail. We observe the former division of East and West Germany is vis- large differences within and across these two cities in ible in wage inequality as well as in the share of regularly Geodata in labor market research: trends, potentials and perspectives Page 15 of 17 5 employed and low-skilled individuals. In contrast, descriptive evidence for all large cities in Germany. By Munich is more centered and shows a less diverse inner- sharing experiences on the implementation and prepa- city distribution. The descriptive results highlight the ration of geodata as well as examples of visualization, we need for further research using geodata to identify deter- encourage the social sciences community to exploit the minants of inner-city developments. potential of these new data. From a broader perspective, many German cities have not developed monocentrically, as traditional city equi- Supplementary Information librium models assume. Therefore, we emphasize the The online version contains supplementary material available at https:// doi. org/ 10. 1186/ s12651- 022- 00310-x. importance of alternative theoretical models such as that of Ahlfeldt et al. (2015). Our data at hand allows to iden- Additional file 1. Online appendix containing maps for all German cities tify the dynamics of agglomeration effects with higher with more than 100,000 inhabitants for theyears 2000, 2010 and 2017. The temporal frequency. Hence, future research can deter- maps visualize the inner-city distribution of the residential density, the employment density, the median wages, the gini-coefficient, the share of mine spatial equilibrium models with more precision. regular employed and unemployed as well as the share of low-, medium- In addition, our maps highlight the high prevalence of and high-skilled residents. segregation in Germany. 
We often find visible patterns of increasing segregation between larger neighborhood Acknowledgements clusters by median daily wage especially for cities in the The authors thank two anonymous referees, Philipp Breidenbach, Wolfgang Dauth, Malte Reichelt and Sandra Schaffner for many helpful comments and eastern part of Germany like Dresden and Leipzig, or suggestions. Moreover, we thank Sebastian Bähr and Konstantin Körner for in the Ruhr-region like Bochum and Bottrop. However, their help in substantially revising the grid cell data. We thank two anonymous we also find examples of decreasing (e.g., Hamburg and referees and the editors of the Journal for Labour Market Research for helpful comments. We also thank Elisabeth Roß, Haika Otholt, Petra Prietz and Barbara Cologne) or constant (e.g., Bonn or Mainz) segregation Wünsche for excellent legal advice on data privacy. that underlines the necessity of investigating these differ - ent trends over time more comprehensively. Author contributions All the authors have read and approved the final manuscript. The approach used in this study has some limitations. We only reported exemplary and descriptive evidence for Funding three separate years and two cities. Although we hint at We gratefully acknowledge financial support from the Wissenschaftsgemein- schaft Gottfried Wilhelm Leibniz e.V. Competition (K165/2018/Segregation reasons and developments, inference about (causal) rela- and regional mobility). Kerstin Ostermann acknowledges financial support tionships of the visualized distributions and their changes from the graduate program of the IAB and the Friedrich-Alexander University over time is beyond the scope of this study. However, Erlangen-Nürnberg (GradAB). The funding did not influence the design of the study, analysis, and interpretation of data. 
the detected patterns and differences within and across the two cities Berlin and Munich provide high-potential Availability of data and materials starting points for relevant research topics using the full The datasets analysed during the current study are not publicly available as the authors use administrative data of the Institute for Employment Research. panel data of the IEB GEO. The data are social data with administrative origin which are processed and A rather minor data limitation of the IEB GEO is that kept by Institute for Employment Research (IAB) according to Social Code III. it relies on social security data only. Therefore, the IEB There are certain legal restrictions due to the protection of data privacy. The data contain sensitive information and therefore are subject to the confiden- GEO provide no information about self-employed, civil tiality regulations of the German Social Code (Book I, Section 35, Paragraph servants, students, children or pure homemakers. Future 1). The data are held by the IAB (email: iab@iab.de, phone: +49 911 1790) and research can partly solve this issue by spatially merging are on-site available on reasonable request. The code is available and archived at the Research Data Centre of the IAB; see https:// iab. de/ en/ daten/ repli katio the IEB GEO to other geodata, which combination was nen. aspx for further information. The authors are willing to assist (Kerstin previously restricted to the county level for analyses with Ostermann, kerstin.ostermann@iab.de). the IEB. With data such as the IEB GEO, future research should Declarations analyze various topics of social sciences, as the examples Ethics approval and consent to participate in Sects. 2 and 4 have shown. By exploiting the advan- Not applicable. tages of geodata, research can provide more fine-scaled, causal evidence for the impact of regional shocks on Consent for publication Not applicable. neighborhood effects and individual distance thresh - olds. 
Overall, this study shows the potential and perspec- Competing interests tives of the usage of geodata enriched by comprehensive The authors declare that they have no competing interests Author details Institute for Employment Research (IAB), Regensburger Str. 100, 90478 Nürn- For an overview on geodata for Germany, visit the website of the RWI, 2 berg, Germany. Friedrich-Alexander University Erlangen-Nürnberg (FAU), h t t p s : / / w w w . r w i - e s s e n . d e / e n / f o r s c h u n g - b e r a t u n g / w e i t e r e / f o r s c h u n g s d a t e n zentr um- ruhr/ daten angeb ot. 5 Page 16 of 17 K. Ostermann et al. Findelgasse 7-9, 90402 Nürnberg, Germany. German Institute for Economic Dustmann, C., Fitzenberger, B., Zimmermann, M.: Housing expenditures and Research (DIW ), Mohrenstr. 58, 10117 Berlin, Germany. income inequality. ZEW-Centre for European Economic Research Discus- sion Paper 18-048 (2018) Received: 12 October 2021 Accepted: 13 April 2022 Dustmann, C., Ludsteck, J., Schönberg, U.: Revisiting the German wage struc- Published: 3 June 2022 ture. Q. J. Econ. 124(2), 843–881 (2009) Eeckhout, J., Pinheiro, R., Schmidheiny, K.: Spatial sorting. J. Polit Econ 122(3), 554–620 (2014) Feijten, P., Van Ham, M.: Neighbourhood change... reason to leave? Urban Stud. 46(10), 2103–2122 (2009) References Gathmann, C., Helm, I., Schönberg, U.: Spillover effects of mass layoffs. J. Eur. Ager, P., Eriksson, K., Hansen, C.W., Lønstrup, L.: How the 1906 San Francisco Econ. Assoc. 18(1), 427–468 (2020) earthquake shaped economic activity in the American West. Explor. Econ. Goldschmidt, D., Klosterhuber, W., Schmieder, J.F.: Identifying couples in Hist. 77, 101342 (2020) administrative data. J. Labour Market Res. 50(1), 29–43 (2017) Ahlfeldt, G.M., Redding, S.J., Sturm, D.M., Wolf, N.: The economics of density: Goodchild, M.F.: The quality of big (geo) data. Dialogues Hum. Geogr. 3(3), evidence from the Berlin Wall. 
Econometrica 83(6), 2127–2189 (2015) 280–284 (2013) Arntz, M.: The geographical mobility of unemployed workers. ZEW-Centre for Graham, B.S.: Identifying and estimating neighborhood effects. J. Econ. Lit. European Economic Research Discussion Paper 05-034 (2005) 56(2), 450–500 (2018) Bähr, S., Haas, G.-C., Keusch, F., Kreuter, F., Trappmann, M.: IAB-SMART-Studie: Haller, P., Heuermann, D.F.: Opportunities and competition in thick labor mar- Mit dem Smartphone den Arbeitsmarkt erforschen. In IAB-Forum: Das kets: evidence from plant closures. J. Reg. Sci. 60(2), 273–295 (2020) neue Onlinemagazin des Instituts für Arbeitsmarkt-und Berufsforschung, Helsley, R.W.: Urban political economics. In: Henderson, J.V., Thisse, J.-F. (eds.) pp. 09–01. IAB (2018) Handbook of Regional and Urban Economics, vol. 4, Chapter 54, pp. Bayer, P., Fang, H., McMillan, R.: Separate when equal? Racial inequality and 2381–2421. Elsevier, Amsterdam (2004) residential segregation. J. Urban Econ. 82, 32–48 (2014) Henderson, J.V., Storeygard, A., Weil, D.N.: Measuring economic growth from Bayer, P., Ross, S.L., Topa, G.: Place of work and place of residence: informal outer space. Am. Econ. Rev. 102(2), 994–1028 (2012) hiring networks and labor market outcomes. J. Polit. Econ. 116(6), Jacobebbinghaus, P., Seth, S.: The German integrated employment biogra- 1150–1196 (2008) phies sample IEBS. Schmollers Jahrbuch 127(2), 335–342 (2007) Brakman, S., Garretsen, H., Schramm, M.: The spatial distribution of wages: Jahn, E., Neugart, M.: Do neighbors help finding a job? Social networks and estimating the Helpman-Hanson model for Germany. J. Reg. Sci. 44(3), labor market outcomes after plant closures. Labour Econ. 65, 101825 437–466 (2004) (2020) Breidenbach, P., Cohen, J., Schaffner, S.: Continuation of air services at Berlin- Kang, Y., Zhang, F., Peng, W., Gao, S., Rao, J., Duarte, F., Ratti, C.: Understanding Tegel and its effects on apartment rental prices. 
Available at SSRN 3840560 (2021)
house price appreciation using multi-source big geo-data and machine learning. Land Use Policy (Online first), 104919 (2020)
Bügelmeyer, E., Schaffner, S., Schanne, N., Scholz, T.: Das DIW-IAB-RWI-Nachbarschaftspanel: Ein Scientific-Use-File mit lokalen Aggregatdaten und dessen Verknüpfung mit dem deutschen Sozio-ökonomischen Panel. RWI Materialien 97, RWI (2015)
Bundesagentur für Arbeit: Blickpunkt Arbeitsmarkt: Monatsbericht zum Arbeits- und Ausbildungsmarkt. https://www.arbeitsagentur.de/datei/ba146273.pdf (2020)
Card, D.: Using geographic variation in college proximity to estimate the return to schooling. NBER Working Paper 4483 (1993)
Card, D., Heining, J., Kline, P.: Workplace heterogeneity and the rise of West German wage inequality. Q. J. Econ. 128(3), 967–1015 (2013)
Card, D., Mas, A., Rothstein, J.: Tipping and the dynamics of segregation. Q. J. Econ. 123(1), 177–218 (2008)
Chetty, R., Hendren, N.: The impacts of neighborhoods on intergenerational mobility I: childhood exposure effects. Q. J. Econ. 133(3), 1107–1162 (2018)
Combes, P.-P., Duranton, G., Gobillon, L.: Spatial wage disparities: sorting matters! J. Urban Econ. 63(2), 723–742 (2008)
Currie, J., DellaVigna, S., Moretti, E., Pathania, V.: The effect of fast food restaurants on obesity and weight gain. Am. Econ. J. Econ. Policy 2(3), 32–63 (2010)
Cutler, D.M., Glaeser, E.L.: Are ghettos good or bad? Q. J. Econ. 112(3), 827–872 (1997)
Dauth, W., Haller, P.: Berufliches Pendeln zwischen Wohn- und Arbeitsort: Klarer Trend zu längeren Pendeldistanzen. IAB-Kurzbericht 10/2018 (2018)
Dauth, W., Haller, P.: Is there loss aversion in the trade-off between wages and commuting distances? Reg. Sci. Urban Econ. 83, 103527 (2020)
Desmet, K., Henderson, J.V.: The geography of development within countries. In: Duranton, G., Henderson, V., Strange, W. (eds.) Handbook of Regional and Urban Economics, vol. 5, pp. 1457–1517. North-Holland, Amsterdam (2015)
Duranton, G., Henderson, V., Strange, W.: Handbook of Regional and Urban Economics, vol. 5A. North-Holland, Amsterdam (2015)
Duranton, G., Puga, D.: Urban land use. In: Duranton, G., Henderson, V., Strange, W. (eds.) Handbook of Regional and Urban Economics, vol. 5, pp. 467–560. North-Holland, Amsterdam (2015)
Durlauf, S.N.: Neighborhood effects. In: Henderson, J.V., Thisse, J.-F. (eds.) Handbook of Regional and Urban Economics, vol. 4, Chapter 50, pp. 2173–2242. North-Holland, Amsterdam (2004)
Kennan, J., Walker, J.R.: The effect of expected income on individual migration decisions. Econometrica 79(1), 211–251 (2011)
Kholodilin, K.A., Mense, A.: German cities to see further rises in housing prices and rents in 2013. DIW Econ. Bull. 2(12), 16–26 (2012)
Kremer, M.: How much does sorting increase inequality? Q. J. Econ. 112(1), 115–139 (1997)
Lee, B.A., Oropesa, R.S., Kanan, J.W.: Neighborhood context and residential mobility. Demography 31(2), 249–270 (1994)
Lee, B.A., Reardon, S.F., Firebaugh, G., Farrell, C.R., Matthews, S.A., O’Sullivan, D.: Beyond the census tract: patterns and determinants of racial segregation at multiple geographic scales. Am. Sociol. Rev. 73(5), 766–791 (2008)
Legewie, J., Schaeffer, M.: Contested boundaries: explaining where ethnoracial diversity provokes neighborhood conflict. Am. J. Sociol. 122(1), 125–161 (2016)
Lucas, R.E., Rossi-Hansberg, E.: On the internal structure of cities. Econometrica 70(4), 1445–1476 (2002)
Mossay, P., Picard, P.: Spatial segregation and urban structure. J. Reg. Sci. 59(3), 480–507 (2019)
Oakes, J.M., Andrade, K.E., Biyoow, I.M., Cowan, L.T.: Twenty years of neighborhood effect research: an assessment. Curr. Epidemiol. Rep. 2(1), 80–87 (2015)
Ottaviano, G., Thisse, J.-F.: Agglomeration and economic geography. In: Henderson, J.V., Thisse, J.-F. (eds.) Handbook of Regional and Urban Economics, vol. 4, Chapter 58, pp. 2563–2608. Elsevier, Amsterdam (2004)
Reardon, S.F., O’Sullivan, D.: Measures of spatial segregation. Sociol. Methodol. 34(1), 121–162 (2004)
Reichelt, M., Abraham, M.: Occupational and regional mobility as substitutes: a new approach to understanding job changes and wage inequality. Soc. Forces 95(4), 1399–1426 (2017)
Rosenthal, S.S., Strange, W.C.: The attenuation of human capital spillovers. J. Urban Econ. 64(2), 373–389 (2008)
Rossi-Hansberg, E., Sarte, P.-D., Owens, R., III.: Housing externalities. J. Polit. Econ. 118(3), 485–535 (2010)
Rüttenauer, T.: Neighbours matter: a nation-wide small-area assessment of environmental inequality in Germany. Soc. Sci. Res. 70, 198–211 (2018)
Schelling, T.C.: Models of segregation. Am. Econ. Rev. 59(2), 488–493 (1969)
Schelling, T.C.: Dynamic models of segregation. J. Math. Sociol. 1(2), 143–186 (1971)
Scholz, T., Rauscher, C., Reiher, J., Bachteler, T.: Geocoding of German administrative data: the case of the Institute for Employment Research. FDZ-Methodenbericht 9 (2012)
Schönwälder, K., Söhn, J.: Immigrant settlement structures in Germany: general patterns and urban levels of concentration of major groups. Urban Stud. 46(7), 1439–1460 (2009)
Sharkey, P., Faber, J.W.: Where, when, why, and for whom do residential contexts matter? Moving away from the dichotomous understanding of neighborhood effects. Annu. Rev. Sociol. 40, 559–579 (2014)
Sorenson, O., Dahl, M.S.: Geography, joint choices, and the reproduction of gender inequality. Am. Sociol. Rev. 81(5), 900–920 (2016)
Statistisches Bundesamt: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2019 (3. Quartal 2019) (2019). https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GVAuszugQ/AuszugGV3QAktuell.html
Vom Berge, P., Schanne, N., Schild, C.-J., Trübswetter, P., Wurdack, A., Petrovic, A.: Eine räumliche Analyse für Deutschland: Wie sich Menschen mit niedrigen Löhnen in Großstädten verteilen. IAB-Kurzbericht 12/2014 (2014)

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal for Labour Market Research, Springer Journals

Geodata in labor market research: trends, potentials and perspectives

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022. corrected publication 2022
ISSN
1614-3485
eISSN
2510-5027
DOI
10.1186/s12651-022-00310-x

Abstract

This article shows the potentials of georeferenced data for labor market research. We review developments in the literature and highlight areas that can benefit from exploiting georeferenced data. Moreover, we share our experiences in geocoding administrative employment data including wage and socioeconomic information of almost the entire German workforce between 2000 and 2017. To make the data easily accessible for research, we create 1-square-kilometer grid cells aggregating a rich set of labor market characteristics and sociodemographics of unprecedented spatial precision. These unique data provide detailed insights into inner-city distributions for all German cities with more than 100,000 inhabitants. Accordingly, we provide an extensive series of maps in the Additional file 1 and describe Berlin and Munich in greater detail. The small-scale maps reveal substantial differences in various labor market aspects within and across cities.

Keywords: Georeferenced data, Microdata, Register-based data, Urban economics, Regional science, Labor economics, Neighborhood effects, Spatial economics, Segregation

JEL classification: J12, J31, R12, O18

Correspondence: kerstin.ostermann@iab.de, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Findelgasse 7-9, 90402 Nürnberg, Germany. Full list of author information is available at the end of the article.

© The Author(s) 2022, corrected publication 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

1 Introduction

Today, individual geopositioning is ubiquitous. We use detailed georeferenced data (henceforth: geodata) to navigate driving routes, track after-work runs, and look up directions to a new restaurant. Companies profit from optimized logistics, agriculture and construction due to detailed information from orbital satellite systems. Whereas processing and utilizing detailed position data are common in many fields such as engineering and business administration, these skills have not been a primary subject in economics and sociology yet.

This article examines the potential of geodata in the social sciences. Moreover, the article presents multi-city evidence on how small-scale geodata can reveal inner-city developments and inequalities that have been hidden by administrative borders so far. The essential characteristic of geodata is the assignment of each statistical identity to an exact location on the Earth’s surface (Goodchild 2013). Currently, most spatial research in economics and sociology uses city district or county aggregates. However, spatially aggregated data face several limitations restricting the investigation of many research questions. In contrast, geodata allow researchers to flexibly scale spatial information independently of administrative boundaries, resulting in three main advantages:

First, greater spatial depth enables the detailed investigation of topics such as segregation (Brakman et al. 2004; Eeckhout et al. 2014; Rosenthal and Strange 2008), neighborhood effects (Schönwälder and Söhn 2009) and mobility (Dauth and Haller 2018, 2020). Second, geodata can serve as a methodological tool. For instance, researchers can use geodata for the sampling of surveys or identifying neighborhood boundaries (Lee et al. 2008; Legewie and Schaeffer 2016), spatial shocks or family relations (Goldschmidt et al. 2017). Third, the potential of enriching existing data with geoinformation opens up possibilities for record linkage, e.g., with smartphone data (Bähr et al. 2018) as well as with genuine spatial data, such as satellite imagery (Henderson et al. 2012) and climate data (Rüttenauer 2018).

This is likely due to the lack of data and the complexity of processing them (Bayer et al. 2014; Vom Berge et al. 2014; Bügelmeyer et al. 2015). However, increasing computational capacities and more suitable statistical tools facilitate research on and with geodata. As a result, the number of published studies using geodata has been rapidly growing and will further increase given the variety of advantages geodata offer.

In this article, we highlight research potentials of geocoded labor market data with descriptive evidence from grid cell data as an example. Moreover, we share our experience in geocoding the employment biographies of almost the entire German workforce between 2000 and 2017. In addition to detailed daily information on employment and unemployment records, the data contain exact coordinates of workplaces and places of residence. This allows us to describe the German labor market with unprecedented spatial precision. Furthermore, this paper illustrates the potential of geodata by visualizing the labor market characteristics of all major German cities, of which two, Berlin and Munich, will be discussed in greater detail. We show that small-scale geodata can reveal substantial differences in fundamental labor market characteristics within and across cities.

This article is organized as follows: In Sect. 2, we review the recent literature, focusing on research that already uses or could benefit from using geodata. Next, in Sect. 3, we share our experiences in geocoding administrative labor market data. In Sect. 4, we provide small-scale descriptions of two large German cities, Berlin and Munich. In the final section, we conclude by identifying potential research areas and questions for the presented data set. Additionally, an extensive online appendix that contains fine-grained maps of labor market characteristics for all German cities with more than 100,000 inhabitants complements this article.

2 Potential research topics and trends in the relevant literature

In the following section, we provide a short overview of potential research fields, starting with questions covering larger regional areas and cities before moving towards research on neighborhoods and individual mobility. Although we present each topic separately, there are various dependencies across these research fields.

One of the most popular approaches to derive causal inference are “natural experiments” such as political reforms, mass layoffs or sudden economic or natural developments affecting entire regions (Ager et al. 2020; Ahlfeldt et al. 2015; Desmet and Henderson 2015; Gathmann et al. 2020). Natural experiments are of special interest for labor market research because they allow researchers to rule out spatial sorting (Combes et al. 2008; Haller and Heuermann 2020). Geodata enable researchers to precisely evaluate the effect of regional shocks on individuals, subgroups, or entire local labor markets (Desmet and Henderson 2015; Oakes et al. 2015) with much higher precision than regional aggregates. One example for such an exogenous shock in Germany is the refugee inflow in 2015 and 2016. Using geodata, researchers can track refugee residences and workplaces within cities and can evaluate the integration process in a more detailed way than with regional aggregates. Moreover, flexible scaling enhances the selection of appropriate control regions for matching processes.

As a further large-scale topic, geodata contribute to insights for city and infrastructure planning which is connected to the locational choice for institutions, firms and workers (Duranton et al. 2015; Helsley 2004; Ottaviano and Thisse 2004). To capture metropolitan effects, Lucas and Rossi-Hansberg (2002) propose an equilibrium city model, which operates under the assumption that people live where they work. Using geodata, Dauth and Haller (2018) show that this assumption is, at least for Germany, only partially true. While US cities are mostly monocentric with clear districts for firms, workers and different employment groups, cities in other, e.g., European, countries might be structured differently, which makes it difficult to link them to existing theoretical and empirical models (Ahlfeldt et al. 2015; Dauth and Haller 2020; Duranton and Puga 2015). Tackling this issue, Ahlfeldt et al. (2015) use geodata in a quantitative theoretical model to estimate the dynamics of the internal city structure with heterogeneous centers. They build city “blocks” of 500 square meter grid cells (“grids”) to control for variation in the surroundings. In a second step, they combine their theoretical model with the natural experiment of the fall of the Berlin Wall and use inner-city variation across grids to provide causal evidence.

In addition to regional and city-related topics, geodata offer advantages on a smaller scale, enabling the detailed analysis of neighborhood effects. Although the concept of neighborhoods is quite diverse, research generally distinguishes between residential and workplace neighborhoods. Although research on workplace neighborhoods can considerably profit from the usage of geodata, we will focus on the research potentials for the literature on residential neighborhoods in this article. For the choice of residence, contextual factors such as the social context, quality of life, public goods, and housing costs play an important role (Dustmann et al. 2018; Kang et al. 2020; Lee et al. 1994). Highlighting the relevance of social networks, Jahn and Neugart (2020) find significant job referral networks in German neighborhoods using geocoded data.

A prominent strand within neighborhood literature is the rise and development of segregation (Mossay and Picard 2019; Reardon and O’Sullivan 2004). Segregated subgroups can arise if characteristics are homogeneous within neighborhoods but heterogeneous between neighborhoods (Bayer et al. 2014; Cutler and Glaeser 1997; Graham 2018; Legewie and Schaeffer 2016; Schelling 1969). Small-scale geodata like grid cells provide a higher resolution for segregation patterns and their effects than county- or district-level data, enabling not only a more fine-grained investigation on the base of grid cells but also comparisons between grid cells. For instance, vom Berge et al. (2014) use cross-sectional geocoded German employment data to visualize the distribution of low-income individuals for the German cities Berlin, Hamburg and Munich. Although providing only a snapshot for one year, vom Berge et al. (2014) already highlight that Munich and Berlin differ in their segregation patterns.

To investigate the rise of segregation, research cannot solely focus on a static definition of neighborhoods. Neighborhoods are dynamic environments that change and evolve over time due to exogenous events or selective individual mobility (Feijten and Van Ham 2009; Sharkey and Faber 2014). In general, similar individuals tend to choose neighborhoods with similar characteristics to their own (Durlauf 2004; Feijten and Van Ham 2009; Kremer 1997). Summing this selective residential choice up to a selective subgroup inflow on the aggregate level, neighborhoods might “tip”: The emerging subgroup drives minorities out of the neighborhood, causing endogenous mobility and segregation (Durlauf 2004; Schelling 1969, 1971). Such segregated neighborhoods can cause neighborhood conflicts, especially if neighborhood boundaries are contested (Legewie and Schaeffer 2016). For dynamic analyses of segregation developments, trend- or panel-data are necessary. The investigation of dynamic compositional changes is especially relevant for high-density neighborhoods where housing alternatives are rare, particularly under the assumption that land and its users are heterogeneous (Card et al. 2008; Duranton and Puga 2015; Helsley 2004). As tight living conditions are most evident for larger cities, we focus on those in this article.

Note: Even though we are going to present trend data in this article, the underlying grid cell data also provide information on neighborhood in- and outflows.

In addition to promoting descriptive research on segregation patterns and processes, geodata also offer new possibilities for the causal estimation of neighborhood effects. As indicated in the beginning of this section, exploiting exogenous events is a popular strategy to account for endogenous neighborhood change (Chetty and Hendren 2018; Rossi-Hansberg et al. 2010). However, such events are rare and often identify local average treatment effects only. The geographically small scale of grid or point data enables other causal estimation techniques based on border distances or grid-cell variation. Exemplifying the potential of small-scale data, Bayer et al. (2008) use block-level variation within a wider neighborhood to estimate the causal effect of neighborhood referrals. Geocoded grid-cell data can easily improve their administrative block approach. Another example is the paper of Breidenbach et al. (2021), who use Berlin grid-cell data to estimate the causal effect of flight noise and proximity to the airport on housing rental prices. In exploiting the unexpected delays of the airport closure of Berlin-Tegel and inner-city variation in the exposure to flight noise, they show that flight noise reduces rental prices of treated neighborhoods by 2 to 5%.

Moreover, geodata measure the effects of geographical distances more precisely than aggregates at higher administrative levels, thereby enhancing the analysis of individual mobility. Although the focus of this article is grid-cell data and inner-city distributions, individual mobility is a field of research with a high potential in the usage of geodata. A broad body of research literature seeks to explain individual (non-)mobility (Arntz 2005; Chetty and Hendren 2018; Kennan and Walker 2011; Lee et al. 1994; Reichelt and Abraham 2017; Sorenson and Dahl 2016) and commuting (Dauth and Haller 2020). However, most of these analyses measure regional mobility as moving from one county or region to another, resulting in a bias for individuals living close to a border or moving within a district (Lee et al. 1994). Using geodata, mobility is now a continuous variable instead of a binary indicator that facilitates advanced estimation methods in mobility research (Dauth and Haller 2020). Currie et al. (2010), e.g., show that the distance to fast food restaurants in miles correlates with the individual’s weight gain. Card (1993) uses college proximity as an instrument when examining the returns to schooling among young males in the US. Additionally, geodata researchers can either consider the initial position within an administrative unit explicitly or can neglect it completely.

Taken together, the review demonstrates that geodata improve a wide range of possible research topics and methods. First, geodata enable a more precise measurement of regional shocks and their effects. Second, geodata supersede the reliance on simplified city or neighborhood models without relying on assumptions about the distribution of productivity, income and socioeconomic characteristics within districts. Third, geodata enhance mobility research, opening up a new scope of social science research.

3 A case study of geocoding

Even though some studies already use grid cell data to investigate city developments, neighborhood composition or individual mobility (Ahlfeldt et al. 2015; Jahn and Neugart 2020; Vom Berge et al. 2014), there is no available data set containing longitudinal and comprehensive labor market information on grid-cell level for a whole country such as Germany. To provide such a data set, we geocoded administrative labor market data from Germany. In the following, we will briefly describe the characteristics of the Integrated Employment Biographies (IEB), the base of the data set used. Moreover, we give insight into the process of geocoding these particular data.

3.1 Introduction to German administrative labor market data

The IEB contain register-based information about individuals who are employed (data available since 1975) or receive benefits according to the German Social Code (SGB). The IEB further include data of individuals searching for a job or receiving vocational guidance (data available since 2000) as clients of the German Federal Employment Agency (BA) or the local job centers. The IEB also contain information on individuals participating in programs of active labor market policies (data available since 2000).

Note: For more detailed information, see Jacobebbinghaus and Seth (2007).

The spatial information in the base IEB was limited to separate units of municipalities and areas referring to administrative offices (“Arbeitsagenturen”) or local job centers. These units are not constant and underlie continuous changes due to fusions of political units or new layouts of local labor markets. Since the late 1990s, the IEB include not only the workplace or the agency that delivers benefits but also the residence of the individuals or the benefit units (“Bedarfsgemeinschaften”). Since 2000, this information has been based on mailing addresses. Time stamps are exact to the day when a new address is registered.

3.2 Geocoding

In the following, we describe the process used to transform mail-exact address data from the IEB into geodata. The characteristic feature of geodata is the efficient storage of address information in points, lines or polygons. Each point contains two dimensions: the longitude on the x-axis and the latitude on the y-axis. Various points result in lines, and multiple lines lead to a geometric object called a polygon. The latter can be an administrative unit on which data are spatially aggregated. However, independence from these administrative units is the most striking asset of geodata. Therefore, the final geocoded IEB store point data.

In previous years, the Institute for Employment Research (IAB) gained some experience with geocoding data sets: The first attempt was a sample of three due dates in 2007 to 2009 (Scholz et al. 2012), followed by the processing of the address histories of establishments, employees, and clients of job centers for the years 2000-2014 (Dauth and Haller 2018). The last reviewed version from 2019 contains the years 2000 to 2017 and all available address histories, called IEB GEO. This data set is a supplement to the IEB as well as to all other IAB data sets and samples that are connected to the register data, such as the IAB Establishment Panel (EP, https://www.iab.de/en/erhebungen/iab-betriebspanel.aspx), the IAB Job Vacancy Survey (JVS, https://www.iab.de/en/befragungen/stellenangebot.aspx), the Panel Study “Labour Market and Social Security” (PASS, https://fdz.iab.de/en/FDZ_Individual_Data/PASS.aspx) and the IAB-BAMF-SOEP Survey of Refugees (https://fdz.iab.de/en/FDZ_Individual_Data/iab-bamf-soep/IAB-BAMF-SOEP-SUF1617v1.aspx).

The IAB met several challenges to improve the future quality of references and shorten production time before the addresses of the IEB can be transformed to geocodes: One main challenge is that some addresses change over time because of new postcodes and new names of municipalities or streets. The used geocoding tool from infas360 (https://www.infas360.de/geokodierung/) refers to one single timestamp, in this case, to the end of 2017. Therefore, some historical information does not match the new notation, leading to inexact georeferences. In this case, we use technical links provided by the statistical Datawarehouse of the IAB. Usually, the Datawarehouse processes addresses into an identifier of a spatial unit, which is the common area of the postcode, community, Federal Employment Agency, and job center (statistical place identifier). If the units or unit names change, the linking document changes from an address to another statistical place or official name over time. Using this database, we add the new address notations for postcodes and names of municipalities to the pool of all historical addresses. For streets, no links were available until now, so gaps in the exact geocodes remain in this case.

Note: The statistical department of the Federal Employment Agency provides an overview of the different regional classifications at https://statistik.arbeitsagentur.de/DE/Navigation/Grundlagen/Klassifikationen/RegionaleGliederungen/RegionaleGliederungen-Nav.html; see especially the combinations at https://statistik.arbeitsagentur.de/DE/Statischer-Content/Grundlagen/Klassifikationen/Regionale-Gliederungen/Generische-Publikationen/Zusammenhang-Gebietsgliederungen.xlsx?__blob=publicationFile&v=4.

Another issue is the implementation of address histories at different times with different standards. To solve this issue, we create a unique format that conforms with the geocoder tool and separates the house number from the street name. The geocoding tool is less successful in the case of several house numbers for one address (which is quite common for addresses of establishments), prompting the use of only the first number (e.g., instead of “Hauptstraße 100–104”, we refer to “Hauptstraße, 100”). Therefore, the coding quality for these addresses is less exact but without any missing house number information. Especially in the first years of the address histories, the address notation is poor due to shortening, typing or transmission errors. Therefore, we replace common or known notations with new standards. We also detect anonymous addresses such as lock boxes or refuges for battered women and set them to “missing” to protect secure personal information.

To georeference the addresses, we use the commercial tool of infas360. Unfortunately, the matching algorithms are business secrets and are therefore not available for scientific documentation or for developing another data preparation process. However, we derived some major principles and adjusted the processing accordingly. For example, the geocode quality is worse in some cases if postcode and municipality name do not match. Therefore, we geocode cases with minor results a second time without the postcode and include the geocode with the best quality. When the tool returns two codes belonging to different municipalities, we exclude these cases from further processing.

3.3 IEB GEO

In total, the address histories used include 420 million data rows with approximately 80 million different address notations. We pool these data as 43 million standardized notations with the geocoder tool returning 19 million geocodes. To keep the processing time manageable, we used two georeferencing processes in parallel. One geocoding passage ultimately lasted three days. The different measures of standardization therefore not only improved the data quality but also shortened the workflow. The quality of georeferences differs among the sources and increases over time. On average, approximately 95% of the geocodes are exact mailing addresses, making a strong base for further analyses.

As a variable of register data, the exact workplace or residence is highly sensitive information in terms of the German General Data Protection Regulation (GDPR). Due to the high sensitivity of the data, the IEB GEO is not publicly available. Address information in connection with any social security information is highly secured and only available to the geocoding team. The juridical department of the IAB grants restricted access to IAB staff after a detailed description of the project. The IAB follows strict data protection measures as a matter of course.

To meet the data protection guidelines, we designed the IEB GEO as a system of several data sets with different sensitivity and access modes: The five histories contain only an anonymous Geo-ID along with anonymized identifiers of persons, establishments or SGB-II-benefit units, begin-/enddate with some variables describing the quality and two markers of moves between addresses. A second data set contains information on the relation between the point-ID and six available anonymous grid-cell-IDs (100 m², 500 m², and 1000 m² grids in Lambert projection (LAEA) and Universal Transversal Mercator-Projection Zone 32 (UTM32)). Seven separated data sets contain the official codes and two additional projection systems (Gauß-Krüger-Projection and World Geodetic System 1984), and the last data set links the identifiers of the IEB to those of the IEB GEO.

Note: The five histories refer to (a) the place of establishments, the place of residence of (b) employees, (c) clients of the Federal Employment Agency and (d) job center clients of authorized municipalities that deliver data via the transmission standard XSozial-BA-SGB II, and the place of residence of (e) benefit units following §7 SGB II.

To comply with the GDPR, the design of the IEB GEO is available at different levels of anonymization according to the scientific purpose. For some analyses, anonymous geogrid identifiers are sufficient. In other cases, users can compute distances with remote data access. If necessary, users have to apply for geocodes or grid codes in different granularities to combine the IEB GEO with other geodata or points of interest or, as in the example below, to produce maps of labor market characteristics in 1 × 1 kilometer grid cells illustrating the labor market structure of cities.

4 Results: labor market characteristics of selected cities

Having explained our experiences with geocoding social security data, the following section shows labor market insights and developments on a fine scale, enabling analyses within and irrespective of administrative boundaries. We illustrate the potential of such data by investigating various inner-city labor market characteristics. Based on a series of maps, we describe the spatial distribution of workplaces, residencies, wages, employment types, and skills. All maps are based on the full IEB GEO and visualize the distribution of labor market characteristics in 1 × 1 kilometer grid cells.

For data protection reasons, we censored cells with fewer than 20 residents or, in case of the employment density, with fewer than four establishments. We refer readers to the extensive online supplement, which contains more than 2000 maps for all German cities with over 100,000 inhabitants. These maps show that many German cities differ substantially in their shape from a monocentric city structure. The general shape of Düsseldorf, for instance (pp. 53–55), follows the form of a left-faced arc, whereas the shape of Bremen (pp. 29–31) follows the large river Weser from east to west. However, this study focuses on two of the largest cities in Germany: Berlin and Munich. These cities are interesting subjects because they exhibit diametrically different histories and infrastructure.

4.1 Employment and residential density

Figures 1 and 2 illustrate the employment and residential density in Berlin and Munich. To measure employment density, we count all workers in their workplace grid cell. German firms have to register at least one of their establishments per municipality and industry by law, which makes workplace information highly reliable in general. However, firms that operate several establishments in a municipality within the same industry are only obliged to register one of them. In such cases, it cannot be guaranteed that individuals work in the grid they are registered in. To prevent errors, we follow Dauth and Haller (2020) and exclude the following chain-store industries from the workplace data: construction, financial intermediation, public service, retail trade, temporary agency work and transportation. The exclusion of chain store industries leads to slightly underestimated employment densities.

Fig. 1 Employment density. The figure shows the number of workers in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate a low number of workers, and dark purple cells indicate a high number. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB, even though we exclude chain-store industries from the workplace data. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads

The map for Berlin (Fig. 1, upper panel) indicates a loose employment agglomeration towards the city center in 2017. However, some extensions reach out towards the peripheries, highlighting the importance of alternative agglomeration models like the model of Ahlfeldt et al. (2015). Employment density has grown over the years in Berlin and shifted from a slight tendency to the west towards the city center.

In the bottom panel of Fig. 1, the employment density in Munich shows an increasing agglomeration towards the city center. The few extensions in certain regions around the city might be caused by plants of large firms around the belt of Munich.

To measure the residential density, we counted all individuals in their grid of residence. Due to the origin of the data, the data only include individuals in the German social security system, such as employees, registered unemployed individuals, individuals in labor market programs, and recipients of unemployment benefits. Therefore, the data do not provide information about self-employed individuals, civil servants, students, retirees, pure homemakers or children.

Fig. 2 Residential density. The figure illustrates the number of residents in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate a low number of residents, and dark purple cells indicate a high number. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads

Figure 2 shows the residential density in the two cities. The distribution of residents is scattered over the different districts of Berlin, creating a multicentric cityscape. While still appearing slightly more concentrated in the west, the population density shifted, similar to the employment density, towards the geographical center of Berlin over time.

In Munich, the population density is slightly more concentrated in the southern part of the city. It shows steady growth, exceeding the threshold of 3000 inhabitants in most of the grids in 2017. This high density confirms previous findings, which show that Munich is the city with the highest population density in Germany (Statistisches Bundesamt 2019).

In both of the displayed cities, the employment density shows a radiating pattern that is likely to correlate with the main transportation routes of each city. The residential density seems to be more centered in Munich, whereas Berlin is more multicentric, showing diversity in districts. Additionally, there seems to be an agglomeration trend over time in employment as well as residential density.

4.2 Wages

Figures 3 and 4 show the median daily wages of residents and the Gini coefficients in Berlin and Munich. We use both variables as measures for wage segregation and inequality in neighborhoods. The maps for the median daily wage illustrate between-neighborhood inequality and the Gini coefficient visualizes within-neighborhood inequality. If all wages within a grid cell were equal, the Gini coefficient would be zero. If one inhabitant earns all, the Gini would be equal to 1. The wage information in the register data is highly reliable in general because employers are legally obliged to report wages. However, as typical for social security data, earnings are right-censored at the social security threshold, which affects approximately 10% of the German workforce. We impute top-coded wages using a two-stage procedure similar to Dustmann et al. (2009) and Card et al. (2013) before computing median wages and Gini coefficients.

Fig. 3 Median daily wage. The figure presents the median daily wage in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low levels of the median daily wage, and dark purple cells indicate high levels. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads

Munich and Berlin are only examples of German cities. Other cities show The concentration of high wages in Berlin (Fig.  
3, upper different unusual patterns. For example, the density of residents in Dresden panel) is even more multicentric than the distribution of (maps on pp. 47–49 in the online appendix) is shaped as two diagonal lines employment and residential density. In 2017, multiple across the River Elbe rather than a clear city center concentration, giving geo- graphical conditions a decisive role. Geodata in labor market research: trends, potentials and perspectives Page 9 of 17 5 Fig. 4 Gini coefficient of daily wage. The figure shows the Gini coefficient of daily wages in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Cells with light purple color indicate low Gini, and dark purple cells indicate high Gini. The color scale is fixed for each feature and approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water, green areas forests, light-yellow areas settlements, solid gray lines roads and dashed gray lines railroads high-wage centers spread across the north, southwest, within neighborhoods in 2010 and 2017, with a Gini of southeast and the center of Berlin. The median wage over 0.45. This was not always the case: the prevalent is the highest and most equally spread in 2000 before segregation occurred sometime between 2000 and 2010, declining and agglomerating over time with no clear vis- with a sharp incline in inequality between the former ually detectable pattern. Adding a dynamic perspective West and the former GDR. This pattern and develop - to the cross-sectional findings of vom Berge et al. 
(2014), ment can have several reasons, ranging from political we do see an increasing income segregation within larger (the major social security reform in 2005) or economic neighborhood clusters across the city since 2010. reasons (global finance crisis in 2008) to segregation Munich (Fig.  3, bottom panel) has a persistently high processes and private infrastructure investments. As the level in the median wages. Slightly smaller median wages maps on low-paid workers of vom Berge et al. (2014) do are only temporarily evident for 2010. However, the only not show such a sharp division along the former border small percentage of lower median income grids on the in 2009, the inner-city distribution of low-paid workers periphery in 2017 indicates that the city had recovered does not solely drive this pattern. In fact, the relation of from this situation. low-paid workers to high-paid workers seems to differ The Gini coefficient draws a completely different pic - systematically between the former West and the former ture (Fig.  4). In the maps of Berlin (upper panel), the GDR. A comparison with other German cities of the for- city is clearly divided along the former border of the mer GDR indicates that the low Gini coefficient in East West and the German Democratic Republic (GDR), with Berlin in 2017 might be a feature of East German cities: the western part showing noticeably higher inequality Although, e.g., Chemnitz (p. 36 in the Additional file  1), 5 Page 10 of 17 K. Ostermann et al. Fig. 5 Share of regular employed among all employed. The figure depicts the share of regularly employed workers among all workers in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares of regular employed, and dark purple cells indicate high shares. 
We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads Dresden (p.48 in the Additional file  1), Leipzig (p. 132 in strongly between the two cities. Berlin has little inequal- the Additional file  1) and Magdeburg (p.144 in the Addi- ity within neighborhoods in a large part of the city and tional file  1) show a slightly higher Gini coefficients than high inequality in the southwestern part, dividing the city East Berlin in 2010, the inequality within neighborhoods into two parts. In contrast, Munich has a high inequality is remarkably low in all  of those cities in 2017. As we across large parts of the city. Additionally, median wages are only providing visual and non-systematic evidence, are steadily high in Munich indicating low inequality future research should examine the potential reasons of between neighborhoods. Conversely, wages in Berlin are this specific pattern in East German cities more precisely distributed heterogeneously across the city, again creat- by using appropriate statistical models and a full observa- ing a multicentric picture of segregated neighborhood tion period of 18 years instead of 3-year snapshots. clusters. The comparison of the two cities stresses that Wage inequality in Munich follows the pattern of the inequality within and between neighborhoods can differ median wages, with increasing inequality from 2000 substantially from each other highlighting the impor- to 2010 and a slight recovery as of 2017 (Fig.  4, bottom tance of different measures and levels of segregation. panel). 
However, inequality within neighborhoods is, in contrast to the median wage distribution, higher in cer-4.3 Employment types tain parts of the city belt. This subsection sheds further light on employment and Although the wage inequality for both cities seems non-employment using the residential information to be highest in 2010 indicating a non-linear trend, the of the IEB GEO. Figure  5 depicts the share of regularly inner-city distribution of the wage inequality differs employed individuals who are subject to social insurance Geodata in labor market research: trends, potentials and perspectives Page 11 of 17 5 Fig. 6 Share of non-employed. The figure illustrates the share of unemployed individuals among all residents in 1 × 1 kilometer grid cells in Berlin (upper panel, 759 grids) and Munich (bottom panel, 289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares of unemployed, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads among all employed individuals in Berlin and Munich. exceptions. This image has not changed substantially in Figure  6 displays the share of non-working individuals recent decades other than a marginal decrease in 2010. (henceforth unemployed individuals) among all indi- The distribution of unemployment draws a differ - viduals in our data. We define unemployed individuals as ent picture (Fig.  6). 
Whereas the share of unemployed individuals who are registered unemployed, recipients of was generally high in 2000, it decreased in Berlin over social security benefits, or those who participate in labor the years. It is equally low across entire Berlin in 2017. market measurements and do not have a parallel employ- The same decrease in unemployed individuals applies ment spell. to Munich but at a different starting level. The share of In Berlin (Fig.  5, upper panel), the distribution of reg- unemployed individuals is overall low to nonexistent ularly employed individuals is relatively even in 2017. across the entire city and peripheries. However, the division between East and West Berlin is Employment development in both cities shows decreas- clearly visible, as the eastern area has a higher share of ing unemployment, which is in agreement with the regular employment. The segregation trend is also trace - nationally declining number of unemployed individuals able in the employment status: the equally distributed in Germany, especially since the social assistance (SGB share of regularly employed individuals in 2000 evolves II) reforms in 2005 (Bundesagentur für Arbeit 2020). The into a more segregated inner-city distribution in 2010 share of unemployed individuals in Berlin is higher than and 2017. that in Munich. In both cities, unemployment is almost In Munich (Fig.  5, bottom panel), regularly employed equally distributed, with a few exceptions of high-unem- individuals are equally distributed with only a few ployment grids. Whereas Berlin is more divided into two 5 Page 12 of 17 K. Ostermann et al. areas, the distribution of regular employment relation- “large cities disproportionately attract both high- and low ships in Munich appears to be more equal. skilled workers, while average skills are constant across city size”. 
The share of low-skilled workers is slightly 4.4 Skills higher and almost evenly distributed over the city, with A final series of maps illustrates the distribution of high-, a slightly higher concentration on the northeastern side. medium- and low-skilled residents in Berlin and Munich. The shares of medium- and low-skilled workers decline In the definition of skill levels, we follow the common over the years and are substituted by the increasing share classification in labor economics: low-skilled residents of high-skilled individuals. are individuals without vocational training, medium- What strikes attention is that in both cites, despite skilled residents are individuals who had completed their distinct differences in structure and centers, high- vocational training, and high-skilled residents are indi- and medium-skilled individuals are segregated. The resi - viduals with a degree from a university or university of dence choice of low-skilled individuals follows a different applied science. Figures 7 and 8 present the geographical pattern. We find a similar pattern of residence segrega - distribution of these three groups in Berlin and Munich tion by skill level for, e.g., Cologne (German “Köln”, pp. in 2000, 2010 and 2017. 125–127 in Additional file  1) and Leipzig (pp. 131–133 in Berlin (Fig.  7) shows a diverse distribution of skills at Additional file 1). first sight. A closer look reveals an agglomeration of high- Overall, Munich and Berlin differ from each other in skilled workers around the center and the southwestern various labor market characteristics. Berlin has a rather side of the city in 2017. In contrast, a lower share of high- multicentric structure, which might be driven by his- skilled workers reside in the northwestern part where torical reasons or sheer size. Furthermore, many char- the flight corridor of Berlin-Tegel is located. 
The lower acteristics show a clear East-West division as the former representation of high-skilled individuals in the north- separation of the city seems to still play a decisive role western part of the city indicates a correlation between in the agglomeration of the workforce. Munich, alterna- airport noise and skill-level. Using our new grid data on tively, appears more centered and shows a less diverse labor market characteristics, researchers can estimate the picture of labor market characteristics. Having already causal effect of airport noise on labor market outcomes detected several inner-city patterns in both cities, we in exploiting the unexpected delays similar to the strategy also stress the necessity to explain and understand these of Breidenbach et al. (2021) for rental prices. patterns in using more years and additional data. In this Strengthening this research potential, areas with a aspect, future research should exploit the possibility of high share of high-skilled residents are the exact areas combining these labor market data with other geodata. in which the share of medium-skilled workers is notice- ably low. The share of low-skilled workers does not match 5 Discussion and conclusions this segregated picture but has a segregation of its own: It Geodata are one of the furthest-reaching developments is clearly divided between the former East-West border, for regional and urban economics. Nevertheless, the lit- but with its highest share in the northwestern part of the erature that uses geodata is still comparatively small. This city where the flight corridor of the Berlin-Tegel airport article provides an overview of research areas that profit is located. While the share and trend of agglomeration from and already use geocoded data. 
Geodata enrich of medium- and high-skilled workers increased over the analyses on the regional scale and further provide insight years, the share of low-skilled workers decreased from into spatial relationships on the city or individual scale. 2000 to 2017, with lasting East-West segregation. To foster the usage of geodata, we share our experi- Munich (Fig.  8), in contrast, again shows less diver- ences in generating and preparing employment and labor sity. In 2017, the skill distribution of the entire city has market data at the IAB. The resulting data set IEB GEO an exhaustive share of at least 35% high-skilled workers. contains georeferenced and register-based information This number increased steadily in size and across the on all individuals who were subject to the German social city from 2000 onward, forming the largest skill share security system from 2000 to 2017. These linkable data in 2017. This trend to a higher share of high-skilled provide 350 million consolidated episodes with 19 mil- individuals might be driven by a German-wide trend of lion different geocodes, of which 95% are on the level of increasing shares of high-skilled workers over the years. exact mailing addresses. The small-scale, rich, and highly Alternatively, a city-specific reason might be the high reliable information make the IEB GEO a worldwide rent and cost of living in the city (Kholodilin and Mense unique and high-potential data set. 2012). The share of medium-skilled workers in Munich To illustrate the potential of the IEB GEO, the Addi- is contrarily small, especially in the city center, match- tional file  1 provides maps of all German cities with ing the findings of Eeckhout et  al. (2014,  p. 555) that more than 100,000 inhabitants. Every map displays the Geodata in labor market research: trends, potentials and perspectives Page 13 of 17 5 Fig. 7 Skills in Berlin. 
The figure shows the share of high-skilled (top layer), medium-skilled (middle layer) and low-skilled individuals (bottom layer) among all residents in our data in 1 × 1 kilometer grid cells in Berlin (759 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads 5 Page 14 of 17 K. Ostermann et al. Fig. 8 Skills in Munich. The figure presents the share of high-skilled (top layer), medium-skilled (middle layer) and low-skilled individuals (bottom layer) among all residents in our data in 1 × 1 kilometer grid cells in Munich (289 grids) in 2000, 2010 and 2017. Light purple cells indicate low shares, and dark purple cells indicate high shares. We fixed the color scale for each feature so that it approximately ranges from the first to the ninth decile in all cities with more than 100,000 inhabitants. The data base of the maps is social security data from the IAB. For data protection reasons, we removed cells with fewer than 20 residents. Blue areas in the background represent water; green areas, forests; light yellow areas, settlements; solid gray lines, roads; and dashed gray lines, railroads inner-city distribution of one labor market indicator on the employment and resident density, the distribution a 1 × 1 kilometer grid-cell level (e.g., wages, unemploy- of wages, employment status and skills. Whereas Berlin ment and skills). 
This article exemplarily describes the shows a multicentric pattern in the median daily wages, cities Berlin and Munich in greater detail. We observe the former division of East and West Germany is vis- large differences within and across these two cities in ible in wage inequality as well as in the share of regularly Geodata in labor market research: trends, potentials and perspectives Page 15 of 17 5 employed and low-skilled individuals. In contrast, descriptive evidence for all large cities in Germany. By Munich is more centered and shows a less diverse inner- sharing experiences on the implementation and prepa- city distribution. The descriptive results highlight the ration of geodata as well as examples of visualization, we need for further research using geodata to identify deter- encourage the social sciences community to exploit the minants of inner-city developments. potential of these new data. From a broader perspective, many German cities have not developed monocentrically, as traditional city equi- Supplementary Information librium models assume. Therefore, we emphasize the The online version contains supplementary material available at https:// doi. org/ 10. 1186/ s12651- 022- 00310-x. importance of alternative theoretical models such as that of Ahlfeldt et al. (2015). Our data at hand allows to iden- Additional file 1. Online appendix containing maps for all German cities tify the dynamics of agglomeration effects with higher with more than 100,000 inhabitants for theyears 2000, 2010 and 2017. The temporal frequency. Hence, future research can deter- maps visualize the inner-city distribution of the residential density, the employment density, the median wages, the gini-coefficient, the share of mine spatial equilibrium models with more precision. regular employed and unemployed as well as the share of low-, medium- In addition, our maps highlight the high prevalence of and high-skilled residents. segregation in Germany. 
We often find visible patterns of increasing segregation between larger neighborhood Acknowledgements clusters by median daily wage especially for cities in the The authors thank two anonymous referees, Philipp Breidenbach, Wolfgang Dauth, Malte Reichelt and Sandra Schaffner for many helpful comments and eastern part of Germany like Dresden and Leipzig, or suggestions. Moreover, we thank Sebastian Bähr and Konstantin Körner for in the Ruhr-region like Bochum and Bottrop. However, their help in substantially revising the grid cell data. We thank two anonymous we also find examples of decreasing (e.g., Hamburg and referees and the editors of the Journal for Labour Market Research for helpful comments. We also thank Elisabeth Roß, Haika Otholt, Petra Prietz and Barbara Cologne) or constant (e.g., Bonn or Mainz) segregation Wünsche for excellent legal advice on data privacy. that underlines the necessity of investigating these differ - ent trends over time more comprehensively. Author contributions All the authors have read and approved the final manuscript. The approach used in this study has some limitations. We only reported exemplary and descriptive evidence for Funding three separate years and two cities. Although we hint at We gratefully acknowledge financial support from the Wissenschaftsgemein- schaft Gottfried Wilhelm Leibniz e.V. Competition (K165/2018/Segregation reasons and developments, inference about (causal) rela- and regional mobility). Kerstin Ostermann acknowledges financial support tionships of the visualized distributions and their changes from the graduate program of the IAB and the Friedrich-Alexander University over time is beyond the scope of this study. However, Erlangen-Nürnberg (GradAB). The funding did not influence the design of the study, analysis, and interpretation of data. 
the detected patterns and differences within and across the two cities Berlin and Munich provide high-potential Availability of data and materials starting points for relevant research topics using the full The datasets analysed during the current study are not publicly available as the authors use administrative data of the Institute for Employment Research. panel data of the IEB GEO. The data are social data with administrative origin which are processed and A rather minor data limitation of the IEB GEO is that kept by Institute for Employment Research (IAB) according to Social Code III. it relies on social security data only. Therefore, the IEB There are certain legal restrictions due to the protection of data privacy. The data contain sensitive information and therefore are subject to the confiden- GEO provide no information about self-employed, civil tiality regulations of the German Social Code (Book I, Section 35, Paragraph servants, students, children or pure homemakers. Future 1). The data are held by the IAB (email: iab@iab.de, phone: +49 911 1790) and research can partly solve this issue by spatially merging are on-site available on reasonable request. The code is available and archived at the Research Data Centre of the IAB; see https:// iab. de/ en/ daten/ repli katio the IEB GEO to other geodata, which combination was nen. aspx for further information. The authors are willing to assist (Kerstin previously restricted to the county level for analyses with Ostermann, kerstin.ostermann@iab.de). the IEB. With data such as the IEB GEO, future research should Declarations analyze various topics of social sciences, as the examples Ethics approval and consent to participate in Sects. 2 and 4 have shown. By exploiting the advan- Not applicable. tages of geodata, research can provide more fine-scaled, causal evidence for the impact of regional shocks on Consent for publication Not applicable. neighborhood effects and individual distance thresh - olds. 
Overall, this study shows the potential and perspec- Competing interests tives of the usage of geodata enriched by comprehensive The authors declare that they have no competing interests Author details Institute for Employment Research (IAB), Regensburger Str. 100, 90478 Nürn- For an overview on geodata for Germany, visit the website of the RWI, 2 berg, Germany. Friedrich-Alexander University Erlangen-Nürnberg (FAU), h t t p s : / / w w w . r w i - e s s e n . d e / e n / f o r s c h u n g - b e r a t u n g / w e i t e r e / f o r s c h u n g s d a t e n zentr um- ruhr/ daten angeb ot. 5 Page 16 of 17 K. Ostermann et al. Findelgasse 7-9, 90402 Nürnberg, Germany. German Institute for Economic Dustmann, C., Fitzenberger, B., Zimmermann, M.: Housing expenditures and Research (DIW ), Mohrenstr. 58, 10117 Berlin, Germany. income inequality. ZEW-Centre for European Economic Research Discus- sion Paper 18-048 (2018) Received: 12 October 2021 Accepted: 13 April 2022 Dustmann, C., Ludsteck, J., Schönberg, U.: Revisiting the German wage struc- Published: 3 June 2022 ture. Q. J. Econ. 124(2), 843–881 (2009) Eeckhout, J., Pinheiro, R., Schmidheiny, K.: Spatial sorting. J. Polit Econ 122(3), 554–620 (2014) Feijten, P., Van Ham, M.: Neighbourhood change... reason to leave? Urban Stud. 46(10), 2103–2122 (2009) References Gathmann, C., Helm, I., Schönberg, U.: Spillover effects of mass layoffs. J. Eur. Ager, P., Eriksson, K., Hansen, C.W., Lønstrup, L.: How the 1906 San Francisco Econ. Assoc. 18(1), 427–468 (2020) earthquake shaped economic activity in the American West. Explor. Econ. Goldschmidt, D., Klosterhuber, W., Schmieder, J.F.: Identifying couples in Hist. 77, 101342 (2020) administrative data. J. Labour Market Res. 50(1), 29–43 (2017) Ahlfeldt, G.M., Redding, S.J., Sturm, D.M., Wolf, N.: The economics of density: Goodchild, M.F.: The quality of big (geo) data. Dialogues Hum. Geogr. 3(3), evidence from the Berlin Wall. 

Journal: Journal for Labour Market Research (Springer Journals)

Published: Dec 1, 2022

