The Construction, Maintenance, and Enhancement of Address-Based Sampling Frames

Abstract

Frames of residential mailing addresses based on U.S. Postal Service (USPS) sources are often used for selecting samples of housing units for surveys to be conducted in-person, by mail, by web, or using mixed modes. Address lists designed for mail delivery require some modifications for sampling purposes, which therefore require familiarity with aspects of the address files themselves. This paper takes a detailed look at the address components of these files along with the vendors from which such frames and samples are available. More specifically, this paper describes the types of address records and the components of an address record that are available. It discusses how vendors differ in both the services they provide with respect to sampling frames as well as in how those sampling frames are updated and maintained. The paper also details ways in which address-based frames can be enhanced with auxiliary data such as geocodes, area-level demographic variables, and commercial indicators at the individual address level. The match rate, append rate, accuracy, and utility of auxiliary variables are described, along with potential uses in survey design and estimation. Information contained within the paper will be useful for survey designers who have an interest in tailoring address lists for a given survey’s target population and budget.

1. GENERAL OVERVIEW AND MOTIVATION

The use of address-based (ABS) sample frames has become more common for surveys with in-person, telephone, mail, and web components; examples of a few such studies include the National Health Interview Survey (NHIS), the General Social Survey (GSS), the National Household Education Survey (NHES), and the Residential Energy Consumption Survey. At the same time, the construction, maintenance, and enhancement of ABS frames can have major impacts on the cost, quality, and field operations of surveys.
According to the 2017 American Association for Public Opinion Research (AAPOR) Task Force Report on the Future of Telephone Survey Research, many telephone samples in the future will likely rely only on the cell phone frame. Declining response rates for telephone surveys overall, however, coupled with legislation that affects how cell phone numbers can be dialed, may continue to have adverse effects on costs for Random Digit Dial (RDD) surveys (AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research 2017). In contrast, studies that use multiple modes to contact respondents are likely to persist as one way to maximize coverage and quality within realistic cost constraints. One common application involves using an address-based sample (Link, Battaglia, Frankel, Osborn, and Mokdad 2008) to contact selected households by mail, coupled with using telephone interviewers to persuade those who remain nonrespondents (Alexander and Wetrogan 2000; de Leeuw 2005; AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research 2017). Another example is the so-called “push to web” methodology that received the 2017 Warren J. Mitofsky Innovators Award from AAPOR, which combines address-based samples with online data collection (Dillman, Smyth, and Christian 2014). Besides being a key component of mixed-mode methods, address-based sampling can stand alone, affording researchers the flexibility to target specific subpopulations or geographies by appending geocodes, phone numbers, demographics, and other data to the address frame. Although neither the appending process nor the appended variables are free of error, one can incorporate any assumed error into the design process for more cost-efficient studies. Although not universal, ABS sampling frames in the United States are typically derived from the United States Postal Service’s (USPS) Address Management System (AMS).
The AMS is a large database of over 170 million addresses, protected by law and maintained by the USPS for mail sorting and sequencing. Little is publicly known about the methods used to update and clean the AMS, but essentially the file is maintained and updated based on contributions from mail carriers about addresses on their routes, as well as updates provided to the USPS from local governments, post offices, and some vendors.1 While the USPS uses the AMS to ensure both the quality and completeness of mail services, it was neither created nor intended for use as a sampling frame. For this reason, vendors and survey researchers must transform the AMS for sampling. Some of the approaches vendors take to create a frame are known at a high level, but many steps employ proprietary business rules that are opaque to researchers or end users. Such approaches vary across vendors and can influence survey coverage errors, field processing errors, and survey costs. For example, how a vendor de-duplicates and filters a list for erroneous inclusions can directly influence survey cost, quality, and coverage. A frame with many out-of-scope units may result in fewer completed interviews than expected, which in turn could introduce larger variances and costs per completed interview. The AAPOR Executive Council, operating through the AAPOR Standards Committee, convened a task force to prepare a comprehensive report on address-based sampling (see Harter, Battaglia, Buskirk, Dillman, English, et al. 2016). Task force members included Rachel Harter, Task Force Chair, Michael P. Battaglia, Trent D. Buskirk, Don A. Dillman, Ned English, Mansour Fahimi, Martin R. Frankel, Timothy Kennel, Joseph P. McMichael, Cameron Brook McPhee, Jill Montaquila DeMatteis, Tracie Yancey, and Andrew L. Zukerberg. This paper is drawn from that AAPOR Task Force Report and focuses specifically on illuminating the content and processes used by vendors to generate and maintain ABS frames.
More specifically, in this paper we discuss the development of address-based frames from the “top down.” First, we focus on the types of vendors that provide ABS frames and samples, related in part to how such vendors manage their own databases of addresses. We then discuss the actual contents of the address records both from the USPS as well as any additional fields vendors create and add to these records. We conclude the article with a discussion of the types and sources of auxiliary variables that can be appended to address records within ABS frames. In the final section we also focus attention on evaluating the quality of the information appended to ABS frames and samples. We hope that this article can serve as detailed background for researchers and practitioners of ABS surveys as they become more common.

2. TYPES OF VENDORS OFFERING ABS SAMPLING FRAMES AND SAMPLES

Sometimes called information resellers, data brokers, or direct mailers, vendors sell ABS frames and samples. Vendors may differ in the source of their addresses and in the geographies they cover as well as in the services they provide. For example, some vendors provide only samples while some provide only address lists, and others provide both. Some vendors can append auxiliary data such as geocodes, phone numbers, and person-level data, while others focus on providing only addresses. Some vendors have national address lists, while others focus on specific geographic areas. Another important distinction involves the nature of the relationship between the vendors and the United States Postal Service defined in part by the type of USPS product and license the vendors use to build, confirm and update their database of addresses. The USPS creates several extract files through the Address Information System (AIS) from the AMS database2 using proprietary business rules to remove some units.
The most common extract file is the Delivery Sequence File (DSF) which contains every valid postal address in the United States and is used for standardizing mailing addresses for improved deliverability (O’Muircheartaigh, English, Eckman, Upchurch, Garcia Lopez, et al. 2006; Iannacchione 2011). One of the most notable services based on the DSF is the DSF2 which identifies addresses currently represented in the USPS delivery file as being “active.” Vendors receive lower bulk mailing postage rates for active addresses, so many have a DSF2 license. A related service is the Computerized Delivery Sequencing (CDS). Essentially the CDS includes all of the DSF2 services in addition to frequent transaction files with new addresses and other address changes. Vendors with either the CDS or DSF2 license from the USPS are generally referred to as “primary vendors” with others being “secondary vendors.” While there is some variability on where vendors acquire addresses in their databases, all primary vendors with either a CDS or DSF2 license must be certified by the USPS and undergo a rigorous application process (U.S. Postal Service 2013, 2014). Furthermore all vendors with a CDS or DSF2 license clean, verify, and validate their address lists with information from the same source based on the extracts from the AMS (U.S. Postal Service 2013, 2014). It is also important to note that the CDS and DSF2 licenses deal solely with mailing address information and do not contain telephone numbers, email addresses, or longitude/latitude coordinates.3 Before obtaining a CDS license, vendors must already have an address file that meets minimum thresholds for the percentage of an area’s addresses covered (U.S. Postal Service 2013). To begin the process vendors initially send their address lists to the USPS which in turn sequences the vendor’s files, removes undeliverable addresses, and adds new addresses (U.S. Postal Service 2013). 
To qualify for receiving updates through the CDS program, a vendor must be classified as a qualified mailer, meaning the vendor must already have a list that contains between 90 percent and 110 percent of the USPS addresses in a ZIP Code; if a vendor meets this qualification for a ZIP Code, the vendor is said to “own” the ZIP Code (Dohrmann, Han, and Mohadjer 2007; Iannacchione 2011; U.S. Postal Service 2013). Note that it is possible to have more than 100 percent of the addresses in an area due to duplicates, outdated addresses that no longer exist, addresses that never existed, those erroneously geocoded into the ZIP Code, or addresses with a misclassified address group. This updating process is done separately for each ZIP Code or address group. Vendors with a CDS license do not necessarily own all ZIP Codes in the United States, and thus do not necessarily receive updates from the USPS for all U.S. ZIP Codes (U.S. Postal Service 2013). Therefore, one vendor may qualify to receive CDS updates only for City Carrier Residences in a ZIP Code, while another vendor may qualify for all addresses in the ZIP Code. Some vendors update their address lists with the CDS for ZIP Codes they own and use other address sources for ZIP Codes they do not own. Although the ownership threshold may seem a high standard, McMichael et al. (2012) evaluated one popular ABS vendor and found that roughly 90 percent of the population lived in tracts for which the vendor address database had between 90 percent and 110 percent coverage. While the number of vendors who have a CDS license is rather limited, primary vendors with such licenses receive the most up-to-date and complete coverage of address lists in the ZIP Codes they own.
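The 90 to 110 percent ownership rule described above can be illustrated with a short Python sketch. This is only a simplified illustration under stated assumptions: the function names, the ZIP Code labels, and the address counts are hypothetical, and the actual USPS qualification process is applied by address group and involves steps not shown here.

```python
from typing import Dict, List, Tuple


def coverage_ratio(vendor_count: int, usps_count: int) -> float:
    """Vendor addresses as a share of USPS addresses for one ZIP Code."""
    if usps_count <= 0:
        raise ValueError("usps_count must be positive")
    return vendor_count / usps_count


def owned_zips(counts: Dict[str, Tuple[int, int]],
               low: float = 0.90, high: float = 1.10) -> List[str]:
    """Return ZIP Codes whose coverage ratio falls within [low, high].

    counts maps a ZIP Code to (vendor address count, USPS address count).
    The 0.90 and 1.10 defaults mirror the thresholds cited in the text.
    """
    return sorted(
        zip_code
        for zip_code, (vendor_n, usps_n) in counts.items()
        if low <= coverage_ratio(vendor_n, usps_n) <= high
    )


# Hypothetical counts, not real data.
example_counts = {
    "00001": (9500, 10000),   # ratio 0.95: within range
    "00002": (8200, 10000),   # ratio 0.82: too few addresses
    "00003": (11800, 10000),  # ratio 1.18: too many (e.g., duplicates)
}
print(owned_zips(example_counts))  # prints ['00001']
```

A ratio above 1.0 is not an error in itself; as noted above, duplicates or outdated addresses can push a vendor list past 100 percent of the USPS count.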
Primary vendors with a CDS license can also purchase the “No Statistics” (No-Stat) File, which contains inactive addresses such as planned addresses in new housing developments, vacant addresses on rural routes, addresses on rural routes where mail is forwarded to post office boxes, and addresses in some gated communities where mail is delivered to a central point. Most addresses in the No-Stat file do not receive mail; however, one analysis found that using a portion of the No-Stat File improved rural coverage by 2.2 percent, without adding many erroneous inclusions (Shook-Sa, Currivan, McMichael, and Iannacchione 2013). In comparison to the limited set of vendors with a CDS license, the DSF2 license is available to a broader group of vendors because it relaxes the requirements of ZIP Code “ownership” under the CDS (Iannacchione 2011). More specifically, vendors with the DSF2 license send their addresses to the USPS and receive a file in return indicating if each address appears on the AMS with additional variables about the addresses (U.S. Postal Service 2015b). Unlike the updates provided to CDS licensees, change files with new addresses and other changes are not included with the DSF2 license, but the files they receive are generally complete and derived from the same source as that provided to the CDS vendors (U.S. Postal Service 2014). Although vendors licensing DSF2 do not receive update files from the USPS with new addresses, some may get updates through supplemental files not originating from the USPS. Thus, the coverage of the address file from a vendor with a DSF2 license is not necessarily inferior. In the worst case scenario, the frames from these vendors may suffer from some undercoverage of new addresses when compared to vendors with a CDS license. Moreover, vendors who have DSF2 licenses, but not CDS licenses, are not eligible to purchase the No-Stat file. 
By definition, secondary vendors do not have license agreements with the USPS, although their sources may be the same as those of the primary vendors. While the quality of these addresses may be adequate and complete in some cases, the lack of a license to the central address extract products of the AMS means that addresses provided by secondary vendors have not been corroborated, verified, or updated by the USPS directly. Survey researchers obtaining samples or using frames from secondary vendors are therefore encouraged to inquire about the sourcing and updating of addresses within their frames so as to better understand any implications for undercoverage. In any case, it is important to consider the coverage properties of a given ABS frame for a given study regardless of the vendor involved, as even primary vendors with a CDS license may not own entire targeted geographies. For example, if a study were attempting to contact Americans living in rural areas and the sample was purchased from a CDS vendor that did not own any Rural Route or Contract Delivery Service Route addresses, we would expect undercoverage of the target population. Consequently, if a vendor did not “own” a particular sub-region of the desired target population or the vendor were a secondary vendor, it would be important to inquire about how that vendor obtains and updates addresses for the given sub-region of interest. Also, while CDS licensees obtain the most regular updates from the AMS, the frequency of frame updates from one vendor to another can vary and may impact coverage, especially if the field period falls later in the calendar year than when the ABS sample was obtained.

3. THE BUILDING BLOCK OF ABS FRAMES – THE ADDRESS RECORD

While there is variability in the sourcing of the addresses contained in sampling frames across vendors, the actual format of the basic address record itself should be relatively consistent so as to comply with basic rules of the USPS.
Specifically, to improve the viability of address-based samples for bulk mailing and to improve matching, sorting, and deduplication, vendors must clean their files through parsing and standardization. Parsing is the process of separating one line of an address into standard components. According to USPS standards, the full street name should be parsed into at most four components: the street direction prefix, street name, street type, and street direction suffix. The other fields for an address include the unit or parcel number, the ZIP + 4, and the state. Standardization involves comparing parsed address components to valid values and formatting constraints. For example, the state in an address might be “Michigan,” “MI,” “MI.,” “Mich,” or “Mich.” Standardization replaces all of these values with “MI,” the standard abbreviation for Michigan, and would also correct misspellings and internal variations by aligning with USPS rules. Finally, addresses may be edited as part of the standardization process due to new ZIP Codes, street names, or street numbering schemes.4 If an address-based frame was generated by a primary vendor, then it is likely that a significant portion, if not all, of the addresses within the frame were corroborated and possibly updated using AMS extract products (e.g., DSF2 or CDS). The AMS provides the parsed and standardized version of the address by including four standard components or address fields: the unit or parcel number, the street name, the ZIP + 4, and the state. The AMS can also return additional associated attributes for a given address record that describe aspects of the delivery method and other associated characteristics such as dwelling activity, type of address, and the type of dwelling unit (Iannacchione 2011). The standard and additional fields that are available from the AMS for a given address are displayed in table 1.

Table 1.
Components Available for Each Address in the AMS

Standard components    Additional associated attributes
Unit #/parcel          Seasonal indicator
Street                 Educational indicator
ZIP + 4                Vacant indicator
State                  Delivery mode type indicator
                       Residential indicator
                       Business indicator
                       Drop indicator
                       Drop count
                       Locatable address conversion system (LACS) indicator
                       No-stat(istics) indicator
                       Address throwback indicator

(U.S. Postal Service 2015b)

Specific associated attributes that may be of interest to survey researchers with respect to coverage issues include the seasonal, educational, vacant, drop and throwback indicators, with the drop indicator and corresponding drop count indicators of potential interest for sample weighting.
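To make the parsing and standardization steps described before table 1 more concrete, the following Python sketch splits a street line into the four street components and maps state variants to a standard abbreviation. It is a toy illustration only: the lookup tables are small hypothetical subsets, the function names are our own, and production systems rely on the full USPS addressing standards rather than rules this simple.

```python
import re
from typing import Dict, Optional

# Illustrative subsets only; real standardization uses complete USPS tables.
STATE_ABBREV = {"michigan": "MI", "mich": "MI", "mi": "MI"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"STREET": "ST", "ST": "ST", "AVENUE": "AVE", "AVE": "AVE",
                "ROAD": "RD", "RD": "RD"}


def standardize_state(raw: str) -> Optional[str]:
    """Map variants such as 'Mich.' or 'MI.' to 'MI'; None if unknown."""
    key = raw.strip().rstrip(".").lower()
    return STATE_ABBREV.get(key)


def parse_street(street: str) -> Dict[str, Optional[str]]:
    """Split a street line into prefix, name, type, and suffix."""
    tokens = re.sub(r"[.,]", "", street).upper().split()
    prefix = suffix = street_type = None
    # Peel off components from the outside in, always leaving a name token.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        prefix = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        suffix = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        street_type = STREET_TYPES[tokens.pop()]
    return {"prefix": prefix, "name": " ".join(tokens),
            "type": street_type, "suffix": suffix}


print(standardize_state("Mich."))        # prints MI
print(parse_street("N Main Street SW"))
```

Storing the components separately, rather than as one string, is what makes later matching and deduplication tractable, because two spellings of the same address reduce to the same parsed values.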
We describe each of these key indicators in detail in table 2, providing a description along with specific considerations regarding the use of each indicator for generating ABS samples.

Table 2. Description of Select Associated Attributes Available from the AMS for an Address Record that May Be of Interest to Survey Researchers

AMS indicator: Vacant
AMS field description: A flag that indicates whether or not an address is vacant. According to USPS guidelines, an address must be unoccupied for 90 days to be classified as vacant (Iannacchione 2011). Recent studies have found between 38 percent and 41 percent of units classified as vacant to be occupied (Amaya et al. 2014; Kalton, Kali, and Sigman 2014).
Considerations for ABS sampling frames and samples: A housing unit that is vacant may quickly become occupied. Given the lag time required by the USPS to declare an address vacant and the (under)coverage potential with eliminating these addresses straight-away, caution should be exercised in removing addresses classified as such.

AMS indicator: Seasonal/educational
AMS field description: According to the USPS, the Seasonal Delivery Indicator “specifies whether a given address receives mail only during a specific season (e.g., a summer-only residence)” (U.S. Postal Service 2013, p. 25). This indicator also includes educational delivery points. In 2010, there were about 1 million seasonal delivery addresses and about 200,000 educational delivery points in the AMS. About 38 percent of the seasonal delivery points were found to be occupied in the 2010 Census, and 40 percent of the educational delivery points were occupied (Kennel 2012).
Considerations for ABS sampling frames and samples: Given similar concerns about the timeliness and accuracy of the Seasonal Delivery Indicator, one should exercise caution before removing addresses classified as seasonal delivery.

AMS indicator: Throwbacks
AMS field description: Throwback addresses are city-style addresses for which mail is redirected by the USPS to a P.O. Box (Iannacchione 2011). Such households receive mail intended for either their city-style address or the accompanying P.O. Box at the P.O. Box.
Considerations for ABS sampling frames and samples: Households with both a city-style address and a P.O. Box in the frame, whether or not a throwback, have no linkage noted between the two addresses. Furthermore, because P.O. Boxes are leased by individuals, a housing unit may have a physical address and multiple P.O. Boxes for multiple persons living in the same housing unit. Because the risk of duplication between mailing addresses and P.O. Boxes is high, and the chance of locating a housing unit on the basis of the P.O. Box is low, P.O. Box addresses are often excluded from sampling frames. For personal visit surveys, the city-style address would be the appropriate address to use. For mail surveys, either address could be used.

AMS indicator: Drop points/drop count
AMS field description: Drop points are mail delivery points that serve multiple households or businesses (Dekker, Amaya, LeClere, and English 2012; Shook-Sa et al. 2013; Amaya 2017). In some cases the AMS provides the drop count, defined as the number of drop units contained within a specific drop point. Less than 1 percent of all residential addresses on the CDS file are drop point addresses, but this type of addressing tends to be clustered (Dohrmann et al. 2014).
Considerations for ABS sampling frames and samples: More research is needed into where drop points are located, whether they actually correspond to single or multiple units, and how to handle them. To reduce undercoverage, units with drop point addresses should be given a chance of selection. For in-person surveys, this can be done by interviewing all units at a sample drop point or by listing the units at the drop point and subsampling the list.
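The two options described for drop points in table 2 (interviewing all units, or listing the units and subsampling) have different implications for base weights. The Python sketch below shows one way the arithmetic could work; it assumes equal-probability subsampling of listed units within a drop point, and the function name and example numbers are hypothetical rather than taken from any of the studies cited.

```python
from typing import Optional


def unit_base_weight(address_prob: float,
                     units_listed: int = 1,
                     units_sampled: Optional[int] = None) -> float:
    """Base weight for a unit reached through a sampled address.

    address_prob  : selection probability of the address record.
    units_listed  : units found at the address (1 for a non-drop address).
    units_sampled : units subsampled from the list; None means take all.

    Assumes units are subsampled with equal probability (an assumption
    made for this illustration).
    """
    if units_sampled is None:
        units_sampled = units_listed
    if not 0 < address_prob <= 1:
        raise ValueError("address_prob must be in (0, 1]")
    if not 1 <= units_sampled <= units_listed:
        raise ValueError("need 1 <= units_sampled <= units_listed")
    within_prob = units_sampled / units_listed
    return 1.0 / (address_prob * within_prob)


# Ordinary address selected with probability 1 in 100.
print(unit_base_weight(0.01))
# Drop point with 6 listed units, all interviewed: same weight as above.
print(unit_base_weight(0.01, units_listed=6))
# Drop point with 6 listed units, 2 subsampled: weight is three times larger.
print(unit_base_weight(0.01, units_listed=6, units_sampled=2))
```

When all units at a drop point are interviewed, each unit simply inherits the address weight; subsampling inflates the weight by the inverse of the within-drop-point sampling fraction.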
3.1 Additional Attributes and Record Types to Consider for Address-Based Frames

In many household surveys the ultimate sampling unit is either a person or a household. Although there is usually a one-to-one correspondence between addresses and households, there are numerous exceptions: one housing unit, household, or person can be associated with more than one address, or one address may contain an unspecified number of residences and accommodate multiple households. To account for these scenarios within the sampling frame, vendors can rely on a combination of associated attributes from the AMS (in the case of primary vendors) as well as a collection of other record indicators that do not originate directly from the AMS. In this subsection we discuss some of these common indicators, including P.O. Boxes, mergers, reconfigurations, and multiunit placeholders. We also discuss the implications for survey research of using these indicators to customize the sampling frame for a given study.

In 2010 there were approximately 16 million residential P.O. Box addresses on the AMS. P.O. Box addresses tend to be excluded from ABS frames because they pose a number of operational complications. If both a residential address and any corresponding P.O. Boxes are included in the sampling frame, the same household has duplicate points of contact in the frame, and thus households with P.O. Boxes have an increased chance of inclusion in ABS samples. One could account for the multiplicity of selection for such households if the links were known a priori. However, there is no available link between a residential address and any specific P.O. Boxes that are related to it, whether or not a throwback indicator is present. Because the multiplicities cannot be computed, P.O. Boxes are generally excluded from sampling frames to avoid the unknown increase in the probability of inclusion for addresses with associated P.O. Boxes.

There is one exception to the exclusion, based on a P.O. Box indicator called “only way to get mail” (Faulstich 2011). A P.O. Box is considered “only way to get mail” (OWGM) if the corresponding residential address cannot receive mail because the USPS does not offer curbside delivery there. Since the OWGM status is not part of the AMS, not all vendors can identify OWGM P.O. Boxes. Keeping OWGM P.O. Boxes on the ABS sampling frame increases coverage and does not introduce multiplicity of inclusion, provided that those addresses whose delivery mode indicator from the AMS signals no curbside service are excluded from the frame.5 However, if a study calls for in-person interviewing, then including these addresses and excluding the OWGM P.O. Boxes would provide better coverage and allow field staff to conduct in-person interviews, as P.O. Boxes do not identify dwelling location.

Although rural route addresses are less common than P.O. Boxes, the USPS delivers mail to approximately one million active residential rural route delivery points. These rural delivery points are similar to P.O. Boxes in that there is no explicit link to an actual housing unit. As with P.O. Boxes, matching rural route addresses to phone numbers and e-mail addresses is challenging. Including rural route delivery points will improve coverage for mail mode surveys if the target population includes rural areas. However, including them in phone and in-person surveys may not improve coverage as desired and might exacerbate other errors.

Sometimes address files contain multiunit placeholders or header records for an apartment building. A multiunit placeholder is an address record for a multiunit building without a unit designation. For example, “101 Main Street” may have two apartments: Apartment 1 and Apartment 2.
If the frame has an address record for “101 Main Street” without any unit designation in addition to the two addresses with unit designations, then the “101 Main Street” address is considered a multiunit placeholder. Multiunit placeholders allow mail without unit designations to be sorted and delivered to the correct building, even though clerical sorting is needed to deliver the mail within the building. Also, some multiunit placeholders are used by the main business office of apartment buildings, although they may be shared by resident households as well. If the frame also contains addresses for one or both units, then the multiunit placeholder should be considered an erroneous inclusion. Otherwise, the multiunit placeholder serves as a drop point indicator and should be treated as a drop, even if it is not flagged as such.

In addition to multiunit placeholders, ABS frames may contain multiple copies of the same address for other reasons. One possibility arises in apartment buildings that have multiple undesignated units. Imagine, for example, that an apartment building has four such units. The AMS may contain four copies of the same address without any unit designation. In this case the four records appear to be duplicates of the same unit, but they actually represent four different apartments. In 2010 the Census Bureau detected more than 3.5 million basic street addresses with multiple undesignated units. In fact, over 95 percent of the basic street addresses containing five or more units had at least two undesignated addresses on the AMS extracts delivered to the Census Bureau (Kennel 2012).

Another common reason why duplicate addresses can exist on the sampling frame is mergers, which happen when one household acquires an adjacent unit or when developers consolidate units so that two or more units become a single residence.
For example, “101 Main Street Apartment A” and “101 Main Street Apartment B” could merge into a single unit “101 Main Street Apartment A/B.” If “101 Main Street Apartment A” and “101 Main Street Apartment B” were not deleted and “101 Main Street Apartment A/B” was not added to the address list, the merger could result in the single unit having two addresses on the ABS frame.

Related to mergers are address reconfigurations, which can also create duplicate records on an ABS frame. When street names, street numbers, or within-structure identifiers are changed, an address is said to be reconfigured. For example, “101 Main Street Apartment A” could be renovated and renamed “101 Main Street Apartment Oak.” If a new address were added to the ABS frame but the old address not deleted, one housing unit could receive multiple chances of selection.

The presence of seasonal housing, P.O. Box addresses, mergers, reconfigured units, and multiunit placeholders can inflate costs, variance, and sometimes bias for a given study (Amaya, LeClere, Fiorio, and English 2014). Survey researchers should anticipate situations where one housing unit or person will be associated with multiple addresses and situations where one address will be associated with multiple housing units. When possible, duplicates should be removed from the sampling frame prior to selection without removing valid units. When duplicates cannot be removed prior to interviewing, survey researchers typically inflate the initial sample size to account for the reduction in unique addresses caused by duplication, with residency rules imposed for households with multiple addresses discovered during field work. Moreover, the situation in which multiple households share the same address can also affect the efficiency of ABS field work. A better understanding of mergers, multiunit placeholders, and reconfigurations can inform sampling and weighting procedures and field protocols.
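The distinctions drawn above among unit records, multiunit placeholders, and repeated undesignated units can be expressed as a small screening routine. The sketch below is illustrative only. It assumes addresses have already been standardized and split into a basic street address and a unit designation (an empty string when there is none); the function name and the labels are hypothetical and do not correspond to any AMS field.

```python
from collections import defaultdict


def flag_undesignated(records):
    """Label frame records given (basic_address, unit) pairs.

    Labels:
      "unit"              record carries a unit designation
      "placeholder"       undesignated record at an address that also has
                          unit-designated records (candidate erroneous inclusion)
      "undesignated_unit" one of several identical undesignated records
                          (likely distinct apartments, so keep all of them)
      "single_record"     the only record at its basic address
    """
    by_address = defaultdict(list)
    for idx, (basic, unit) in enumerate(records):
        by_address[basic.strip().upper()].append((idx, unit.strip()))

    labels = {}
    for entries in by_address.values():
        designated = [i for i, unit in entries if unit]
        undesignated = [i for i, unit in entries if not unit]
        for i in designated:
            labels[i] = "unit"
        for i in undesignated:
            if designated:
                labels[i] = "placeholder"
            elif len(undesignated) > 1:
                labels[i] = "undesignated_unit"
            else:
                labels[i] = "single_record"
    return [labels[i] for i in range(len(records))]


frame = [
    ("101 Main Street", "Apt 1"),
    ("101 Main Street", "Apt 2"),
    ("101 Main Street", ""),   # placeholder for the building
    ("7 Oak Avenue", ""),      # two undesignated units at one address
    ("7 Oak Avenue", ""),
    ("55 Elm Road", ""),       # ordinary single record
]
labels = flag_undesignated(frame)
```

A screen of this kind only proposes candidates. Mergers and reconfigurations cannot be detected from the address strings alone, and a building with both designated and several undesignated records would need manual review.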
Field procedures should be developed to ensure that proper selection probabilities for such households can be computed in the absence of a priori information about them on the sampling frame.

4. ENHANCING ADDRESS-BASED FRAMES AND SAMPLES USING AUXILIARY VARIABLES

There is no shortage of research on the advantages of using auxiliary variables for survey sample designs and weighting adjustments. In the era of responsive survey designs, auxiliary variables can also be used to improve the efficiency of data collection. One of the primary advantages of ABS designs is the availability of a rich collection of auxiliary variables that can be included within the ABS frame or added to samples selected from it. Auxiliary variables usually provide insights at two different levels: the unit or address level and the summary or geographic level. Unit-level auxiliary variables provide information about the household associated with a given address, while summary variables provide information about the neighborhood, or some other pertinent level of geography, within which the address belongs. Auxiliary variables typically exclude those variables already available from the AMS or related extract services such as those listed in table 1.

The telephone number associated with the address is a very popular auxiliary variable that is often appended to address-based samples (see, for example, Olson and Buskirk 2015). Vendors such as Marketing Systems Group (MSG), InfoGroup, and others compile telephone numbers from sources including residential listings and market-research databases and offer matching as a service (Kennel and Li 2009; Amaya, Skalland, and Wooten 2010). Another common type of auxiliary information exists in the form of “presence” flags indicating that an address has a given characteristic of interest (e.g., “single adult household”) versus either not having it explicitly or missing that information.
Such characteristics can be, and have been, used to improve coverage of harder-to-reach populations of interest. For example, English, Li, Mayfield, and Frasier (2014) used presence-of-children flags appended to an ABS sample to more efficiently locate households with newborn children. Regardless of the source or type, auxiliary variables appended to ABS frames or samples also provide researchers with information about respondents and nonrespondents alike that may be used as the basis for weighting adjustments aimed at mitigating nonresponse bias. Clearly, the effectiveness of auxiliary variables in differentiating the population in support of stratified designs, in improving coverage of hard-to-reach populations, or in discerning differences between respondents and nonrespondents is directly related to their accuracy, their completeness, and the quality of the linking process that connects these variables to the actual addresses. We explore these issues in more detail in this section.

4.1 Appending Auxiliary Data to Address Records

The addition of auxiliary variables to either the ABS frame itself or to resulting ABS samples is enabled by a linking variable, or common key, associated with both the address itself and the source of the auxiliary information. Addresses can be linked to external files that store information at the person, household, address, housing unit, or geographic levels. Since the AMS does not contain person, household, or local geographic data about each address, the address is the logical and obvious key for linking files. However, ABS vendors may have proprietary person, household, and geographic data associated with their addresses. Furthermore, other measures of geography associated with the address can also be used as the key for appending auxiliary variables.
For example, any one of a myriad of Census-based variables can be appended to addresses by using the corresponding block, block group, or other area FIPS geographic code. No matter what type of key is used to create the link between ABS frame records and those from auxiliary sources, the appending process for adding auxiliary information relies on a combination of geocoding and matching via the common key. We discuss these two steps, in turn, in the subsections that follow.

4.1.1 Geocoding addresses

While the AMS database (which serves as the starting reference point for most ABS frames within the U.S.) and its associated extract services include state, county, and congressional district codes, they do not contain codes for lower levels of geography such as Census tract, block group, and block. To link an address to information about its block group, ABS vendors need to generate the corresponding geographic code via geocoding. More specifically, geocoding is the process of determining the geographic location of an address, often in longitude and latitude coordinates. Most vendors provide block group level geocodes using one of a number of software packages or secondary providers that specialize in geo-location services. Some vendors also assign the full four-digit tabulation block code as well as longitude and latitude coordinates.

There is some variability in the specific methods that can be used to geocode addresses. Any details about how a given vendor performs geocoding should be available from that vendor or from technical documentation available for the specific software package the vendor uses. Overall, the accuracy of geocodes depends on the way addresses are matched, the interpolation algorithms implemented, and the reference datasets used (Goldberg, Wilson, and Knoblock 2007). However, matching and interpolation generally have less impact when the geocoding is used to make assignments at levels of geography that are higher than blocks.
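As an illustration of the interpolation step mentioned above, the following sketch places a house number along a single street segment by linear interpolation over the segment's range of house numbers. It is a simplification under stated assumptions: the segment is straight, house numbers are evenly spaced along it, and the endpoint coordinates are known. Production geocoders additionally handle street side, curved segments, and offsets from the centerline. The function name and all coordinates are hypothetical.

```python
def interpolate_address(house_number, from_number, to_number, start, end):
    """Estimate (lon, lat) for a house number on a straight street segment.

    from_number / to_number: house numbers at the two ends of the segment.
    start / end: (lon, lat) of the segment ends matching those numbers.
    """
    span = to_number - from_number
    if span == 0:
        # Degenerate range: only one house number is valid on this segment.
        if house_number != from_number:
            raise ValueError("house number outside the segment's range")
        fraction = 0.5  # place it at the midpoint of the segment
    else:
        # Works for ascending and descending ranges alike.
        fraction = (house_number - from_number) / span
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("house number outside the segment's range")
    lon = start[0] + fraction * (end[0] - start[0])
    lat = start[1] + fraction * (end[1] - start[1])
    return lon, lat


# Hypothetical segment covering house numbers 101 to 199.
lon, lat = interpolate_address(
    150, 101, 199, start=(-77.0400, 38.9000), end=(-77.0380, 38.9000)
)
```

Because the estimate depends only on the position of the house number within the range, two lots of very different sizes on the same segment are treated as evenly spaced, which is one reason interpolated coordinates are approximate.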
In table 3 we highlight five of the most common methods for geocoding along with some of their advantages and disadvantages. More specific details on these methods and relevant software, as well as implications for survey research, have been documented elsewhere, including Kennel and Li (2009); Eckman and English (2012a, 2012b); Fiorio and Fu (2012); and Dohrmann, Kalton, Montaquila, Good, and Berlin (2012). High-quality geocoding is important because geocodes are often used to link to geographic auxiliary data. If geocodes are incorrectly assigned, the geographic summary variables might mischaracterize a given unit and the neighborhood around it.

Table 3. Description of Different Methods for Geocoding along with Key Advantages and Disadvantages

Method: Direct coding
Description: Field representatives use a GPS device to capture the coordinates of the front door of the housing unit of the address. With the coordinates in hand, some Geographic Information System applications can reference Topologically Integrated Geographic Encoding and Referencing (TIGER) lines (or commercially enhanced versions of TIGER lines) to assign the coordinates to specific blocks.
Disadvantages: Costly to manage; GPS signals are not available everywhere, and handheld devices often have both random and systematic errors when determining coordinates (Bonner, Han, Nie, Rogerson, Vena, et al. 2003; Ward, Nuckols, Giglierano, Bonner, Wolter et al. 2005; Alkire 2010; Fiorio and Fu 2012).

Method: Clerical coding
Description: Geocodes are produced by overlaying either TIGER or TIGER-based lines or block boundary files over aerial photographs or satellite imagery to assign a block code to each housing unit.
Advantages: Less costly than using field staff for GPS enumeration.
Disadvantages: Substantial labor cost of manually assigning blocks. Satellite imagery and manual processes are subject to random and systematic errors as well.

Method: Longitude/latitude assignment via interpolation
Description: Latitude and longitude coordinates can be assigned to an address by matching the address to a TIGER line segment and its corresponding range of house numbers (Cayo and Talbot 2003). Once matched, a specific coordinate can be assigned through interpolation.
Advantages: Generally low labor costs.
Disadvantages: Errors do occur, especially in rural areas or places experiencing rapid change (Eckman and English 2012a, 2012b); software licensing may be costly. Not all addresses, especially some rural route addresses, can be accurately linked to a TIGER line. Longitude and latitude coordinates are approximate because they are interpolated from the range of house numbers on a TIGER line segment.

Method: ZIP Code based
Description: Address is geocoded based on the nine-digit ZIP Code (ZIP and ZIP+4) or some fraction of the ZIP Code.
Advantages: Simplest and least costly method of geocoding. Since all addresses have a ZIP Code, addresses that cannot be geocoded using other methods can usually be geocoded using this method. Various methods exist for interpolating coordinates from ZIP Codes, giving end users flexibility to make best use of available data. For example, all addresses with the same ZIP Code can be assigned the coordinates at the geographic center of the ZIP Code Tabulation Area (ZCTA) (Goldberg et al. 2007).
Disadvantages: ZIP Codes represent a collection of delivery routes but do not define an area with boundaries. The various methods for interpolating coordinates can produce varying levels of quality. For example, the accuracy of block assignment is much better using the ZIP+4 Code than using the ZIP Code alone. Not very accurate if the exact location of the address is of interest. Since ZIP Codes can be quite large, geocoding based on the five-digit ZIP Code is much more accurate for larger geographies, such as counties and tracts, than for blocks.

Method: Parcel data match
Description: Some local governments have publicly available parcel datasets containing the boundaries of each parcel and the coordinates of each parcel’s centroid or the center of the structure on the parcel. Proprietary datasets also exist for many counties. Addresses can be matched to such datasets and coordinates assigned based on coordinates in the parcel data (see Cayo and Talbot 2003).
Advantages: Successful assignment implies a highly precise location. Depending on the quality of the parcel data, the quality of geocodes can be higher than with interpolation methods.
Disadvantages: Software is more costly to license than street-range interpolation and may not have complete coverage. Quality of geocodes is dependent on the coverage and accuracy of the parcel data file.
4.1.2 Matching addresses to auxiliary sources

Simply put, “matching” an address file to a supplementary data source involves linking the two files, usually on a common set of variables such as the address or a geographic identifier. Once an address is geocoded, the resulting coordinates can be used in conjunction with geographical boundary files to link the address with its associated state, county, tract, block group, and block. With this correspondence, thousands of summary variables from the American Community Survey (ACS) or the decennial census describing the specific block, tract, and county can be matched to each address (U.S. Census Bureau 2014). “Unit-level” variables, which are those associated with the household or individuals resident at a specific address, are typically linked using either the geo-coordinates (i.e., latitude and longitude) of the address or the address itself. The formatting of the address is critical when the address is used as the linking variable, especially in cases where there could be multiple units associated with a multiunit placeholder (e.g., does the auxiliary variable refer to the household residing in Apt A or Apt B?).
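Once a block code has been assigned, the keys for the higher levels of geography follow by truncation, because the 15-digit census block identifier nests state (2 digits), county (3), tract (6), and block (4), with the block group given by the first digit of the block code. The sketch below derives these keys and appends one block-group summary variable to each address. The function names, the example identifier, and the income figure are made-up illustrations, not actual ACS values or any vendor's file layout.

```python
def geo_keys(block_geoid):
    """Split a 15-digit census block identifier into nested geographic keys."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError("expected a 15-digit block identifier")
    return {
        "state": block_geoid[:2],
        "county": block_geoid[:5],         # state + county
        "tract": block_geoid[:11],         # state + county + tract
        "block_group": block_geoid[:12],   # tract + first digit of block
        "block": block_geoid,
    }


def append_summary(addresses, summary_by_block_group, variable):
    """Attach a block-group summary variable to each address record."""
    enriched = []
    for record in addresses:
        key = geo_keys(record["block_geoid"])["block_group"]
        value = summary_by_block_group.get(key)  # None when there is no match
        enriched.append({**record, variable: value})
    return enriched


addresses = [
    {"address": "101 MAIN ST APT A", "block_geoid": "110010062021003"},
    {"address": "101 MAIN ST APT B", "block_geoid": "110010062021003"},
]
summary = {"110010062021": 85000}  # hypothetical block-group median income
enriched = append_summary(addresses, summary, "median_hh_income")
```

Both apartments receive the same value because summary variables describe the area rather than the household, in contrast to unit-level variables, which must be matched on the address itself.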
When matching summary-level information, there is typically a one-to-one correspondence between the address and a given level of geography; that is, a given address will fall within a unique census block group, block, or tract. As such, the matched auxiliary information will be unique up to the source of that auxiliary information (i.e., the Census will provide unique summary information for the specific block group to which an address belongs). However, matching an address to unit-level auxiliary variables (from a given source) could result in one-to-many matches, meaning multiple records are returned for a given address. For example, if the auxiliary information relates to education or some other personal characteristic, the match for a single address could result in different levels of education, corresponding to different resident adults. Some vendors will provide the first result of this type of match, while others can or will return multiple results. It is important to understand how the vendor stores and returns the results of a potential one-to-many match, especially if the auxiliary information is being used to boost recruitment of specific subpopulations. For example, if the vendor only includes the first record for education and education levels are ordered, it will be difficult to use this auxiliary information to locate Ph.D. recipients within households having an adult with a Bachelor’s degree and another adult with a Ph.D. The one-to-many match scenario could also be encountered with other types of auxiliary variables, including telephone number, email address, and type of automobile, among others.

Some vendors use a single source for unit-level auxiliary information, but many use multiple sources to maximize the number of addresses with appended auxiliary information. In such cases a single address may be contained within each of the multiple sources available to the vendor.
When this apparent duplication happens the matching process could result in a one-to-many matching scenario for each of the respective sources of auxiliary information. The way vendors handle this multiplicity, how the results are reconciled across multiple auxiliary sources, and whether a vendor releases all the information from these multiple sources for a given address can vary. For example, if three auxiliary sources match the number of adults within a household at a given address with values of three, four and four, respectively, then it would be important to understand how information is retained or reconciled for this particular address. If one were planning to use such information to determine households below the poverty line, and income was only available from the first source, one might want to use three for the number of adults in the household to produce an internally valid estimate of the poverty line indicator. These issues should be discussed in more detail with the vendor in order to adequately assess the utility of the resulting appended auxiliary information. We discuss more specific ideas for evaluating the matching results for appended variables later in Section 5. 4.2 ABS Frame Architecture: Storing Addresses and Auxiliary Information As we have illustrated thus far, vendors can differ in the methods they use to gather, verify and geocode addresses within their ABS sampling frames as well as in the ways they match auxiliary variables to addresses. Vendors can also differ in the architecture of the resulting ABS sampling frame itself. Such architecture essentially governs how the address records and auxiliary information are stored. Some vendors offer ABS frames that are comprehensive and inclusive of both addresses as well as the auxiliary information in a single database, while others use one database to store the addresses themselves and separate databases to store all of the auxiliary information. 
Often there are contractual reasons for not being able to store addresses within the same database or file as the auxiliary information, especially if such data are sourced from multiple providers. While not apparent in its importance for survey researchers, the architecture used to store auxiliary and address records for a given ABS frame can impact the sampling methods and options that would be applicable. In the “separate database architecture” auxiliary variables can be used for stratification, but only after a larger sample of addresses is selected from the frame of addresses. Once this initial phase sample is selected, the separate auxiliary variable databases are queried and information is then linked to the sampled addresses. The newly enhanced sample address file, containing both the address and the appended auxiliary data, can then be used in the traditional sense for stratification or other relevant sampling designs. This approach technically results in a two phase sample of addresses (as described in Kish (1965) or Cochran (1977) for example). In the “single database architecture,” the addresses and accompanying auxiliary information are stored in a single database and the resulting ABS frames can be used directly for stratification. These frames resemble more “textbook” sampling frames one might find described in Lohr (2010), for example. Researchers conducting a stratified random sample based on key auxiliary variables appended to the frame from a vendor who uses the separate database architecture will need to request summary information about the size of the initial random sample as well as the sizes of the various strata in order to correctly compute the base sampling weights. If the vendor uses a single database architecture, then the stratum population sizes should be readily available from the vendor prior to sampling. 
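To make the base-weight computation under the separate database architecture concrete, the sketch below works through one hypothetical case. It assumes a simple random sample of addresses at the first phase and simple random sampling within each stratum at the second phase; under those assumptions the base weight for stratum h is (N / n1) × (n1_h / n2_h), where N is the frame size, n1 the first-phase sample size, n1_h the number of first-phase addresses classified into stratum h, and n2_h the number selected from that stratum. The function name, stratum labels, and counts are illustrative, not taken from any vendor.

```python
def two_phase_base_weights(frame_size, phase1_size, phase1_strata, phase2_strata):
    """Base weights for a two-phase design with stratification at phase 2.

    Assumes simple random sampling at phase 1 and within strata at phase 2.
    phase1_strata: stratum -> number of phase-1 addresses classified into it
    phase2_strata: stratum -> number of addresses selected from it at phase 2
    """
    if sum(phase1_strata.values()) != phase1_size:
        raise ValueError("phase-1 stratum counts must sum to the phase-1 sample size")
    phase1_weight = frame_size / phase1_size  # inverse of the phase-1 selection probability
    weights = {}
    for stratum, n1_h in phase1_strata.items():
        n2_h = phase2_strata.get(stratum, 0)
        if n2_h > n1_h:
            raise ValueError(f"phase-2 sample exceeds phase-1 count in stratum {stratum!r}")
        if n2_h == 0:
            continue  # nothing selected from this stratum, so no weight to assign
        weights[stratum] = phase1_weight * n1_h / n2_h
    return weights

# Hypothetical counts a researcher would request from the vendor.
phase1_counts = {"flag_yes": 4_000, "flag_no": 10_000, "no_append": 6_000}
phase2_counts = {"flag_yes": 2_000, "flag_no": 500, "no_append": 500}
weights = two_phase_base_weights(1_000_000, 20_000, phase1_counts, phase2_counts)

# Sanity check: the weighted phase-2 sample should reproduce the frame size.
weighted_total = sum(weights[h] * phase2_counts[h] for h in weights)
```

Keeping a separate stratum for addresses with no appended value, as done here, avoids silently dropping them from the design. The check that the weights sum to the frame size only holds when every stratum receives some second-phase sample.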
Keeping addresses separate from the auxiliary information may not allow for the most straightforward sampling designs but it does offer the potential advantage of matching auxiliary information to a larger percentage of addresses. The large consumer databases that commonly serve as the sources of rich auxiliary information typically cover 70 to 80 percent of all possible addresses on a typical ABS sampling frame, depending on the variables in question (Buskirk, Malarek, and Bareham 2014). The ability to cross reference multiple consumer databases for a given address increases the likelihood that auxiliary information can be obtained for each address in the ABS sample. 5. ASSESSING THE QUALITY OF AUXILIARY INFORMATION APPENDED TO ABS FRAMES One of the most promising aspects of address-based sampling frames for survey researchers is the ability to append a variety of information either to the frame itself or to selected samples. With the promise auxiliary information offers there are still several issues that may curtail its potential, such as match quality as well as the completeness, coverage, and accuracy of the supplemental sources. We define three specific quality indicators for auxiliary information appended to ABS frames or samples including: the match rate; the append rate or information yield; and the accuracy of the data appended. We define the match rate as the percentage of addresses from an ABS sample or frame that could have data appended when requested. The fact that an address can be matched to a supplementary source does not imply that information will be appended for any given variable. Sometimes addresses within auxiliary source databases have incomplete records leading to missing values being appended for certain variables. 
To distinguish between the ability of an address to be matched to an auxiliary database and whether information is actually appended for an auxiliary variable of interest we define the information yield or append rate for an auxiliary variable as the percentage of addresses for which information was actually appended for the variable among those addresses for which information could have been appended. Finally, the accuracy of the data refers to the agreement between the information that has been appended to the address and the current truth. The accuracy of a binary auxiliary variable that indicates the presence or absence of a certain feature of an address (e.g., single adult household) may be more completely expressed as the true positive (sensitivity) or true negative rates (specificity). Several factors could influence the match and yield rates as well as the accuracy of any appended auxiliary information. First, the type of variable appended (e.g., area-level, household-level, or individual-level characteristic) will affect the expected match rate and accuracy. Certain variables are easier for vendors to compile, maintain, and match than others. Second, the type of address may have an influence on match rate and accuracy. Single-family homes with city-style addresses will have higher exact match rates than addresses in multiunit buildings, drop points, or non–city-style addresses. Also, because single family homes tend to have lower turnover rates than multiunit addresses, the accuracy of the information appended may be better for single family homes. Depending on the linking address used for matching, multiunit households may suffer from consequences of a many-to-one match scenario. There may be many units that share the same basic street address and as a result, information appended for a specific unit at that basic street address may refer to the occupant of another household unit at the same location. 
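The three quality indicators defined above can be computed directly once a sample has been fielded. The sketch below is illustrative only: the record layout is hypothetical, the appended variable is a binary flag, and the survey-reported value is treated as the "current truth," which is itself an assumption since survey reports carry their own errors.

```python
def quality_indicators(records):
    """Match rate, append rate, sensitivity, and specificity for a binary appended flag.

    Each record is a dict with hypothetical keys:
      'matched'  - True if the address was found in the auxiliary source
      'appended' - True/False flag value, or None if nothing was appended
      'truth'    - True/False value observed in the survey (assumed correct)
    A rate is returned as None when its denominator is zero.
    """
    def rate(numerator, denominator):
        return numerator / denominator if denominator else None

    matched = [r for r in records if r["matched"]]
    appended = [r for r in matched if r["appended"] is not None]
    truly_yes = [r for r in appended if r["truth"]]
    truly_no = [r for r in appended if not r["truth"]]

    return {
        "match_rate": rate(len(matched), len(records)),
        "append_rate": rate(len(appended), len(matched)),
        # Share of truly positive addresses that the flag marks as positive.
        "sensitivity": rate(sum(r["appended"] for r in truly_yes), len(truly_yes)),
        # Share of truly negative addresses that the flag marks as negative.
        "specificity": rate(sum(not r["appended"] for r in truly_no), len(truly_no)),
    }

# Ten hypothetical sampled addresses.
sample = [
    {"matched": True, "appended": True, "truth": True},
    {"matched": True, "appended": True, "truth": True},
    {"matched": True, "appended": True, "truth": False},
    {"matched": True, "appended": False, "truth": False},
    {"matched": True, "appended": False, "truth": False},
    {"matched": True, "appended": False, "truth": True},
    {"matched": True, "appended": None, "truth": True},
    {"matched": True, "appended": None, "truth": False},
    {"matched": False, "appended": None, "truth": False},
    {"matched": False, "appended": None, "truth": True},
]
indicators = quality_indicators(sample)
```

Here eight of ten addresses match, six of those eight carry a value for the flag, and sensitivity and specificity are each two out of three among the addresses with an appended value. Computing the same quantities separately by address type or neighborhood would show how they vary with the factors discussed in this section.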
Third, neighborhood-level characteristics can influence the accuracy and match rate of auxiliary variables. As such, areas with populations that tend to be lower income, immigrant, or highly mobile will generally have lower match rates and accuracy compared to higher income or more stable areas. In this section we explore these factors in more detail for both summary and unit-level auxiliary variables that could be appended to ABS frames. 5.1 Quality, Accuracy and Usage of Summary-Level Auxiliary Variables Appended to ABS Frames/Samples from the ACS or the U.S. Census If an address has been geocoded it can usually be matched to Census data. Thus, the match rate to Census data is essentially the same as the geocoding rate. However, there are a number of other issues to consider when matching to Census data. Errors in the geocoding process could place an address in the wrong tract or block (Dohrmann et al. 2007; Kennel and Li 2009) and so while a match to Census data may be possible, any appended information would refer to the wrong area. While statistics produced by the Census and ACS are often considered “gold standards,” ACS estimates at the block or tract level may have large standard errors and both ACS and decennial Census estimates are subject to nonsampling errors. Furthermore, decennial Census statistics reflect counts at one point in time and so their accuracy for a given area may erode if used at later time points within a given decade. Similarly, ACS block and tract estimates represent 5-year estimates, which may not reflect the current characteristics of the area. So while the estimates are accurate in terms of the geographical location they represent, the degree to which the estimates accurately describe the particular area may vary by location. 
Finally, because the Census data are at the area-level, their association with characteristics of the particular household at a given address may be limited (Biemer and Peytchev 2012; Biemer and Peytchev 2013). 5.2 Quality, Accuracy and Usage for Unit-Level Auxiliary Variables Appended to ABS Frames/Samples Most unit-level variables that are appended to ABS samples or frames come from vendor-specific proprietary datasets or one or more commercially available consumer databases. The content, coverage, and quality of these data sources vary considerably both across and within vendors. Credit agencies and direct mailers often amass and compile personal and household information from thousands of sources to better engage or target customers. Commercial files include information from both proprietary and public sources such as real estate transactions, property tax assessments and voter registration files. Data can be extracted from credit card purchases, magazine subscriptions, or even from warranty cards that have check boxes for socioeconomic and demographic information. While most unit-specific variables are observed from one or more of the aforementioned sources, there are specific variables that are often modeled, such as income level. Often the details or contents of such models are not known or available from the vendors, but vendors can detail which of the variables are modeled and which are not. Buskirk et al. (2014) estimated overall match and append rates for a collection of demographic and household auxiliary variables and compared the distributions of these data to national benchmarks. They reported that the match and append rates as well as the consistency of the information appended across multiple vendors varied considerably depending on the variable of interest and geography. They also reported that addresses with non-missing appended demographic information tended to be in block groups with a higher percentage of owner occupied units. 
Finally, city-style addresses had about 4.4 times the odds of yielding information on core demographic variable appends compared to high rises. Pasek, Jang, Cobb, Dennis, and DiSogra (2014) also examined the mismatches between the values of appended data and those collected via a survey. Other recent work has also investigated both the accuracy of appended variables as well as the completeness of unit-level appends including: DiSogra, Dennis, and Fahimi (2010); Roth, Han, and Montaquila (2013); Valliant, Dever, and Kreuter (2013); and Ridenhour, McMichael, Harter, and Dever (2014). McMichael and Roe (2012) and Harter and McMichael (2013) explored the feasibility of matching both cell phone numbers and landline numbers to address frames and found that both the match rate and accuracy rate of matched cell phone numbers were substantially lower than those of landline phone numbers. Generally, most phone to address matching is based on listed landline numbers that are historically more stable compared to cell phone numbers. As more households become cell phone-only their cell phone numbers may begin to appear on consumer databases and the match rates for cell phone numbers to addresses may improve. Amaya et al. (2010) reported that telephone match rates can vary by address type and geography. More specifically, the telephone number append rates for addresses in rural areas and P.O. Boxes can be low because gathering information for these addresses, in general, is more difficult. On the other hand, addresses associated with multiunit buildings generally had very high telephone number match rates, in part because of linking one phone number to multiple units at the same address. Valliant, Hubbard, Lee, and Chang (2014) reported that the accuracy of appended auxiliary binary indicators (or flags) can affect coverage rates when used to identify members from a specific target subpopulation. 
For example, higher false negative rates of a binary auxiliary indicator for a particular target population (e.g., indicator says “no” children in the home but in reality there are children in the home) can result in increases in undercoverage of that particular target population if those addresses with an indicator value of “no” are excluded from the frame or final sample. Higher false positive rates may result in lower eligibility rates after screening and recruitment if only those addresses with the indicator present are fielded. While the presence of errors among auxiliary variables appended to the frame has been noted in recent years, it is important to understand several features of the appending process itself to fully realize their value to sampling and recruitment. Often vendors have variables that refer to the household/address itself as well as information for the “head householder” or multiple adults who dwell at a given address. For auxiliary data derived from commercial sources, information being matched at the unit/address level usually refers to a reference person who resides at that address (i.e., “head householder”). For this reason, it is completely possible that information appended to an address from one proprietary source may not match that from another source for a given address because the appended information refers to different reference persons/householders who reside at the same address (Buskirk et al. 2014; Dohrmann, Buskirk, Hyon, and Montaquila 2014). If within-household selection were used to select an adult randomly from the household there is also no guarantee that the selected adult would correspond to the “reference” person relative to the proprietary source providing such information. In such cases differences in appended information for the reference person and those from a randomly selected adult may certainly occur but not imply issues with the accuracy of the appended information. 
To evaluate the accuracy of the appended auxiliary information in scenarios like these a household rostering approach may be needed to determine if in fact there is at least one such adult in the household with characteristics consistent with those provided by the appended variable. The frequency with which vendors or proprietary sources update their information can also vary and may impact the accuracy of that information as corroborated by survey data. Differences between the appended value and that collected in the field may be likely for time-varying covariates such as number of adults in the household, marital status, education status, and number of children in the home. If auxiliary information is to be used in the sampling or fielding protocols and that information is time sensitive, then it would be helpful to ask the vendor how often the auxiliary information is updated. If such data were used for stratification, appending the most up-to-date information at the time of sample selection will be important for sample design efficiency. Alternatively, if the auxiliary information were used in the field data collection, it might be more helpful to append the information immediately prior to the survey field period rather than at the time of sample selection if data collection starts at a later time. The survey designer should consider the tradeoffs in the context of the survey’s specific goals and requirements and create strata based on the complete disposition of the appended auxiliary variables (e.g., present, absent, and all others) to improve coverage and possibly balance efficiency with accuracy when sampling specific subgroups. For example, English et al. (2014) evaluated appended auxiliary information from three vendors for identifying households with small children in specific underserved neighborhoods in Los Angeles County, CA. The authors found little variation in appended information over two time points as well as little variation among vendors. 
The authors do note, however, that all three were challenged by the hard-to-reach population in that yield rates were uniformly low across the three vendors. Interpreting the missing values as “no children present” implied that in practice the use of this indicator was more helpful for excluding households rather than including them. They also reported that using additional appended variables that were associated with the main eligibility auxiliary variable (e.g., “have kids in home”) increased the true positive rate for identifying households with young children. 6. SUMMARY AND CONCLUSIONS The construction of address-based sampling frames is an endeavor that can require considerable organization, effort and resources. While not ubiquitous, a common theme of ABS frames available from many vendors involves the use of services and extract files available from the USPS AMS. As with any product or service there is noted variability in offerings for ABS frames across vendors. Such variability relates to whether or not the vendors have relied on any products or services from the USPS to corroborate and update their set of addresses. Primary vendors rely on such services and are thus thought to have the most up to date and possibly complete information; the completeness and timeliness of this information, however, can vary across geography and type of address within a given vendor. There is also variability in how vendors clean, maintain and update their address lists as well as formatting addresses that are part of multiunit dwellings, for example. The richness of an ABS frame or sample is enhanced by the ability to append auxiliary information. The key to appending this information is often related to the ability to pinpoint the exact location of an address relative to other geographical areas of interest such as Census block group or tract. 
Geocoding addresses offers vendors a method for matching an address to such information by using the resulting geo-coordinates for the address, in addition to the utility of the geographic location data contained by the geocode. There are various ways in which an address can be geocoded and ABS frame vendors make use of many of these methods. The number and types of sources of auxiliary information that can be linked as a result of the geocoding, as well as how such information is stored relative to the addresses, can also vary across vendors. Some vendors keep addresses and auxiliary information separate while others bundle it into a single database. This storage architecture has implications for how stratified or multistage sampling can be accomplished. We also noted that not all auxiliary information is created equal and described specific metrics for evaluating the quality of such appended information. The quality of an auxiliary variable appended to an ABS sample or frame, as measured by the match rate, append rate and accuracy, can be a function of the type of address as well as its location in the U.S. We observed that accuracy issues with time varying covariates, like education or marital status, may be a function of the frequency of vendor updates, time the sample was fielded relative to these updates and whether within-household selection is used. While many of the approaches vendors use to create and maintain ABS frames are proprietary, we have attempted to provide the general approaches and highlight specifics where possible about the actual mechanisms or methods used to create, curate, maintain, update and enhance ABS frames. We have attempted to describe the components of ABS frame creation that are likely to be consistent across vendors and those where variability should be expected. As with any sampling design and survey study, there are always practical tradeoffs to consider as it relates to cost, coverage and quality. 
We have also indicated where such tradeoffs need to be attended to as it relates to using available auxiliary variables for both sampling designs as well as adaptive survey designs. We sincerely hope this information will enhance and enrich conversations between researchers and vendors alike as we collectively work towards surveys that properly balance costs and effort required. Footnotes 1 The quality and timeliness of these updates can vary considerably by geography. For example, one local government may provide the USPS with a list of addresses for a proposed new housing development several years before construction begins, while others who do not regulate new construction or zoning may never do so. 2 These extracts form the CDS, DSF2, No-Stat Files, Delivery Statistics Files, City-State files, 5-digit ZIP product, ZIP+4® product, Zip4Change product, ZIPMove file, enhanced Line of Travel (eLOT®) product, Congressional District Code Files, and Carrier Route Files, which vendors use to update and verify their address lists. More information on the USPS files can be found in the USPS AIS Products Technical Guide (U.S. Postal Service 2015a) and the CDS User Guide (U.S. Postal Service 2013). 3 Section 412 of Title 39 of the U.S. code prevents the USPS from disclosing specific names and addresses to any person or business, apart from the U.S. Census Bureau. 4 The USPS maintains the Locatable Address Conversion System (LACS), which provides a crosswalk for address conversions. Vendors with a CDS license receive address conversions through the LACS in owned ZIP Codes. 5 The percentage of addresses within a given county that are classified as OWGM can vary considerably, so the additional coverage obtained by including OWGM on the sampling frame will vary. REFERENCES AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research ( 2017 ), “The Future of U.S. 
General Population Telephone Survey Research,” prepared for the AAPOR Council under the auspices of the AAPOR Standards Committee. Alexander C. H. , Wetrogan S. ( 2000 ), “Integrating the American Community Survey and the Intercensal Demographic Estimates Program,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 295–300. Alkire E. ( 2010 ), “Handheld Data Collection and Its Effects on Mapping,” in A Special Joint Symposium of ISPRS Technical Commission IV & AutoCarto in Conjunction with ASPRS/CaGIS 2010 Fall Specialty Conference, Orlando, FL. Amaya A. ( 2017 ), “Drop Points,” White Paper. Available at http://abs.rti.org/atlas/drops/paper. Amaya A. , LeClere F. , Fiorio L. , English N. ( 2014 ), “Improving the Utility of the DSF Address-Based Frame through Ancillary Information,” Field Methods , 26 , 70 – 86 . Google Scholar CrossRef Search ADS Amaya A. , Skalland B. , Wooten K. ( 2010 ), “What’s in a Match?” Survey Practice , 3 6 , pp 1 – 9 . Biemer P. P. , Peytchev A. ( 2012 ), “Census Geocoding for Nonresponse Bias Evaluation in Telephone Surveys: An Assessment of the Error Properties,” Public Opinion Quarterly , 76 , 432 – 452 . Google Scholar CrossRef Search ADS Biemer P. P. , Peytchev A. ( 2013 ), “Using Geocoded Census Data for Nonresponse Bias Correction: An Assessment,” Journal of Survey Statistical Methodology , 1 , 24 – 44 . Google Scholar CrossRef Search ADS Bonner M. R. , Han D. , Nie J. , Rogerson P. , Vena J. E. , Freudenheim J. ( 2003 ), “Positional Accuracy of Geocoded Addresses in Epidemiologic Research,” Epidemiology , 14 , 408 – 412 . Google Scholar PubMed Buskirk T. D. , Malarek D. , Bareham J. S. ( 2014 ), “From Flagging a Sample to Framing It: Exploring Vendor Data That Can Be Appended to ABS Samples,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 111–124. Cayo M. R. , Talbot T. O. 
( 2003 ), “Positional Error in Automated Geocoding of Residential Addresses,” International Journal of Health Geographics , 2 , 10 . Google Scholar CrossRef Search ADS PubMed Cochran W. G. ( 1977 ), Sampling Techniques , New York : John Wiley & Sons . Dekker K. , Amaya A. , LeClere F. , English N. ( 2012 ), “Unpacking the DSF in an Attempt to Better Reach the Drop Point Population,” Proceedings of the Joint Statistical Meeting, Section on Survey Research Methods, pp. 4596–4604. Available at http://ww2.amstat.org/sections/srms/Proceedings/y2012/Files/305686_75228.pdf. de Leeuw E. D. ( 2005 ), “To Mix or Not to Mix Data Collection Modes in Surveys,” Journal of Official Statistics , 21 , 233 – 255 . Dillman D. A. , Smyth J. D. , Christian L. M. ( 2014 ), Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method , New York : Wiley . DiSogra C. , Michael Dennis J. , Fahimi M. ( 2010 ), “On the Quality of Ancillary Data Available for Address-Based Sampling,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 4174–4183. Dohrmann S. , Buskirk T. D. , Hyon A. , Montaquila J. ( 2014 ), “Address-Based Sampling Frames for Beginners,” JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, pp. 1009–1018. Dohrmann S. , Han D. , Mohadjer L. ( 2007 ). “Improving Coverage of Residential Address Lists in Multistage Area Samples,” Proceedings of the Survey Research Methods of the American Statistical Association, pp. 3219–3126. Dohrmann S. , Kalton G. , Montaquila J. , Good C. , Berlin M. ( 2012 ), “Using Address-Based Sampling Frames in Lieu of Traditional Listing: A New Approach,” Joint Statistical Meetings, Survey Research Methods Section, pp. 3729–3741. Eckman S. , English N. ( 2012a ), “Creating Housing Unit Frames from Address Databases: Geocoding Precision and Net Coverage Rates,” Field Methods , 24 , 399 – 408 . Google Scholar CrossRef Search ADS Eckman S. , English N. 
( 2012b ), “Geocoding to Create Survey Frames,” Survey Practice , 5 4 , pp. 1 – 8 . English N. , Li Y. , Mayfield A. , Frasier A. ( 2014 ), “The Use of Targeted Lists to Enhance Sampling Efficiency in Address-Based Sample Designs: Age, Race, and Other Qualities,” in 2014 Proceedings of the American Statistical Association, Survey Research Methods [CD ROM], Alexandria, VA: American Statistical Association. Faulstich P. ( 2011 ), “Technical Aspects of the Construction, Coverage, Limitations and Future of CDS,” paper presented at the 66th Annual AAPOR Conference, Phoenix, AZ. Available at https://www.aapor.org/AAPOR_Main/media/AnnualMeetingProceedings/2011/05-14-11_1C_Faulstich.pdf. (Assessed February 2, 2018). Fiorio L. , Fu J. ( 2012 ), “Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling,” Joint Statistical Meetings, Survey Research Methods Section, pp. 5588–5596. Goldberg D. W. , Wilson J. P. , Knoblock C. A. ( 2007 ), “From Text to Geographic Coordinates: The Current State of Geocoding,” URISA Journal , 19 , 33 – 46 . Harter R. , Battaglia M. P. , Buskirk T. D. , Dillman D. A. , English N. , Fahimi M. , Frankel M. R. , Kennel T. , McMichael J. P. , McPhee C. B. , DeMatteis J. M. , Yancey T. , Zukerberg A. L. ( 2016 ), “Address-Based Sampling,” Prepared for AAPOR Council by the Task Force on Address-based sampling, Operating Under the Auspices of the AAPOR Standards Committee. Oakbrook Terrace, Il. Available at http://www.aapor.org/getattachment/Education-Resources/Reports/AAPOR_Report_1_7_16_CLEAN-COPY-FINAL-(2).pdf.aspx. (Accessed Feburary 2, 2018). p. 140. Harter R. , McMichael J. P. ( 2013 ), “Scope and Coverage of Landline and Cell Phone Numbers Appended to Address Frames,” JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, pp. 3651–3665. Iannacchione V. G. 
( 2011 ), “The Changing Role of Address-Based Sampling in Survey Research,” Public Opinion Quarterly , 75 , 556 – 575 . Google Scholar CrossRef Search ADS Kalton G. , Kali J. , Sigman R. ( 2014 ), “Handling Frame Problems When Address-Based Sampling is Used for In-Person Household Surveys,” Journal of Survey Statistics and Methodology , 2 , 283–222. Kennel T. ( 2012 ), “Evaluation of the Delivery Sequence File as a Survey Frame,” edited by Ruth Ann Killion to Nancy Potok: Internal Census Bureau Memo. Kennel T. L. , Li M. ( 2009 ), “Content and Coverage Quality of a Commercial Address List as a National Sampling Frame for Household Surveys,” in Proceeding of the Joint Statistical Meetings. Kish L. ( 1965 ), Survey Sampling , New York : John Wiley & Sons . Link M. W. , Battaglia M. P. , Frankel M. R. , Osborn L. , Mokdad A. H. ( 2008 ), “A Comparison of Address-Based Sampling (ABS) versus Random-Digit Dialing (RDD) for General Population Surveys,” Public Opinion Quarterly , 72 , 6 – 27 . Google Scholar CrossRef Search ADS Lohr S. L. ( 2010 ), Sampling: Design and Analysis , Boston : Brooks/Cole . McMichael J. P. , Roe D. ( 2012 ), “ABS and Cell Phones: Appending Both Cell Phone and Landline Phone Numbers to an Address-Based Sampling Frame,” in American Association for Public Opinion Research (AAPOR) Annual Conference, Orlando, FL. McMichael J. , Rachel Harter, Bonnie Shook-Sa, Vincent Iannacchione, Jamie Ridenhour, and Kibri Hutchison-Everett. ( 2012 ), “Sub-National Coverage Profile of U.S. Housing Units Using an Address-Based Sampling Frame,” paper presented at the 67th Annual AAPOR Conference, Orlando, FL. Available at http://www.aapor.org/AAPOR_Main/media/AnnualMeetingProceedings/2012/04_McMichael_AAPOR.pdf. (Accessed Feburary 2, 2018). Olson K. , Buskirk T. D. ( 2015 ), “Can I Get Your Phone Number? 
Examining the Relationship between Household, Geographic and Census-Related Variables and Phone Append Propensity for ABS Samples,” in 70th Annual AAPOR Conference, Hollywood, FL. O’Muircheartaigh C. , English N. , Eckman S. , Upchurch H. , Garcia Lopez E. , Lepkowski J. ( 2006 ), “Validating a Sampling Revolution: Benchmarking Address Lists Against Traditional Field Listing,” 2006 Proceedings of the American Statistical Association, AAPOR Survey Research Methods Section [CD ROM], Alexandria, VA: American Statistical Association. Pasek J. , Jang S. M. , Cobb C. L. , Dennis J. M. , DiSogra C. ( 2014 ), “Can Marketing Data Aid Survey Research? Examining Accuracy and Completeness in Consumer-File Data,” Public Opinion Quarterly , 78 , 889 – 916 . Google Scholar CrossRef Search ADS Ridenhour J. L. , McMichael J. P. , Harter R. , Dever J. A. ( 2014 ), “ABS and Demographic Flags: Examining the Implications for Using Auxiliary Frame Information,” in Joint Statistical Meetings, Boston, MA. Roth S. B. , Han D. , Montaquila J. M. ( 2013 ), “The ABS Frame: Quality and Considerations,” Survey Practice , 6 , 3779 – 3793 . Shook-Sa B. E. , Currivan D. B. , McMichael J. P. , Iannacchione V. G. ( 2013 ), “Extending the Coverage of Address-Based Sampling Frames beyond the USPS Computerized Delivery Sequence File,” Public Opinion Quarterly , 77 , 994 – 1005 . Google Scholar CrossRef Search ADS U.S. Census Bureau ( 2014 ), “2013 American Community Survey and Puerto Rico Community Survey 2014 Subject Definitions,” Available at http://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2014_ACSSubjectDefinitions.pdf. (Accessed Feburary 2, 2018). U.S. Postal Service ( 2013 ), “CDS User Guide,” Available at http://ribbs.usps.gov/cds/documents/tech_guides/CDS_USER_GUIDE.PDF. (Accessed Feburary 2, 2018). U.S. 
Postal Service (2014), “2014 Annual Report to Congress,” Available at https://about.usps.com/publications/annual-report-comprehensive-statement-2014/annual-report-comprehensive-statement-v2-2014.pdf. (Accessed February 2, 2018).
U.S. Postal Service (2015a), “Address Information System Products Technical Guide,” Available at https://ribbs.usps.gov/addressing/documents/tech_guides/pubs/AIS.PDF. (Accessed February 2, 2018).
U.S. Postal Service (2015b), “DSF2® License Agreement. Version 19,” Available at https://ribbs.usps.gov/dsf2/documents/tech_guides/DSF2LICA.PDF. (Accessed February 2, 2018).
Valliant R., Dever J. A., Kreuter F. (2013), Practical Tools for Designing and Weighting Survey Samples, New York: Springer.
Valliant R., Hubbard F., Lee S., Chang C. (2014), “Efficient Use of Commercial Lists in U.S. Household Sampling,” Journal of Survey Statistics and Methodology, 2, 182–209.
Ward M. H., Nuckols J. R., Giglierano J., Bonner M. R., Wolter C. F., Airola M., Mix W., Colt J. S., Hartge P. (2005), “Positional Accuracy of Two Methods of Geocoding,” Epidemiology, 16, 542–547.

© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Journal of Survey Statistics and Methodology, Oxford University Press

The Construction, Maintenance, and Enhancement of Address-Based Sampling Frames

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
2325-0984
eISSN
2325-0992
D.O.I.
10.1093/jssam/smy003

Abstract

Frames of residential mailing addresses based on U.S. Postal Service (USPS) sources are often used for selecting samples of housing units for surveys to be conducted in-person, by mail, by web, or using mixed modes. Address lists designed for mail delivery require some modifications for sampling purposes, which in turn require familiarity with aspects of the address files themselves. This paper takes a detailed look at the address components of these files along with the vendors from which such frames and samples are available. More specifically, this paper describes the types of address records and the components of an address record that are available. It discusses how vendors differ both in the services they provide with respect to sampling frames and in how those sampling frames are updated and maintained. The paper also details ways in which address-based frames can be enhanced with auxiliary data such as geocodes, area-level demographic variables, and commercial indicators at the individual address level. The match rate, append rate, accuracy, and utility of auxiliary variables are described, along with potential uses in survey design and estimation. Information contained within the paper will be useful for survey designers who have an interest in tailoring address lists for a given survey’s target population and budget.

1. GENERAL OVERVIEW AND MOTIVATION

The use of address-based sampling (ABS) frames has become more common for surveys with in-person, telephone, mail, and web components; examples of a few such studies include the National Health Interview Survey (NHIS), the General Social Survey (GSS), the National Household Education Survey (NHES), and the Residential Energy Consumption Survey. At the same time, the construction, maintenance, and enhancement of ABS frames can have major impacts on the cost, quality, and field operations of surveys.
According to the 2017 American Association for Public Opinion Research (AAPOR) Task Force Report on the Future of Telephone Survey Research, many telephone samples in the future will likely rely only on the cell phone frame. Declining response rates for telephone surveys overall, however, coupled with legislation that affects how cell phone numbers can be dialed, may continue to have adverse effects on costs for Random Digit Dial (RDD) surveys (AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research 2017). In contrast, studies that use multiple modes to contact respondents are likely to persist as one way to maximize coverage and quality within realistic cost constraints. One common application involves using an address-based sample (Link, Battaglia, Frankel, Osborn, and Mokdad 2008) to contact selected households by mail, coupled with using telephone interviewers to persuade those who remain nonrespondents (Alexander and Wetrogan 2000; de Leeuw 2005; AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research 2017). Another example is the so-called “push to web” methodology that received the 2017 Warren J. Mitofsky Innovators Award from AAPOR, which combines address-based samples with online data collection (Dillman, Smyth, and Christian 2014). Besides being a key component of mixed mode methods, address-based sampling can stand alone, affording researchers the flexibility to target specific subpopulations or geographies by appending geocodes, phone numbers, demographics, and other data to the address frame. Although neither the appending process nor the appended variables are free of error, one can incorporate any assumed error into the design process for more cost-efficient studies.

Although not universal, ABS sampling frames in the United States are typically derived from the United States Postal Service’s (USPS) Address Management System (AMS).
The AMS is a large database of over 170 million addresses, protected by law and maintained by the USPS for mail sorting and sequencing. Little is publicly known about the methods used to update and clean the AMS, but essentially the file is maintained and updated based on contributions from mail carriers about addresses on their routes, as well as updates provided to the USPS from local governments, post offices, and some vendors.¹ While the USPS uses the AMS to ensure both the quality and completeness of mail services, it was neither created nor intended for use as a sampling frame. For this reason, vendors and survey researchers must transform the AMS for sampling. Some of the approaches vendors take to create a frame are known at a high level, but many steps employ proprietary business rules that are opaque to researchers or end users. Such approaches vary across vendors and can influence survey coverage errors, field processing errors, and survey costs. For example, how a vendor de-duplicates and filters a list for erroneous inclusions can directly influence survey cost, quality, and coverage. A frame with many out-of-scope units may result in fewer completed interviews than expected, which in turn could introduce larger variances and costs per completed interview.

The AAPOR Executive Council, operating through the AAPOR Standards Committee, convened a task force to prepare a comprehensive report on address-based sampling (see Harter, Battaglia, Buskirk, Dillman, English, et al. 2016). Task force members included Rachel Harter, Task Force Chair, Michael P. Battaglia, Trent D. Buskirk, Don A. Dillman, Ned English, Mansour Fahimi, Martin R. Frankel, Timothy Kennel, Joseph P. McMichael, Cameron Brook McPhee, Jill Montaquila DeMatteis, Tracie Yancey, and Andrew L. Zukerberg. This paper is drawn from that AAPOR Task Force Report and focuses specifically on illuminating the content and processes used by vendors to generate and maintain ABS frames.
More specifically, in this paper we discuss the development of address-based frames from the “top down.” First, we focus on the types of vendors that provide ABS frames and samples, related in part to how such vendors manage their own databases of addresses. We then discuss the actual contents of the address records, both from the USPS and from any additional fields vendors create and add to these records. We conclude the article with a discussion of the types and sources of auxiliary variables that can be appended to address records within ABS frames. In the final section we also focus attention on evaluating the quality of the information appended to ABS frames and samples. We hope that this article can serve as detailed background for researchers and practitioners of ABS surveys as they become more common.

2. TYPES OF VENDORS OFFERING ABS SAMPLING FRAMES AND SAMPLES

Sometimes called information resellers, data brokers, or direct mailers, vendors sell ABS frames and samples. Vendors may differ in the source of their addresses and in the geographies they cover as well as in the services they provide. For example, some vendors provide only samples while some provide only address lists, and others provide both. Some vendors can append auxiliary data such as geocodes, phone numbers, and person-level data, while others focus on providing only addresses. Some vendors have national address lists, while others focus on specific geographic areas. Another important distinction involves the nature of the relationship between the vendors and the United States Postal Service, defined in part by the type of USPS product and license the vendors use to build, confirm, and update their database of addresses.

The USPS creates several extract files through the Address Information System (AIS) from the AMS database² using proprietary business rules to remove some units.
The most common extract file is the Delivery Sequence File (DSF), which contains every valid postal address in the United States and is used for standardizing mailing addresses for improved deliverability (O’Muircheartaigh, English, Eckman, Upchurch, Garcia Lopez, et al. 2006; Iannacchione 2011). One of the most notable services based on the DSF is the DSF2, which identifies addresses currently represented in the USPS delivery file as being “active.” Vendors receive lower bulk mailing postage rates for active addresses, so many have a DSF2 license. A related service is the Computerized Delivery Sequencing (CDS). Essentially, the CDS includes all of the DSF2 services in addition to frequent transaction files with new addresses and other address changes. Vendors with either the CDS or DSF2 license from the USPS are generally referred to as “primary vendors,” with others being “secondary vendors.” While there is some variability in where vendors acquire the addresses in their databases, all primary vendors with either a CDS or DSF2 license must be certified by the USPS and undergo a rigorous application process (U.S. Postal Service 2013, 2014). Furthermore, all vendors with a CDS or DSF2 license clean, verify, and validate their address lists with information from the same source based on the extracts from the AMS (U.S. Postal Service 2013, 2014). It is also important to note that the CDS and DSF2 licenses deal solely with mailing address information and do not contain telephone numbers, email addresses, or longitude/latitude coordinates.³

Before obtaining a CDS license, vendors must already have an address file that meets minimum thresholds for the percentage of an area’s addresses covered (U.S. Postal Service 2013). To begin the process, vendors initially send their address lists to the USPS, which in turn sequences the vendor’s files, removes undeliverable addresses, and adds new addresses (U.S. Postal Service 2013).
To qualify for receiving updates through the CDS program, a vendor must be classified as a qualified mailer, meaning the vendor must already have a list that contains between 90 percent and 110 percent of the USPS addresses in a ZIP Code; if a vendor meets this qualification for a ZIP Code, the vendor is said to “own” the ZIP Code (Dohrmann, Han, and Mohadjer 2007; Iannacchione 2011; U.S. Postal Service 2013). Note that it is possible to have more than 100 percent of the addresses in an area due to duplicates, outdated addresses that no longer exist, addresses that never existed, those erroneously geocoded into the ZIP Code, or addresses with a misclassified address group. This updating process is done separately for each ZIP Code or address group. Vendors with a CDS license do not necessarily own all ZIP Codes in the United States, and thus do not necessarily receive updates from the USPS for all US ZIP Codes (U.S. Postal Service 2013). Therefore, one vendor may qualify to receive CDS updates only for City Carrier Residences in a ZIP Code, while another vendor may qualify for all addresses in the ZIP Code. Some vendors update their address lists with the CDS for ZIP Codes they own and use other address sources for ZIP Codes they do not own. Although the ownership threshold may seem a high standard, McMichael et al. (2012) evaluated one popular ABS vendor and found that roughly 90 percent of the population lived in tracts for which the vendor address database had between 90 percent and 110 percent coverage. While the number of vendors who have a CDS license is rather limited, primary vendors with such licenses receive the most up-to-date and complete coverage of address lists in the ZIP Codes they own.
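The 90 to 110 percent ownership rule described above can be illustrated with a short sketch. The function name, the inclusive treatment of the two thresholds, and the example counts are assumptions made for illustration only; in practice the comparison is carried out by the USPS, and the counts are not something a researcher would normally hold.

```python
def owns_zip(vendor_count, usps_count, low_pct=90, high_pct=110):
    """Return True if a vendor's address count for a ZIP Code (or address
    group) falls between low_pct and high_pct percent of the USPS count.

    Hypothetical helper: the thresholds come from the text, but treating
    them as inclusive is an assumption. Integer arithmetic avoids
    floating-point surprises at the boundaries.
    """
    if usps_count <= 0:
        return False
    return low_pct * usps_count <= 100 * vendor_count <= high_pct * usps_count


# Made-up counts: (vendor addresses, USPS addresses) per ZIP Code.
example_counts = {
    "ZIP A": (950, 1000),   # 95 percent coverage
    "ZIP B": (1200, 1000),  # 120 percent, e.g., duplicates or outdated records
    "ZIP C": (150, 200),    # 75 percent coverage
}
owned = sorted(z for z, (v, u) in example_counts.items() if owns_zip(v, u))
print(owned)
```

A vendor list that is too large fails the rule just as one that is too small does, which is why duplicates and outdated addresses matter for qualification.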
Primary vendors with a CDS license can also purchase the “No Statistics” (No-Stat) File, which contains inactive addresses such as planned addresses in new housing developments, vacant addresses on rural routes, addresses on rural routes where mail is forwarded to post office boxes, and addresses in some gated communities where mail is delivered to a central point. Most addresses in the No-Stat file do not receive mail; however, one analysis found that using a portion of the No-Stat File improved rural coverage by 2.2 percent, without adding many erroneous inclusions (Shook-Sa, Currivan, McMichael, and Iannacchione 2013).

In comparison to the limited set of vendors with a CDS license, the DSF2 license is available to a broader group of vendors because it relaxes the requirements of ZIP Code “ownership” under the CDS (Iannacchione 2011). More specifically, vendors with the DSF2 license send their addresses to the USPS and receive a file in return indicating whether each address appears on the AMS, along with additional variables about the addresses (U.S. Postal Service 2015b). Unlike the updates provided to CDS licensees, change files with new addresses and other changes are not included with the DSF2 license, but the files they receive are generally complete and derived from the same source as that provided to the CDS vendors (U.S. Postal Service 2014). Although vendors licensing DSF2 do not receive update files from the USPS with new addresses, some may get updates through supplemental files not originating from the USPS. Thus, the coverage of the address file from a vendor with a DSF2 license is not necessarily inferior. In the worst-case scenario, the frames from these vendors may suffer from some undercoverage of new addresses when compared to vendors with a CDS license. Moreover, vendors who have DSF2 licenses, but not CDS licenses, are not eligible to purchase the No-Stat file.
By definition, secondary vendors do not have license agreements with the USPS, although their sources may be the same as those of the primary vendors. While the quality of these addresses may be adequate and complete in some cases, the lack of a license to the central address extract products of the AMS means that addresses provided by secondary vendors have not been corroborated, verified, or updated from the USPS directly. Survey researchers obtaining samples or using frames from secondary vendors are therefore encouraged to inquire about the sourcing and updating of addresses within their frames so as to better understand any implications for undercoverage. It is important to consider the coverage properties of an ABS frame for a given study regardless of the vendor involved, as even primary vendors with a CDS license may not own entire targeted geographies. For example, if a study were attempting to contact Americans living in rural areas and the sample were purchased from a CDS vendor that did not own any Rural Route or Contract Delivery Service Route addresses, we would expect undercoverage of the target population. Consequently, if a vendor did not “own” a particular sub-region of the desired target population or the vendor were a secondary vendor, it would then be important to inquire about how that vendor obtains and updates addresses for the given sub-region of interest. Also, while CDS licensees obtain the most regular updates from the AMS, the frequency of frame updates from one vendor to another can vary and may impact coverage, especially if the field period falls later in the calendar year than when the ABS sample was obtained.

3. THE BUILDING BLOCK OF ABS FRAMES – THE ADDRESS RECORD

While there is variability in the sourcing of the addresses contained in sampling frames across vendors, the actual format of the basic address record itself should be relatively consistent so as to comply with basic rules of the USPS.
Specifically, to improve the viability of address-based samples for bulk mailing and to improve matching, sorting, and deduplication, vendors must clean their files through parsing and standardization. Parsing is the process of separating one line of an address into standard components. According to USPS standards, the full street name should be parsed into at most four components: the street direction prefix, street name, street type, and street direction suffix. The other fields for an address include the unit or parcel number, the ZIP + 4, and the state. Standardization involves comparing parsed address components to valid values and formatting constraints. For example, the state in an address might be “Michigan,” “MI,” “MI.,” “Mich,” or “Mich.” Standardization replaces all of these values with “MI,” the standard abbreviation for Michigan, and would also correct misspellings and internal variations by aligning with USPS rules. Finally, addresses may be edited as part of the standardization process due to new ZIP Codes, street names, or street numbering schemes.⁴

If an address-based frame was generated by a primary vendor, then it is likely that a significant portion, if not all, of the addresses within the frame were corroborated and possibly updated using AMS extract products (e.g., DSF2 or CDS). The AMS provides the parsed and standardized version of the address by including four standard components or address fields: the unit or parcel number, the street name, the ZIP + 4, and the state. The AMS can also return additional associated attributes for a given address record that describe aspects of the delivery method and other associated characteristics such as dwelling activity, type of address, and the type of dwelling unit (Iannacchione 2011). The standard and additional fields that are available from the AMS for a given address are displayed in table 1.

Table 1. Components Available for Each Address in the AMS

Standard components    Additional associated attributes
Unit #/parcel          Seasonal indicator
Street                 Educational indicator
ZIP + 4                Vacant indicator
State                  Delivery mode type indicator
                       Residential indicator
                       Business indicator
                       Drop indicator
                       Drop count
                       Locatable address conversion system (LACS) indicator
                       No-stat(istics) indicator
                       Address throwback indicator

(U.S. Postal Service 2015b)

Specific associated attributes that may be of interest to survey researchers with respect to coverage issues include the seasonal, educational, vacant, drop, and throwback indicators, with the drop indicator and corresponding drop count also of potential interest for sample weighting.
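The parsing and standardization steps described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the USPS or any vendor's procedure: the lookup tables are tiny hypothetical subsets, the names `ParsedStreet`, `standardize_state`, and `parse_street` are invented here, and real address cleaning relies on full USPS reference tables and many more rules.

```python
import re
from dataclasses import dataclass

# Illustrative lookup tables only (assumptions, far from complete).
STATE_ABBREVIATIONS = {"MICHIGAN": "MI", "MICH": "MI", "MI": "MI"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"STREET": "ST", "ST": "ST", "AVENUE": "AVE", "AVE": "AVE",
                "ROAD": "RD", "RD": "RD"}


@dataclass
class ParsedStreet:
    """The four street components named in the text."""
    pre_direction: str
    name: str
    street_type: str
    post_direction: str


def standardize_state(raw):
    """Map state spellings such as 'Mich.' to the standard abbreviation.

    Returns None when the value is not in the (illustrative) table.
    """
    key = raw.strip().rstrip(".").upper()
    return STATE_ABBREVIATIONS.get(key)


def parse_street(raw):
    """Split a street line (without the house number) into components."""
    tokens = re.sub(r"[.,]", "", raw).upper().split()
    # Peel off a leading direction, then a trailing direction, then a
    # trailing street type, always leaving at least one token for the name.
    pre = tokens.pop(0) if len(tokens) > 1 and tokens[0] in DIRECTIONS else ""
    post = tokens.pop() if len(tokens) > 1 and tokens[-1] in DIRECTIONS else ""
    stype = ""
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        stype = STREET_TYPES[tokens.pop()]
    return ParsedStreet(pre, " ".join(tokens), stype, post)


print(parse_street("N Main Street"))
print([standardize_state(s) for s in ["Michigan", "MI", "MI.", "Mich", "Mich."]])
```

Once records are reduced to components like these, exact-match deduplication and sorting become straightforward, which is the practical motivation for standardizing before building a frame.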
We describe each of these key indicators in detail in table 2 providing a description along with specific considerations regarding the use of each indicator for generating ABS samples. Table 2. Description of Select Associated Attributes Available from the AMS for an Address Record that May Be of Interest to Survey Researchers AMS indicator AMS field description Considerations for ABS sampling frames and samples Vacant A flag that indicates whether or not an address is vacant or not. According to USPS guidelines, an address must be unoccupied for 90 days to be classified as vacant (Iannacchione 2011). Recent studies have found between 38 percent and 41 percent of units classified as vacant to be occupied (Amaya et al. 2014; Kalton, Kali, and Sigman 2014). A housing unit that is vacant may quickly become occupied. Given the lag time required by the USPS to declare an address vacant and the (under)coverage potential with eliminating these addresses straight-away, caution should be exercised in removing addresses classified as such. Seasonal/educational According to the USPS, the Seasonal Delivery Indicator “specifies whether a given address receives mail only during a specific season (e.g., a summer-only residence)” (U.S. Postal Service 2013, p. 25). This indicator also includes educational delivery points. In 2010, there were about 1 million seasonal delivery addresses and about 200, 000 educational delivery points in the AMS. About 38 percent of the seasonal delivery points were found to be occupied in the 2010 Census, and 40 percent of the educational delivery points were occupied (Kennel 2012). For similar concerns about the timeliness and accuracy of the Season Delivery indicator, one should exercise caution before removing addresses classified as seasonal delivery. Throwbacks Throwback addresses are city-style addresses for which mail is redirected by the USPS to a P.O. Box (Iannacchione 2011). 
Such households receive mail intended for either their city-style address or the accompanying P.O. Box at the P.O. Box. Households with both a city-style address and a P.O. Box in the frame, whether or not a throwback, have no linkage noted between the two addresses. Furthermore, because P.O. Boxes are leased by individuals, a housing unit may have a physical address and multiple P.O. Boxes for multiple persons living in the same housing unit. Because the risk of duplication between mailing addresses and P.O. Boxes is high, and the chance of locating a housing unit on the basis of the P.O. Box is low, P.O. Box addresses are often excluded from sampling frames. For personal visit surveys, the city-style address would be the appropriate address to use. For mail surveys, either address could be used. Drop points/drop count Drop points are mail delivery points that serve multiple households or businesses (Dekker, Amaya, LeClere, and English 2012; Shook-Sa et al. 2013; Amaya 2017). In some cases the AMS provides the drop count defined as the number of drop units are contained within a specific drop point. Less than 1 percent of all residential addresses on the CDS file are drop point addresses, but this type of addressing tends to be clustered (Dohrmann et al. 2014). More research is needed into where drop points are located, whether they actually correspond to single or multi units, and how to handle them. To reduce undercoverage, units with drop point addresses should be given a chance of selection. For in-person surveys, this can be done by interviewing all units at a sample drop point or by listing the units at the drop point and subsampling the list. AMS indicator AMS field description Considerations for ABS sampling frames and samples Vacant A flag that indicates whether or not an address is vacant or not. According to USPS guidelines, an address must be unoccupied for 90 days to be classified as vacant (Iannacchione 2011). 
Recent studies have found between 38 percent and 41 percent of units classified as vacant to be occupied (Amaya et al. 2014; Kalton, Kali, and Sigman 2014). A housing unit that is vacant may quickly become occupied. Given the lag time required by the USPS to declare an address vacant and the (under)coverage potential with eliminating these addresses straight-away, caution should be exercised in removing addresses classified as such. Seasonal/educational According to the USPS, the Seasonal Delivery Indicator “specifies whether a given address receives mail only during a specific season (e.g., a summer-only residence)” (U.S. Postal Service 2013, p. 25). This indicator also includes educational delivery points. In 2010, there were about 1 million seasonal delivery addresses and about 200, 000 educational delivery points in the AMS. About 38 percent of the seasonal delivery points were found to be occupied in the 2010 Census, and 40 percent of the educational delivery points were occupied (Kennel 2012). For similar concerns about the timeliness and accuracy of the Season Delivery indicator, one should exercise caution before removing addresses classified as seasonal delivery. Throwbacks Throwback addresses are city-style addresses for which mail is redirected by the USPS to a P.O. Box (Iannacchione 2011). Such households receive mail intended for either their city-style address or the accompanying P.O. Box at the P.O. Box. Households with both a city-style address and a P.O. Box in the frame, whether or not a throwback, have no linkage noted between the two addresses. Furthermore, because P.O. Boxes are leased by individuals, a housing unit may have a physical address and multiple P.O. Boxes for multiple persons living in the same housing unit. Because the risk of duplication between mailing addresses and P.O. Boxes is high, and the chance of locating a housing unit on the basis of the P.O. Box is low, P.O. Box addresses are often excluded from sampling frames. 
For personal visit surveys, the city-style address would be the appropriate address to use. For mail surveys, either address could be used. Drop points/drop count Drop points are mail delivery points that serve multiple households or businesses (Dekker, Amaya, LeClere, and English 2012; Shook-Sa et al. 2013; Amaya 2017). In some cases the AMS provides the drop count defined as the number of drop units are contained within a specific drop point. Less than 1 percent of all residential addresses on the CDS file are drop point addresses, but this type of addressing tends to be clustered (Dohrmann et al. 2014). More research is needed into where drop points are located, whether they actually correspond to single or multi units, and how to handle them. To reduce undercoverage, units with drop point addresses should be given a chance of selection. For in-person surveys, this can be done by interviewing all units at a sample drop point or by listing the units at the drop point and subsampling the list. Table 2. Description of Select Associated Attributes Available from the AMS for an Address Record that May Be of Interest to Survey Researchers AMS indicator AMS field description Considerations for ABS sampling frames and samples Vacant A flag that indicates whether or not an address is vacant or not. According to USPS guidelines, an address must be unoccupied for 90 days to be classified as vacant (Iannacchione 2011). Recent studies have found between 38 percent and 41 percent of units classified as vacant to be occupied (Amaya et al. 2014; Kalton, Kali, and Sigman 2014). A housing unit that is vacant may quickly become occupied. Given the lag time required by the USPS to declare an address vacant and the (under)coverage potential with eliminating these addresses straight-away, caution should be exercised in removing addresses classified as such. 
Seasonal/educational According to the USPS, the Seasonal Delivery Indicator “specifies whether a given address receives mail only during a specific season (e.g., a summer-only residence)” (U.S. Postal Service 2013, p. 25). This indicator also includes educational delivery points. In 2010, there were about 1 million seasonal delivery addresses and about 200, 000 educational delivery points in the AMS. About 38 percent of the seasonal delivery points were found to be occupied in the 2010 Census, and 40 percent of the educational delivery points were occupied (Kennel 2012). For similar concerns about the timeliness and accuracy of the Season Delivery indicator, one should exercise caution before removing addresses classified as seasonal delivery. Throwbacks Throwback addresses are city-style addresses for which mail is redirected by the USPS to a P.O. Box (Iannacchione 2011). Such households receive mail intended for either their city-style address or the accompanying P.O. Box at the P.O. Box. Households with both a city-style address and a P.O. Box in the frame, whether or not a throwback, have no linkage noted between the two addresses. Furthermore, because P.O. Boxes are leased by individuals, a housing unit may have a physical address and multiple P.O. Boxes for multiple persons living in the same housing unit. Because the risk of duplication between mailing addresses and P.O. Boxes is high, and the chance of locating a housing unit on the basis of the P.O. Box is low, P.O. Box addresses are often excluded from sampling frames. For personal visit surveys, the city-style address would be the appropriate address to use. For mail surveys, either address could be used. Drop points/drop count Drop points are mail delivery points that serve multiple households or businesses (Dekker, Amaya, LeClere, and English 2012; Shook-Sa et al. 2013; Amaya 2017). 
In some cases the AMS provides the drop count defined as the number of drop units are contained within a specific drop point. Less than 1 percent of all residential addresses on the CDS file are drop point addresses, but this type of addressing tends to be clustered (Dohrmann et al. 2014). More research is needed into where drop points are located, whether they actually correspond to single or multi units, and how to handle them. To reduce undercoverage, units with drop point addresses should be given a chance of selection. For in-person surveys, this can be done by interviewing all units at a sample drop point or by listing the units at the drop point and subsampling the list. AMS indicator AMS field description Considerations for ABS sampling frames and samples Vacant A flag that indicates whether or not an address is vacant or not. According to USPS guidelines, an address must be unoccupied for 90 days to be classified as vacant (Iannacchione 2011). Recent studies have found between 38 percent and 41 percent of units classified as vacant to be occupied (Amaya et al. 2014; Kalton, Kali, and Sigman 2014). A housing unit that is vacant may quickly become occupied. Given the lag time required by the USPS to declare an address vacant and the (under)coverage potential with eliminating these addresses straight-away, caution should be exercised in removing addresses classified as such. Seasonal/educational According to the USPS, the Seasonal Delivery Indicator “specifies whether a given address receives mail only during a specific season (e.g., a summer-only residence)” (U.S. Postal Service 2013, p. 25). This indicator also includes educational delivery points. In 2010, there were about 1 million seasonal delivery addresses and about 200, 000 educational delivery points in the AMS. About 38 percent of the seasonal delivery points were found to be occupied in the 2010 Census, and 40 percent of the educational delivery points were occupied (Kennel 2012). 
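The listing-and-subsampling option for drop points can be illustrated with a short calculation. The sketch below is only a minimal example under stated assumptions: the drop point address is selected with a known probability, field staff list the units actually found there, and a simple random subsample of the listed units is drawn. The function name and all numeric inputs are hypothetical and do not come from any of the studies cited above.

```python
def drop_unit_selection(p_address, units_listed, units_subsampled):
    """Return (overall selection probability, base weight) for one unit
    at a sampled drop point.

    Assumes a two-stage selection: the drop point address is selected with
    probability p_address, then units_subsampled of the units_listed units
    found in the field are drawn by simple random sampling.
    """
    if not 0 < p_address <= 1:
        raise ValueError("p_address must be in (0, 1]")
    if not 0 < units_subsampled <= units_listed:
        raise ValueError("need 0 < units_subsampled <= units_listed")
    p_within = units_subsampled / units_listed
    p_overall = p_address * p_within
    return p_overall, 1.0 / p_overall


# Hypothetical inputs: the drop point address was sampled at a rate of 1 in 100.
# Interviewing every listed unit keeps the address-level probability.
take_all = drop_unit_selection(0.01, units_listed=4, units_subsampled=4)
# Subsampling 2 of 6 listed units lowers the probability and raises the weight.
subsample = drop_unit_selection(0.01, units_listed=6, units_subsampled=2)
```

The number of units listed in the field is used here rather than the AMS drop count, since the two need not agree.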
3.1 Additional Attributes and Record Types to Consider for Address-Based Frames

In many household surveys the ultimate sampling unit is either a person or a household. Although there is usually a one-to-one correspondence between addresses and households, there are numerous exceptions: one housing unit, household, or person can be associated with more than one address, and one address may contain an unspecified number of residences and accommodate multiple households. To account for these scenarios within the sampling frame, vendors can rely on a combination of associated attributes from the AMS (in the case of primary vendors) as well as a collection of other record indicators that do not originate directly from the AMS. In the following subsection we discuss some of these common indicators, including P.O. Boxes, mergers, reconfigurations, and multiunit placeholders. We also discuss the implications for survey research of using these indicators to customize the sampling frame for a given study.

In 2010 there were approximately 16 million residential P.O. Box addresses on the AMS. P.O. Box addresses tend to be excluded from ABS frames because they pose a number of operational complications. Namely, if both a residential address and any corresponding P.O. Boxes are included in the sampling frame, duplicate points of contact exist for the same household, and thus households with P.O. Boxes have an increased chance of inclusion in ABS samples. One could account for the multiplicity of selection for such households if it were known a priori. However, there is no available link between a residential address and any specific P.O. Boxes that are related to it, whether or not a throwback indicator is present. Since any multiplicities cannot be computed, P.O.
Boxes are generally excluded from sampling frames to avoid the unknown increase in the probability of inclusion for addresses with associated P.O. Boxes.

There is one exception to this exclusion, based on a P.O. Box indicator called “only way to get mail” (Faulstich 2011). A P.O. Box is considered “only way to get mail” (OWGM) if the corresponding residential address cannot receive mail because curbside delivery is not available from the USPS. Since the OWGM status is not part of the AMS, not all vendors can identify OWGM P.O. Boxes. Keeping OWGM P.O. Boxes on the ABS sampling frame increases coverage and does not introduce multiplicity of inclusion, provided that those addresses whose delivery mode indicator from the AMS signals no curbside service are excluded from the frame.5 However, if a study calls for in-person interviewing, then including these addresses and excluding the OWGM P.O. Boxes would provide better coverage and allow field staff to conduct in-person interviews, as P.O. Boxes do not identify dwelling location.

Although rural route addresses are less common than P.O. Boxes, the USPS delivers mail to approximately one million active residential rural route delivery points. These rural delivery points are similar to P.O. Boxes in that there is no explicit link to an actual housing unit. As with P.O. Boxes, matching rural route addresses to phone numbers and e-mail addresses is challenging. Including rural route delivery points will improve coverage for mail mode surveys if the target population includes rural areas. However, including them in phone and in-person surveys may not improve coverage as desired and might exacerbate other errors.

Sometimes address files contain multiunit placeholders or header records for an apartment building. A multiunit placeholder is an address record for a multiunit building without a unit designation. For example, “101 Main Street” may have two apartments: Apartment 1 and Apartment 2.
If the frame has an address record for “101 Main Street” without any unit designation in addition to the two addresses with unit designations, then the “101 Main Street” address is considered a multiunit placeholder. Multiunit placeholders allow mail without unit designations to be sorted and delivered to the correct building, even though clerical sorting is needed to deliver the mail within the building. Also, some multiunit placeholders are used by the main business office of apartment buildings, although they may be shared by resident households as well. If the frame also contains addresses for one or both units, then the multiunit placeholder should be considered an erroneous inclusion. Otherwise, the multiunit placeholder serves as a drop point indicator and should be treated as a drop point, even if it is not flagged as such.

In addition to multiunit placeholders, ABS frames may contain multiple copies of the same address for other reasons. One possibility comes from apartment buildings that have multiple undesignated units. Imagine, for example, that an apartment building has four such units. The AMS may contain four copies of the same address without any unit designation. In this case the four records appear to be duplicates of the same unit, but they actually represent four different apartments. In 2010 the Census Bureau detected more than 3.5 million basic street addresses with multiple undesignated units. In fact, over 95 percent of the basic street addresses containing five or more units had at least two undesignated addresses on the AMS extracts delivered to the Census Bureau (Kennel 2012).

Another common source of duplicate addresses on the sampling frame is mergers, which occur when one household acquires an adjacent unit or when developers consolidate units so that two or more units become a single residence.
For example, “101 Main Street Apartment A” and “101 Main Street Apartment B” could merge into a single unit “101 Main Street Apartment A/B.” If “101 Main Street Apartment A” and “101 Main Street Apartment B” were not deleted and “101 Main Street Apartment A/B” was not added to the address list, the merger could result in the single unit having two addresses on the ABS frame.

Related to mergers are address reconfigurations, which can also create duplicate records on an ABS frame. When street names, street numbers, or within-structure identifiers are changed, an address is said to be reconfigured. For example, “101 Main Street Apartment A” could be renovated and renamed “101 Main Street Apartment Oak.” If a new address were added to the ABS frame but the old address not deleted, one housing unit could receive multiple chances of selection.

The presence of seasonal housing, P.O. Box addresses, mergers, reconfigured units, and multiunit placeholders can inflate costs, variance, and sometimes bias for a given study (Amaya, LeClere, Fiorio, and English 2014). Survey researchers should anticipate situations where one housing unit or person will be associated with multiple addresses and situations where one address will be associated with multiple housing units. When possible, duplicates should be removed from the sampling frame prior to selection without removing valid units. When duplicates cannot be removed prior to interviewing, survey researchers typically inflate the initial sample size to account for the reduction in unique addresses caused by duplication, with residency rules imposed for households with multiple addresses discovered during field work. Moreover, the situation in which multiple households share the same address can also impact the efficiency of the ABS sample field work. A better understanding of mergers, multiunit placeholders, and reconfigurations can inform sampling and weighting procedures and field protocols.
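Two of the adjustments mentioned above, inflating the initial sample size and correcting for multiple chances of selection, can be sketched numerically. This is an illustrative sketch rather than a procedure taken from the cited studies. It assumes that every frame address linked to a housing unit carries the same base weight, that the number of linked addresses is learned during field work, and that the expected duplicate rate is a planning guess. Both function names are hypothetical.

```python
import math


def multiplicity_adjusted_weight(base_weight, n_frame_addresses):
    """Divide the base weight by the number of frame addresses that lead to
    the same housing unit, as reported during field work."""
    if n_frame_addresses < 1:
        raise ValueError("a sampled unit has at least one frame address")
    return base_weight / n_frame_addresses


def inflated_sample_size(target_unique, expected_duplicate_rate):
    """Initial sample size needed to yield target_unique distinct units when
    a share of sampled addresses is expected to be duplicates."""
    if not 0 <= expected_duplicate_rate < 1:
        raise ValueError("expected_duplicate_rate must be in [0, 1)")
    return math.ceil(target_unique / (1 - expected_duplicate_rate))


# A merged unit still listed under two addresses gets half its base weight.
merged_unit_weight = multiplicity_adjusted_weight(250.0, 2)
# Planning for 1,000 unique units with an assumed 2 percent duplicate rate.
initial_n = inflated_sample_size(1000, 0.02)
```

Dividing by the number of linked addresses treats each address as an equally likely route to the unit; designs with unequal address probabilities would need the individual probabilities instead.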
Field procedures should be developed to ensure that proper selection probabilities for such households can be computed in the absence of a priori information about them on the sampling frame.

4. ENHANCING ADDRESS-BASED FRAMES AND SAMPLES USING AUXILIARY VARIABLES

There is no shortage of research on the advantages of using auxiliary variables for survey sample designs and weighting adjustments. In the era of responsive survey designs, auxiliary variables can provide information that improves the efficiency of data collection. One of the primary advantages of ABS designs is the availability of a rich collection of auxiliary variables that can be included within the ABS frame or added to samples selected from it. Auxiliary variables usually provide insights at two different levels: the unit or address level and the summary or geographic level. Unit-level auxiliary variables provide information about the household associated with a given address, while summary variables provide information about the neighborhood, or some other pertinent level of geography, within which the address belongs. Auxiliary variables typically exclude those variables already available from the AMS or related extract services, such as those listed in table 1.

The telephone number associated with the address is a very popular auxiliary variable that is often appended to address-based samples (see, for example, Olson and Buskirk 2015). Vendors such as Marketing Systems Group (MSG), InfoGroup, and others compile telephone numbers from sources including residential listings and market-research databases and offer matching as a service (Kennel and Li 2009; Amaya, Skalland, and Wooten 2010). Another common type of auxiliary information exists in the form of “presence” flags indicating that an address has a given characteristic of interest (e.g., “single adult household”) versus either not having it explicitly or missing that information.
Such characteristics can be, and have been, used to improve coverage of harder-to-reach populations of interest. For example, English, Li, Mayfield, and Frasier (2014) used presence-of-children flags appended to an ABS sample to more efficiently locate households with newborn children. Regardless of the source or type, auxiliary variables appended to ABS frames or samples also afford researchers information about respondents and nonrespondents alike that may be used as the basis for weighting adjustments aimed at mitigating the potential for nonresponse bias. Clearly, the effectiveness of auxiliary variables in differentiating the population in support of stratified designs, in improving coverage of hard-to-reach populations, or in discerning differences between respondents and nonrespondents depends directly on their accuracy, their completeness, and the quality of the linking process that connects them to the actual addresses. We explore these issues in more detail in this section.

4.1 Appending Auxiliary Data to Address Records

The addition of auxiliary variables to either the ABS frame itself or to resulting ABS samples is enabled by a linking variable, or common key, associated with both the address itself and the source of the auxiliary information. Addresses can be linked to external files that store information at the person, household, address, housing unit, or geographic levels. Since the AMS does not contain person, household, or local geographic data about each address, the address is the logical and obvious key for linking files. However, ABS vendors may have proprietary person, household, and geographic data associated with their addresses. Furthermore, other measures of geography associated with the address can also be used as the key for appending auxiliary variables.
For example, any one of a myriad of Census-based variables can be appended to addresses by using the corresponding block, block group, or other area FIPS geographic code. No matter what type of key is used to create the link between ABS frame records and those from auxiliary sources, the appending process for adding auxiliary information relies on a combination of geocoding and matching via the common key. We discuss these two steps, in turn, in the subsections that follow.

4.1.1 Geocoding addresses

While the AMS database (which serves as the starting reference point for most ABS frames within the U.S.) and its associated extract services include state, county, and congressional district codes, they do not contain codes for lower levels of geography such as Census tract, block group, and block. To link an address to information about its block group, ABS vendors need to generate the geographically specific code via geocoding. More specifically, geocoding is the process of determining the geographic location for an address, often in longitude and latitude coordinates. Most vendors provide block group level geocodes using one of a number of software packages or secondary providers that specialize in geo-location services. Some vendors also assign the full four-digit tabulation block code as well as longitude and latitude coordinates.

There is some variability in the specific methods that can be used to geocode addresses. Details about how a given vendor performs geocoding should be available from that vendor or from technical documentation for the specific software package the vendor uses. Overall, the accuracy of geocodes depends on the way addresses are matched, the interpolation algorithms implemented, and the reference datasets used (Goldberg, Wilson, and Knoblock 2007). However, matching and interpolation generally have less impact when the geocoding is used to make assignments at levels of geography that are higher than blocks.
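To make the role of interpolation concrete, the sketch below shows a bare-bones version of address-range interpolation: a house number is placed proportionally between the two endpoints of a street segment whose address range is known. It is a simplified illustration only. The function name and coordinates are hypothetical, the segment is treated as a straight line, and side of street and setback from the centerline are ignored, all of which production geocoders handle in their own ways.

```python
def interpolate_along_segment(house_number, low_number, high_number, start, end):
    """Place a house number proportionally between a street segment's endpoints.

    start and end are (longitude, latitude) pairs for the ends of the segment
    whose address range runs from low_number to high_number.
    """
    if not low_number <= house_number <= high_number:
        raise ValueError("house number falls outside the segment's address range")
    if high_number == low_number:
        fraction = 0.5  # single-number range: fall back to the segment midpoint
    else:
        fraction = (house_number - low_number) / (high_number - low_number)
    lon = start[0] + fraction * (end[0] - start[0])
    lat = start[1] + fraction * (end[1] - start[1])
    return lon, lat


# Hypothetical segment covering house numbers 100 to 200.
point = interpolate_along_segment(150, 100, 200, (-77.00, 38.90), (-77.02, 38.90))
```

Because the result depends only on the address range, two houses that are unevenly spaced along the street are still placed as if spacing were uniform, which is one reason interpolated coordinates are approximate.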
In table 3 we highlight five of the most common methods for geocoding along with some of their advantages and disadvantages. More specific details on these methods and relevant software, as well as implications for survey research, have been documented elsewhere, including Kennel and Li (2009); Eckman and English (2012a, 2012b); Fiorio and Fu (2012); and Dohrmann, Kalton, Montaquila, Good, and Berlin (2012). High-quality geocoding is important because geocodes are often used to link to geographic auxiliary data. If geocodes are incorrectly assigned, the geographic summary variables might mischaracterize a given unit and the neighborhood around it.

Table 3. Description of Different Methods for Geocoding along with Key Advantages and Disadvantages

Method: Direct coding
Description: Field representatives use a GPS device to capture the coordinates of the front door of the housing unit of the address. With the coordinates in hand, some Geographic Information System applications can reference Topologically Integrated Geographic Encoding and Referencing (TIGER) lines (or commercially enhanced versions of TIGER lines) to assign the coordinates to specific blocks.
Disadvantages: Costly to manage; GPS signals are not available everywhere, and handheld devices often have both random and systematic errors when determining coordinates (Bonner, Han, Nie, Rogerson, Vena, et al. 2003; Ward, Nuckols, Giglierano, Bonner, Wolter et al. 2005; Alkire 2010; Fiorio and Fu 2012).

Method: Clerical coding
Description: Geocodes are produced by overlaying either TIGER or TIGER-based lines or block boundary files over aerial photographs or satellite imagery to assign a block code to each housing unit.
Advantages: Less costly than using field staff for GPS enumeration.
Disadvantages: Substantial labor cost of manually assigning blocks. Satellite imagery and manual processes are subject to random and systematic errors as well.
Method: Longitude/latitude assignment via interpolation
Description: Latitude and longitude coordinates can be assigned to an address by matching the address to a TIGER line segment and its corresponding range of house numbers (Cayo and Talbot 2003). Once matched, a specific coordinate can be assigned through interpolation.
Advantages: Generally low labor costs.
Disadvantages: Errors do occur, especially in rural areas or places experiencing rapid change (Eckman and English 2012a, 2012b); software licensing may be costly. Not all addresses, especially some rural route addresses, can be accurately linked to a TIGER line. Longitude and latitude coordinates are approximate because they are interpolated from the range of house numbers on a TIGER line segment.

Method: ZIP Code based
Description: The address is geocoded based on the nine-digit ZIP Code (ZIP and ZIP+4) or some portion of it.
Advantages: Simplest and least costly method of geocoding. Since all addresses have a ZIP Code, addresses that cannot be geocoded using other methods can usually be geocoded using this method. Various methods exist for interpolating coordinates from ZIP Codes, giving end users flexibility to make best use of available data. For example, all addresses with the same ZIP Code can be assigned the coordinates at the geographic center of the ZIP Code Tabulation Area (ZCTA) (Goldberg et al. 2007).
Disadvantages: ZIP Codes represent a collection of delivery routes but do not define an area with boundaries. The various methods for interpolating coordinates can produce varying levels of quality. For example, block assignment is much more accurate using the ZIP+4 Code than using the five-digit ZIP Code alone. Not very accurate if the exact location of the address is of interest. Since ZIP Codes can cover large areas, geocoding based on the five-digit ZIP Code is much more accurate for larger geographies, such as counties and tracts, than for blocks.
Method: Parcel data match
Description: Some local governments have publicly available parcel datasets containing the boundaries of each parcel and the coordinates of each parcel’s centroid or of the center of the structure on the parcel. Proprietary datasets also exist for many counties. Addresses can be matched to such datasets and assigned the coordinates recorded in the parcel data (see Cayo and Talbot 2003).
Advantages: Successful assignment implies a highly precise location. Depending on the quality of the parcel data, the quality of geocodes can be higher than with interpolation methods.
Disadvantages: Software is more costly to license than street-range interpolation and may not have complete coverage. The quality of geocodes depends on the coverage and accuracy of the parcel data file.
4.1.2 Matching addresses to auxiliary sources

Simply put, “matching” an address file to a supplementary data source involves linking the two files, usually on a common set of variables such as the address or a geographic identifier. Once an address is geocoded, the resulting coordinates can be used in conjunction with geographical boundary files to link the address with its associated state, county, tract, block group, and block. With this correspondence, thousands of summary variables from the American Community Survey (ACS) or the decennial census describing the specific block, tract, and county can be matched to each address (U.S. Census Bureau 2014). “Unit-level” variables, which are those associated with the household or individuals resident at a specific address, are typically linked using either the geo-coordinates (i.e., latitude and longitude) of the address or the address itself. The formatting of the address is critical when the address is used as the linking variable, especially in cases where there could be multiple units associated with a multiunit placeholder (e.g., does the auxiliary variable refer to the household residing in Apt A or Apt B?).
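The geographic linkage described above can be sketched with the standard 15-digit census block identifier, whose leading digits nest the state (2 digits), county (3), tract (6), and block (4), with the block group given by the first digit of the block code. The example below is a minimal sketch, not a vendor's actual process: the function names, the address records, the identifier strings, and the income value are all made up for illustration.

```python
def geoid_keys(block_geoid):
    """Split a 15-digit census block GEOID into the nested keys used for linking."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID string")
    return {
        "state": block_geoid[:2],
        "county": block_geoid[:5],
        "tract": block_geoid[:11],
        "block_group": block_geoid[:12],
        "block": block_geoid,
    }


def append_area_variables(addresses, area_table, level="block_group"):
    """Left-join area-level variables onto address records by geographic key.

    Unmatched addresses are kept and flagged with matched=False so that an
    append rate can be computed afterwards.
    """
    appended = []
    for record in addresses:
        key = geoid_keys(record["block_geoid"])[level]
        extra = area_table.get(key)
        merged = dict(record, matched=extra is not None)
        if extra is not None:
            merged.update(extra)
        appended.append(merged)
    return appended


# Hypothetical geocoded sample and block-group summary table.
addresses = [
    {"address": "101 MAIN ST APT 1", "block_geoid": "110010062021004"},
    {"address": "7 ELM RD", "block_geoid": "240338004001012"},
]
area_table = {"110010062021": {"median_hh_income": 85000}}

appended = append_area_variables(addresses, area_table)
append_rate = sum(r["matched"] for r in appended) / len(appended)
```

Keeping unmatched records rather than dropping them preserves the sample and makes the share of addresses with appended data easy to report.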
When matching summary-level information there is typically a one-to-one correspondence between the address and a given level of geography; that is, a given address will fall within a unique census block group, block, or tract. As such, the matched auxiliary information will be unique up to the source of that auxiliary information (i.e., the Census will provide unique summary information for the specific block group to which an address belongs). However, matching an address to unit-level auxiliary variables (from a given source) could result in one-to-many matches, meaning that multiple records are returned for a given address. For example, if the auxiliary information relates to education or some other personal characteristic, the match for a single address could result in different levels of education, corresponding to different resident adults. Some vendors will provide the first result of this type of match while others can or will return multiple results. It is important to understand how the vendor stores and returns the results of a potential one-to-many match scenario, especially if the auxiliary information is being used to boost recruitment of specific subpopulations. For example, if the vendor only includes the first record for education and education levels are ordered, it will be difficult to use this auxiliary information to locate Ph.D. recipients within households having an adult with a Bachelor’s degree and another adult with a Ph.D. The one-to-many match scenario could also be encountered with other types of auxiliary variables including telephone number, email address, and type of automobile, among others. Some vendors use a single source for unit-level auxiliary information but many use multiple sources to maximize the number of addresses that contain appended auxiliary information within their data. In such cases it may be possible that a single address is contained within each of the multiple sources available to the vendor.
When this apparent duplication happens, the matching process could result in a one-to-many matching scenario for each of the respective sources of auxiliary information. The way vendors handle this multiplicity, how the results are reconciled across multiple auxiliary sources, and whether a vendor releases all the information from these multiple sources for a given address can vary. For example, if three auxiliary sources match the number of adults within a household at a given address with values of three, four, and four, respectively, then it would be important to understand how information is retained or reconciled for this particular address. If, for instance, one were planning to use such information to determine households below the poverty line, and income was only available from the first source, one might want to use three for the number of adults in the household to produce an internally valid estimate of the poverty line indicator. These issues should be discussed in more detail with the vendor in order to adequately assess the utility of the resulting appended auxiliary information. We discuss more specific ideas for evaluating the matching results for appended variables later in Section 5.

4.2 ABS Frame Architecture: Storing Addresses and Auxiliary Information

As we have illustrated thus far, vendors can differ in the methods they use to gather, verify, and geocode addresses within their ABS sampling frames as well as in the ways they match auxiliary variables to addresses. Vendors can also differ in the architecture of the resulting ABS sampling frame itself. Such architecture essentially governs how the address records and auxiliary information are stored. Some vendors offer ABS frames that are comprehensive and inclusive of both addresses as well as the auxiliary information in a single database, while others use one database to store the addresses themselves and separate databases to store all of the auxiliary information.
Often there are contractual reasons for not being able to store addresses within the same database or file as the auxiliary information, especially if such data are sourced from multiple providers. While not apparent in its importance for survey researchers, the architecture used to store auxiliary and address records for a given ABS frame can impact the sampling methods and options that would be applicable. In the “separate database architecture” auxiliary variables can be used for stratification, but only after a larger sample of addresses is selected from the frame of addresses. Once this initial phase sample is selected, the separate auxiliary variable databases are queried and information is then linked to the sampled addresses. The newly enhanced sample address file, containing both the address and the appended auxiliary data, can then be used in the traditional sense for stratification or other relevant sampling designs. This approach technically results in a two phase sample of addresses (as described in Kish (1965) or Cochran (1977) for example). In the “single database architecture,” the addresses and accompanying auxiliary information are stored in a single database and the resulting ABS frames can be used directly for stratification. These frames resemble more “textbook” sampling frames one might find described in Lohr (2010), for example. Researchers conducting a stratified random sample based on key auxiliary variables appended to the frame from a vendor who uses the separate database architecture will need to request summary information about the size of the initial random sample as well as the sizes of the various strata in order to correctly compute the base sampling weights. If the vendor uses a single database architecture, then the stratum population sizes should be readily available from the vendor prior to sampling. 
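The base-weight computation under the separate database architecture can be illustrated with a small worked example. The sketch assumes the simplest version of the design: a simple random sample of n1 addresses from a frame of N addresses in the first phase, followed by stratification of that phase-one sample on an appended flag and a simple random subsample of n2h addresses from the n1h phase-one addresses in stratum h. Under those assumptions the base weight for a selected address in stratum h is (N / n1) × (n1h / n2h). The function name, stratum labels, and all counts are hypothetical, and an actual vendor design may differ (for example, by using a stratified first phase).

```python
from typing import Dict


def two_phase_base_weights(
    frame_size: int,
    phase1_size: int,
    phase1_stratum_sizes: Dict[str, int],
    phase2_stratum_sizes: Dict[str, int],
) -> Dict[str, float]:
    """Base weights for a two-phase sample: SRS, then stratified subsampling.

    Weight in stratum h = (N / n1) * (n1h / n2h).
    """
    if sum(phase1_stratum_sizes.values()) != phase1_size:
        raise ValueError("phase-one stratum sizes must sum to the phase-one sample size")
    phase1_weight = frame_size / phase1_size
    weights = {}
    for stratum, n1h in phase1_stratum_sizes.items():
        n2h = phase2_stratum_sizes[stratum]
        if not 0 < n2h <= n1h:
            raise ValueError(f"invalid phase-two sample size for stratum {stratum!r}")
        weights[stratum] = phase1_weight * n1h / n2h
    return weights


# Hypothetical counts: the vendor reports the frame size, the phase-one sample
# size, and how the phase-one sample split across the appended-flag strata.
N = 1_000_000
n1 = 10_000
n1h = {"flag_present": 2_000, "flag_absent": 5_000, "flag_missing": 3_000}
n2h = {"flag_present": 1_000, "flag_absent": 500, "flag_missing": 500}

weights = two_phase_base_weights(N, n1, n1h, n2h)
for stratum, w in weights.items():
    print(stratum, w)

# The weights of the final sample should sum back to the frame size.
weighted_total = sum(weights[h] * n2h[h] for h in n2h)
print(weighted_total)
```

The final check, that the weighted total reproduces the frame size, is a convenient way to confirm that the summary counts supplied by the vendor are complete and mutually consistent.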
Keeping addresses separate from the auxiliary information may not allow for the most straightforward sampling designs, but it does offer the potential advantage of matching auxiliary information to a larger percentage of addresses. The large consumer databases that commonly serve as the sources of rich auxiliary information typically cover 70 to 80 percent of all possible addresses on a typical ABS sampling frame, depending on the variables in question (Buskirk, Malarek, and Bareham 2014). The ability to cross-reference multiple consumer databases for a given address increases the likelihood that auxiliary information can be obtained for each address in the ABS sample.

5. ASSESSING THE QUALITY OF AUXILIARY INFORMATION APPENDED TO ABS FRAMES

One of the most promising aspects of address-based sampling frames for survey researchers is the ability to append a variety of information either to the frame itself or to selected samples. Despite the promise that auxiliary information offers, several issues may curtail its potential, such as match quality as well as the completeness, coverage, and accuracy of the supplemental sources. We define three specific quality indicators for auxiliary information appended to ABS frames or samples: the match rate; the append rate or information yield; and the accuracy of the data appended. We define the match rate as the percentage of addresses from an ABS sample or frame that could have data appended when requested. The fact that an address can be matched to a supplementary source does not imply that information will be appended for any given variable. Sometimes addresses within auxiliary source databases have incomplete records, leading to missing values being appended for certain variables.
To distinguish between the ability of an address to be matched to an auxiliary database and whether information is actually appended for an auxiliary variable of interest we define the information yield or append rate for an auxiliary variable as the percentage of addresses for which information was actually appended for the variable among those addresses for which information could have been appended. Finally, the accuracy of the data refers to the agreement between the information that has been appended to the address and the current truth. The accuracy of a binary auxiliary variable that indicates the presence or absence of a certain feature of an address (e.g., single adult household) may be more completely expressed as the true positive (sensitivity) or true negative rates (specificity). Several factors could influence the match and yield rates as well as the accuracy of any appended auxiliary information. First, the type of variable appended (e.g., area-level, household-level, or individual-level characteristic) will affect the expected match rate and accuracy. Certain variables are easier for vendors to compile, maintain, and match than others. Second, the type of address may have an influence on match rate and accuracy. Single-family homes with city-style addresses will have higher exact match rates than addresses in multiunit buildings, drop points, or non–city-style addresses. Also, because single family homes tend to have lower turnover rates than multiunit addresses, the accuracy of the information appended may be better for single family homes. Depending on the linking address used for matching, multiunit households may suffer from consequences of a many-to-one match scenario. There may be many units that share the same basic street address and as a result, information appended for a specific unit at that basic street address may refer to the occupant of another household unit at the same location. 
Third, neighborhood-level characteristics can influence the accuracy and match rate of auxiliary variables. In particular, areas with populations that tend to be lower income, immigrant, or highly mobile will generally have lower match rates and accuracy compared to higher income or more stable areas. In this section we explore these factors in more detail for both summary and unit-level auxiliary variables that could be appended to ABS frames.

5.1 Quality, Accuracy and Usage of Summary-Level Auxiliary Variables Appended to ABS Frames/Samples from the ACS or the U.S. Census

If an address has been geocoded it can usually be matched to Census data. Thus, the match rate to Census data is essentially the same as the geocoding rate. However, there are a number of other issues to consider when matching to Census data. Errors in the geocoding process could place an address in the wrong tract or block (Dohrmann et al. 2007; Kennel and Li 2009), and so while a match to Census data may be possible, any appended information would refer to the wrong area. While statistics produced by the Census and ACS are often considered “gold standards,” ACS estimates at the block group or tract level may have large standard errors, and both ACS and decennial Census estimates are subject to nonsampling errors. Furthermore, decennial Census statistics reflect counts at one point in time, and so their accuracy for a given area may erode if used at later time points within a given decade. Similarly, ACS block group and tract estimates represent 5-year estimates, which may not reflect the current characteristics of the area. So while the estimates are accurate in terms of the geographical location they represent, the degree to which the estimates accurately describe the particular area may vary by location.
Finally, because the Census data are at the area level, their association with characteristics of the particular household at a given address may be limited (Biemer and Peytchev 2012; Biemer and Peytchev 2013).

5.2 Quality, Accuracy and Usage for Unit-Level Auxiliary Variables Appended to ABS Frames/Samples

Most unit-level variables that are appended to ABS samples or frames come from vendor-specific proprietary datasets or one or more commercially available consumer databases. The content, coverage, and quality of these data sources vary considerably both across and within vendors. Credit agencies and direct mailers often amass and compile personal and household information from thousands of sources to better engage or target customers. Commercial files include information from both proprietary and public sources such as real estate transactions, property tax assessments, and voter registration files. Data can be extracted from credit card purchases, magazine subscriptions, or even from warranty cards that have check boxes for socioeconomic and demographic information. While most unit-specific variables are observed from one or more of the aforementioned sources, there are specific variables that are often modeled, such as income level. Often the details or contents of such models are not known or available from the vendors, but vendors can detail which of the variables are modeled and which are not.

Buskirk et al. (2014) estimated overall match and append rates for a collection of demographic and household auxiliary variables and compared the distributions of these data to national benchmarks. They reported that the match and append rates as well as the consistency of the information appended across multiple vendors varied considerably depending on the variable of interest and geography. They also reported that addresses with non-missing appended demographic information tended to be in block groups with a higher percentage of owner-occupied units.
Finally, city-style addresses had about 4.4 times the odds of yielding information on core demographic variable appends compared to high rises. Pasek, Jang, Cobb, Dennis, and DiSogra (2014) also examined the mismatches between the values of appended data and those collected via a survey. Other recent work has also investigated both the accuracy of appended variables as well as the completeness of unit-level appends, including: DiSogra, Dennis, and Fahimi (2010); Roth, Han, and Montaquila (2013); Valliant, Dever, and Kreuter (2013); and Ridenhour, McMichael, Harter, and Dever (2014).

McMichael and Roe (2012) and Harter and McMichael (2013) explored the feasibility of matching both cell phone numbers and landline numbers to address frames and found that both the match rate and accuracy rate of matched cell phone numbers were substantially lower than those of landline phone numbers. Generally, most phone-to-address matching is based on listed landline numbers that are historically more stable compared to cell phone numbers. As more households become cell phone-only, their cell phone numbers may begin to appear on consumer databases and the match rates for cell phone numbers to addresses may improve. Amaya et al. (2010) reported that telephone match rates can vary by address type and geography. More specifically, the telephone number append rates for addresses in rural areas and P.O. Boxes can be low because gathering information for these addresses, in general, is more difficult. On the other hand, addresses associated with multiunit buildings generally had very high telephone number match rates, in part because of linking one phone number to multiple units at the same address.

Valliant, Hubbard, Lee, and Chang (2014) reported that the accuracy of appended auxiliary binary indicators (or flags) can affect coverage rates when used to identify members from a specific target subpopulation.
For example, higher false negative rates of a binary auxiliary indicator for a particular target population (e.g., indicator says “no” children in the home but in reality there are children in the home) can result in increases in undercoverage of that particular target population if those addresses with an indicator value of “no” are excluded from the frame or final sample. Higher false positive rates may result in lower eligibility rates after screening and recruitment if only those addresses with the indicator present are fielded. While the presence of errors among auxiliary variables appended to the frame has been noted in recent years, it is important to understand several features of the appending process itself to fully realize their value to sampling and recruitment. Often vendors have variables that refer to the household/address itself as well as information for the “head householder” or multiple adults who dwell at a given address. For auxiliary data derived from commercial sources, information being matched at the unit/address level usually refers to a reference person who resides at that address (i.e., “head householder”). For this reason, it is completely possible that information appended to an address from one proprietary source may not match that from another source for a given address because the appended information refers to different reference persons/householders who reside at the same address (Buskirk et al. 2014; Dohrmann, Buskirk, Hyon, and Montaquila 2014). If within-household selection were used to select an adult randomly from the household there is also no guarantee that the selected adult would correspond to the “reference” person relative to the proprietary source providing such information. In such cases differences in appended information for the reference person and those from a randomly selected adult may certainly occur but not imply issues with the accuracy of the appended information. 
To evaluate the accuracy of the appended auxiliary information in scenarios like these, a household rostering approach may be needed to determine if in fact there is at least one such adult in the household with characteristics consistent with those provided by the appended variable. The frequency with which vendors or proprietary sources update their information can also vary and may impact the accuracy of that information as corroborated by survey data. Differences between the appended value and that collected in the field may be likely for time-varying covariates such as number of adults in the household, marital status, education status, and number of children in the home. If auxiliary information is to be used in the sampling or fielding protocols and that information is time sensitive, then it would be helpful to ask the vendor how often the auxiliary information is updated. If such data are to be used for stratification, appending the most up-to-date information at the time of sample selection will be important for sample design efficiency. By contrast, if the auxiliary information is to be used during field data collection, it may be more helpful to append the information immediately prior to the survey field period rather than at the time of sample selection if data collection starts at a later time. The survey designer should consider the tradeoffs in the context of the survey’s specific goals and requirements and create strata based on the complete disposition of the appended auxiliary variables (e.g., present, absent, and all others) to improve coverage and possibly balance efficiency with accuracy when sampling specific subgroups. For example, English et al. (2014) evaluated appended auxiliary information from three vendors for identifying households with small children in specific underserved neighborhoods in Los Angeles County, CA. The authors found little variation in appended information over two time points as well as little variation among vendors.
The authors do note, however, that all three vendors were challenged by the hard-to-reach population in that yield rates were uniformly low across the three vendors. Interpreting the missing values as “no children present” implied that in practice the use of this indicator was more helpful for excluding households rather than including them. They also reported that using additional appended variables that were associated with the main eligibility auxiliary variable (e.g., “have kids in home”) increased the true positive rate for identifying households with young children.

6. SUMMARY AND CONCLUSIONS

The construction of address-based sampling frames is an endeavor that can require considerable organization, effort, and resources. While not ubiquitous, a common theme of ABS frames available from many vendors involves the use of services and extract files available from the USPS AMS. As with any product or service, there is noted variability in offerings for ABS frames across vendors. Such variability relates to whether or not the vendors have relied on any products or services from the USPS to corroborate and update their set of addresses. Primary vendors rely on such services and are thus thought to have the most up-to-date and possibly complete information; the completeness and timeliness of this information, however, can vary across geography and type of address within a given vendor. There is also variability in how vendors clean, maintain, and update their address lists as well as in how they format addresses that are part of multiunit dwellings, for example. The richness of an ABS frame or sample is enhanced by the ability to append auxiliary information. The key to appending this information is often related to the ability to pinpoint the exact location of an address relative to other geographical areas of interest such as Census block group or tract.
Geocoding addresses offers vendors a method for matching an address to such information by using the resulting geo-coordinates for the address, in addition to the utility of the geographic location data contained by the geocode. There are various ways in which an address can be geocoded and ABS frame vendors make use of many of these methods. The number and types of sources of auxiliary information that can be linked as a result of the geocoding, as well as how such information is stored relative to the addresses, can also vary across vendors. Some vendors keep addresses and auxiliary information separate while others bundle it into a single database. This storage architecture has implications for how stratified or multistage sampling can be accomplished. We also noted that not all auxiliary information is created equal and described specific metrics for evaluating the quality of such appended information. The quality of an auxiliary variable appended to an ABS sample or frame, as measured by the match rate, append rate and accuracy, can be a function of the type of address as well as its location in the U.S. We observed that accuracy issues with time varying covariates, like education or marital status, may be a function of the frequency of vendor updates, time the sample was fielded relative to these updates and whether within-household selection is used. While many of the approaches vendors use to create and maintain ABS frames are proprietary, we have attempted to provide the general approaches and highlight specifics where possible about the actual mechanisms or methods used to create, curate, maintain, update and enhance ABS frames. We have attempted to describe the components of ABS frame creation that are likely to be consistent across vendors and those where variability should be expected. As with any sampling design and survey study, there are always practical tradeoffs to consider as it relates to cost, coverage and quality. 
We have also indicated where such tradeoffs need to be attended to as it relates to using available auxiliary variables for both sampling designs as well as adaptive survey designs. We sincerely hope this information will enhance and enrich conversations between researchers and vendors alike as we collectively work towards surveys that properly balance costs and effort required. Footnotes 1 The quality and timeliness of these updates can vary considerably by geography. For example, one local government may provide the USPS with a list of addresses for a proposed new housing development several years before construction begins, while others who do not regulate new construction or zoning may never do so. 2 These extracts form the CDS, DSF2, No-Stat Files, Delivery Statistics Files, City-State files, 5-digit ZIP product, ZIP+4® product, Zip4Change product, ZIPMove file, enhanced Line of Travel (eLOT®) product, Congressional District Code Files, and Carrier Route Files, which vendors use to update and verify their address lists. More information on the USPS files can be found in the USPS AIS Products Technical Guide (U.S. Postal Service 2015a) and the CDS User Guide (U.S. Postal Service 2013). 3 Section 412 of Title 39 of the U.S. code prevents the USPS from disclosing specific names and addresses to any person or business, apart from the U.S. Census Bureau. 4 The USPS maintains the Locatable Address Conversion System (LACS), which provides a crosswalk for address conversions. Vendors with a CDS license receive address conversions through the LACS in owned ZIP Codes. 5 The percentage of addresses within a given county that are classified as OWGM can vary considerably, so the additional coverage obtained by including OWGM on the sampling frame will vary. REFERENCES AAPOR Task Force on The Future of U.S. General Population Telephone Survey Research ( 2017 ), “The Future of U.S. 
General Population Telephone Survey Research,” prepared for the AAPOR Council under the auspices of the AAPOR Standards Committee. Alexander C. H. , Wetrogan S. ( 2000 ), “Integrating the American Community Survey and the Intercensal Demographic Estimates Program,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 295–300. Alkire E. ( 2010 ), “Handheld Data Collection and Its Effects on Mapping,” in A Special Joint Symposium of ISPRS Technical Commission IV & AutoCarto in Conjunction with ASPRS/CaGIS 2010 Fall Specialty Conference, Orlando, FL. Amaya A. ( 2017 ), “Drop Points,” White Paper. Available at http://abs.rti.org/atlas/drops/paper. Amaya A. , LeClere F. , Fiorio L. , English N. ( 2014 ), “Improving the Utility of the DSF Address-Based Frame through Ancillary Information,” Field Methods , 26 , 70 – 86 . Google Scholar CrossRef Search ADS Amaya A. , Skalland B. , Wooten K. ( 2010 ), “What’s in a Match?” Survey Practice , 3 6 , pp 1 – 9 . Biemer P. P. , Peytchev A. ( 2012 ), “Census Geocoding for Nonresponse Bias Evaluation in Telephone Surveys: An Assessment of the Error Properties,” Public Opinion Quarterly , 76 , 432 – 452 . Google Scholar CrossRef Search ADS Biemer P. P. , Peytchev A. ( 2013 ), “Using Geocoded Census Data for Nonresponse Bias Correction: An Assessment,” Journal of Survey Statistical Methodology , 1 , 24 – 44 . Google Scholar CrossRef Search ADS Bonner M. R. , Han D. , Nie J. , Rogerson P. , Vena J. E. , Freudenheim J. ( 2003 ), “Positional Accuracy of Geocoded Addresses in Epidemiologic Research,” Epidemiology , 14 , 408 – 412 . Google Scholar PubMed Buskirk T. D. , Malarek D. , Bareham J. S. ( 2014 ), “From Flagging a Sample to Framing It: Exploring Vendor Data That Can Be Appended to ABS Samples,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 111–124. Cayo M. R. , Talbot T. O. 
( 2003 ), “Positional Error in Automated Geocoding of Residential Addresses,” International Journal of Health Geographics , 2 , 10 . Google Scholar CrossRef Search ADS PubMed Cochran W. G. ( 1977 ), Sampling Techniques , New York : John Wiley & Sons . Dekker K. , Amaya A. , LeClere F. , English N. ( 2012 ), “Unpacking the DSF in an Attempt to Better Reach the Drop Point Population,” Proceedings of the Joint Statistical Meeting, Section on Survey Research Methods, pp. 4596–4604. Available at http://ww2.amstat.org/sections/srms/Proceedings/y2012/Files/305686_75228.pdf. de Leeuw E. D. ( 2005 ), “To Mix or Not to Mix Data Collection Modes in Surveys,” Journal of Official Statistics , 21 , 233 – 255 . Dillman D. A. , Smyth J. D. , Christian L. M. ( 2014 ), Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method , New York : Wiley . DiSogra C. , Michael Dennis J. , Fahimi M. ( 2010 ), “On the Quality of Ancillary Data Available for Address-Based Sampling,” Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 4174–4183. Dohrmann S. , Buskirk T. D. , Hyon A. , Montaquila J. ( 2014 ), “Address-Based Sampling Frames for Beginners,” JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, pp. 1009–1018. Dohrmann S. , Han D. , Mohadjer L. ( 2007 ). “Improving Coverage of Residential Address Lists in Multistage Area Samples,” Proceedings of the Survey Research Methods of the American Statistical Association, pp. 3219–3126. Dohrmann S. , Kalton G. , Montaquila J. , Good C. , Berlin M. ( 2012 ), “Using Address-Based Sampling Frames in Lieu of Traditional Listing: A New Approach,” Joint Statistical Meetings, Survey Research Methods Section, pp. 3729–3741. Eckman S. , English N. ( 2012a ), “Creating Housing Unit Frames from Address Databases: Geocoding Precision and Net Coverage Rates,” Field Methods , 24 , 399 – 408 . Google Scholar CrossRef Search ADS Eckman S. , English N. 
( 2012b ), “Geocoding to Create Survey Frames,” Survey Practice , 5 4 , pp. 1 – 8 . English N. , Li Y. , Mayfield A. , Frasier A. ( 2014 ), “The Use of Targeted Lists to Enhance Sampling Efficiency in Address-Based Sample Designs: Age, Race, and Other Qualities,” in 2014 Proceedings of the American Statistical Association, Survey Research Methods [CD ROM], Alexandria, VA: American Statistical Association. Faulstich P. ( 2011 ), “Technical Aspects of the Construction, Coverage, Limitations and Future of CDS,” paper presented at the 66th Annual AAPOR Conference, Phoenix, AZ. Available at https://www.aapor.org/AAPOR_Main/media/AnnualMeetingProceedings/2011/05-14-11_1C_Faulstich.pdf. (Assessed February 2, 2018). Fiorio L. , Fu J. ( 2012 ), “Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling,” Joint Statistical Meetings, Survey Research Methods Section, pp. 5588–5596. Goldberg D. W. , Wilson J. P. , Knoblock C. A. ( 2007 ), “From Text to Geographic Coordinates: The Current State of Geocoding,” URISA Journal , 19 , 33 – 46 . Harter R. , Battaglia M. P. , Buskirk T. D. , Dillman D. A. , English N. , Fahimi M. , Frankel M. R. , Kennel T. , McMichael J. P. , McPhee C. B. , DeMatteis J. M. , Yancey T. , Zukerberg A. L. ( 2016 ), “Address-Based Sampling,” Prepared for AAPOR Council by the Task Force on Address-based sampling, Operating Under the Auspices of the AAPOR Standards Committee. Oakbrook Terrace, Il. Available at http://www.aapor.org/getattachment/Education-Resources/Reports/AAPOR_Report_1_7_16_CLEAN-COPY-FINAL-(2).pdf.aspx. (Accessed Feburary 2, 2018). p. 140. Harter R. , McMichael J. P. ( 2013 ), “Scope and Coverage of Landline and Cell Phone Numbers Appended to Address Frames,” JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, pp. 3651–3665. Iannacchione V. G. 
( 2011 ), “The Changing Role of Address-Based Sampling in Survey Research,” Public Opinion Quarterly , 75 , 556 – 575 . Google Scholar CrossRef Search ADS Kalton G. , Kali J. , Sigman R. ( 2014 ), “Handling Frame Problems When Address-Based Sampling is Used for In-Person Household Surveys,” Journal of Survey Statistics and Methodology , 2 , 283–222. Kennel T. ( 2012 ), “Evaluation of the Delivery Sequence File as a Survey Frame,” edited by Ruth Ann Killion to Nancy Potok: Internal Census Bureau Memo. Kennel T. L. , Li M. ( 2009 ), “Content and Coverage Quality of a Commercial Address List as a National Sampling Frame for Household Surveys,” in Proceeding of the Joint Statistical Meetings. Kish L. ( 1965 ), Survey Sampling , New York : John Wiley & Sons . Link M. W. , Battaglia M. P. , Frankel M. R. , Osborn L. , Mokdad A. H. ( 2008 ), “A Comparison of Address-Based Sampling (ABS) versus Random-Digit Dialing (RDD) for General Population Surveys,” Public Opinion Quarterly , 72 , 6 – 27 . Google Scholar CrossRef Search ADS Lohr S. L. ( 2010 ), Sampling: Design and Analysis , Boston : Brooks/Cole . McMichael J. P. , Roe D. ( 2012 ), “ABS and Cell Phones: Appending Both Cell Phone and Landline Phone Numbers to an Address-Based Sampling Frame,” in American Association for Public Opinion Research (AAPOR) Annual Conference, Orlando, FL. McMichael J. , Rachel Harter, Bonnie Shook-Sa, Vincent Iannacchione, Jamie Ridenhour, and Kibri Hutchison-Everett. ( 2012 ), “Sub-National Coverage Profile of U.S. Housing Units Using an Address-Based Sampling Frame,” paper presented at the 67th Annual AAPOR Conference, Orlando, FL. Available at http://www.aapor.org/AAPOR_Main/media/AnnualMeetingProceedings/2012/04_McMichael_AAPOR.pdf. (Accessed Feburary 2, 2018). Olson K. , Buskirk T. D. ( 2015 ), “Can I Get Your Phone Number? 
Examining the Relationship between Household, Geographic and Census-Related Variables and Phone Append Propensity for ABS Samples,” in 70th Annual AAPOR Conference, Hollywood, FL. O’Muircheartaigh C., English N., Eckman S., Upchurch H., Garcia Lopez E., Lepkowski J. (2006), “Validating a Sampling Revolution: Benchmarking Address Lists Against Traditional Field Listing,” 2006 Proceedings of the American Statistical Association, AAPOR Survey Research Methods Section [CD ROM], Alexandria, VA: American Statistical Association. Pasek J., Jang S. M., Cobb C. L., Dennis J. M., DiSogra C. (2014), “Can Marketing Data Aid Survey Research? Examining Accuracy and Completeness in Consumer-File Data,” Public Opinion Quarterly, 78, 889–916. Ridenhour J. L., McMichael J. P., Harter R., Dever J. A. (2014), “ABS and Demographic Flags: Examining the Implications for Using Auxiliary Frame Information,” in Joint Statistical Meetings, Boston, MA. Roth S. B., Han D., Montaquila J. M. (2013), “The ABS Frame: Quality and Considerations,” Survey Practice, 6, 3779–3793. Shook-Sa B. E., Currivan D. B., McMichael J. P., Iannacchione V. G. (2013), “Extending the Coverage of Address-Based Sampling Frames beyond the USPS Computerized Delivery Sequence File,” Public Opinion Quarterly, 77, 994–1005. U.S. Census Bureau (2014), “2013 American Community Survey and Puerto Rico Community Survey 2014 Subject Definitions,” Available at http://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2014_ACSSubjectDefinitions.pdf. (Accessed February 2, 2018). U.S. Postal Service (2013), “CDS User Guide,” Available at http://ribbs.usps.gov/cds/documents/tech_guides/CDS_USER_GUIDE.PDF. (Accessed February 2, 2018). U.S.
Postal Service (2014), “2014 Annual Report to Congress,” Available at https://about.usps.com/publications/annual-report-comprehensive-statement-2014/annual-report-comprehensive-statement-v2-2014.pdf. (Accessed February 2, 2018). U.S. Postal Service (2015a), “Address Information System Products Technical Guide,” Available at https://ribbs.usps.gov/addressing/documents/tech_guides/pubs/AIS.PDF. (Accessed February 2, 2018). U.S. Postal Service (2015b), “DSF2® License Agreement. Version 19,” Available at https://ribbs.usps.gov/dsf2/documents/tech_guides/DSF2LICA.PDF. (Accessed February 2, 2018). Valliant R., Dever J. A., Kreuter F. (2013), Practical Tools for Designing and Weighting Survey Samples, New York: Springer. Valliant R., Hubbard F., Lee S., Chang C. (2014), “Efficient Use of Commercial Lists in U.S. Household Sampling,” Journal of Survey Statistics and Methodology, 2, 182–209. Ward M. H., Nuckols J. R., Giglierano J., Bonner M. R., Wolter C. F., Airola M., Mix W., Colt J. S., Hartge P. (2005), “Positional Accuracy of Two Methods of Geocoding,” Epidemiology, 16, 542–547. © The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Journal of Survey Statistics and Methodology, Oxford University Press

Published: Feb 27, 2018
