>> Dr. Paulette Hasier: Hello, everyone. I am Dr. Paulette Hasier, the chief of the Geography and Map Division at the Library of Congress. And I would like to welcome you to our 2021 GIS Day, Mapping Ourselves: Geographic Information Science in the 2020 Census of the United States. The Geography and Map Division has participated in this annual celebration of all things geospatial for more than a decade. We use this event as a way to discuss the role geographic information science plays in exploring some of the most important issues facing us all today. Last year, we highlighted how GIS and cartography were helping medical staff, epidemiologists and scientists try to understand the spread of COVID-19. This year, our speakers will look into another important issue, the 2020 census and the role geographic information science has played not only in assembling the data, but also the impact of the data as geographers, cartographers and policymakers begin using it to analyze everything from healthcare and internet access to the congressional redistricting process across the United States. I want to sincerely thank our speakers for sharing their research and experience and also thank all of you who are watching for your support of the Geography and Map Division, and the Library of Congress's digital programming. >> Deirdre Delpiaz Bishop: Greetings, everyone. My name is Deirdre Delpiaz Bishop. I'm the chief of the Census Bureau's Geography Division. Thank you so much for the invitation to be part of this special Library of Congress event today. GIS Day is a wonderful reason to come together as geospatial professionals and educators to celebrate our successes, learn from one another, and teach those outside our field about the importance of geographic information science. I'm honored to be with you today to share the Census Bureau's story about how we use geospatial information and systems to conduct the 2020 Census, a nationwide activity that resulted in a new map of the distribution and characteristics of the American public. Let's get started. As you can see here, geography provides the foundation for all data collection, processing, tabulation and dissemination activities at the US Census Bureau. For that reason, our first step in preparing for the 2020 census was building a national address list and geographic database that established where to count all of the people. We began last decade, in 2013, to ensure a complete geographic foundation that could be used to conduct our enumeration activities. First, we worked with partners in tribal, federal, state and local governments to enhance the information we had collected during the 2010 census. We employed a multilayered process of address ingest, validation and update using data from authoritative sources. Our primary source of updates came from the United States Postal Service. Twice per year, they shared their address list with us, the list they use to deliver mail, the delivery sequence file. Using their delivery sequence file, we added 5.9 million new addresses to our Master Address File. Additionally, we were able to confirm that 2.4 million addresses that were new to the delivery sequence file throughout the decade already existed in our Master Address File, most likely through contributions by other authoritative sources. As part of our ongoing partnership programs, tribal, state and local governments shared nearly 107 million addresses with us. When matched to the Master Address File, 99.5% of those addresses matched.
This served as a critical validation of the addresses in the Master Address File. Additionally, these same partners contributed over 521,000 new addresses and 257,000 miles of roads. In 2018, the Census Bureau implemented the Local Update of Census Addresses program, or LUCA, to provide tribal, state and local governments an opportunity to review and update the Census Bureau's address list for their respective jurisdictions. Participants from over 8,300 entities provided 22 million addresses, of which 17.8 million, or 81%, matched to addresses already in our Master Address File. The Census Bureau added 3.4 million new addresses from this partnership nationwide. To allow tribal, state and local governments one final opportunity to submit addresses where construction was completed between March 2018, when LUCA ended, and April 1st of 2020, Census Day, the Census Bureau conducted the new construction program. Approximately 595,000 new addresses were added by 2,244 partners nationwide. This data helps demonstrate how a multilayered process provided multiple opportunities to obtain addresses from various sources, validating, updating and filling in spatial and temporal gaps along the way. Working with our partners early in the decade set us up for the next address list development activity, the address canvassing operation. The design for this 2020 census operation was much different than in the past. In preparation for the 2020 census, we knew much of this validation could be done in the office using satellite imagery in combination with housing unit counts from the Master Address File. To make this happen, we developed an operation called in-office address canvassing. Using about 150 technicians, we conducted a complete review of all 11.1 million blocks in the nation. The technicians validated 87% of the blocks as passive, meaning there was no difference between the number of housing units they viewed in the imagery as compared to the number of housing units in the Master Address File. The 87% of the blocks that were validated during in-office address canvassing represented 65% of the nation's addresses. The dark blue areas on this map represent counties that had over 70% of their blocks classified as passive or stable. The remaining 35% were sent for validation in the field in an operation appropriately called in-field address canvassing. This occurred during the summer of 2019. We hired approximately 32,000 address listers to do this work, in comparison to the 150,000 listers from the past. 88% of the addresses, or 44.1 million, were validated by those address listers as correct. The remainder were changed or deleted during the field work. Address listers added 2.7 million new addresses. But when we matched those against the Master Address File, nearly 58% of those new addresses matched, meaning they weren't really new addresses after all, and had likely already been added to the Master Address File through concurrent in-office methods. The dark green areas on the map represent counties that had over 50% of their blocks classified as active or changing, and therefore required in-field verification. This new suite of address canvassing activities, in-office and in-field, avoided millions in costs compared with the design of the 2010 census address canvassing operation. Upon finalization of these address list development activities, we delivered a universe of 152 million addresses that would be included in the 2020 census enumeration universe.
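To make the in-office triage concrete, the sketch below compares an imagery-derived housing-unit count with the Master Address File count for each block and labels the block passive or active. The block IDs and counts are invented, and the real operation weighed more evidence than this single equality check; this is only a simplified illustration of the rule described above.

```python
def triage(blocks):
    """In-office address canvassing style triage (simplified).

    blocks: list of (block_id, imagery_housing_units, maf_housing_units).
    A block whose imagery count matches the Master Address File count is
    'passive' (no field visit needed); anything else is 'active' and would
    go on to in-field address canvassing.
    """
    return {
        block_id: "passive" if imagery_units == maf_units else "active"
        for block_id, imagery_units, maf_units in blocks
    }

# Hypothetical blocks for illustration only.
sample = [("block-001", 42, 42), ("block-002", 17, 25), ("block-003", 0, 0)]
statuses = triage(sample)
passive_share = sum(s == "passive" for s in statuses.values()) / len(statuses)
print(statuses)
print(f"{passive_share:.0%} of blocks validated in office")
```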
95% of these received an invitation to respond in the mail, the areas shown in blue on this map. 5% were hand delivered an invitation or the interview was conducted right at their door. Those are the areas shown in green and light beige. We employed the use of online interactive map viewers like the one you see here to share this information in real time with the public. As the invitations were making their way to the households, we were using tools like this one, the response outreach area mapper, to identify areas that were least likely to self-respond. Those areas are shown here at the census tract level in the darkest shades of blue. We used data from our American Community Survey to help identify the types of populations living in these areas. And then we use that information to tailor our outreach strategy. We developed a strong message and used trusted messengers to motivate people to respond. By early August of 2020, we were able to begin knocking on the doors of the addresses that did not self-respond to the 2020 census. We used a map viewer like this to publicly share our progress at the area census office level of geography. This map is all green now, of course, because by October 15th of 2020, we had successfully enumerated 99.9% of the nation's addresses and people. As we were collecting statistics from the American public, we were also working hard to finalize the boundaries of the geographic areas that would be used to tabulate and disseminate the census results. We published this information in January and February of 2021. Our geographic products now include 14.1 million unique geographic areas, representing the nation, states, counties, census tracts, block groups, census blocks, and much more. The map that you see here depicts block level census data in Oklahoma City, Oklahoma. Over the next few months, we've processed and tabulated all of the census data to these respective geographic areas, resulting in the publication of the apportionment results on April 26th of 2021. What is apportionment? And why are the results so important, you may ask. Apportionment is the process of dividing the 435 memberships or seats in the US House of Representatives among the 50 states. At the conclusion of each decennial census, the results are used to calculate the number of seats to which each state is entitled. Each of the 50 states is entitled to a minimum of one seat in the US House of Representatives. The 2020 census apportionment population includes the resident population of the 50 states, plus a count of the US military personnel and federal civilian employees living outside of the United States, plus their dependents living with them who can be allocated to a home state. The populations of the District of Columbia and Puerto Rico are not included in the apportionment population, because they do not have voting seats in the US House of Representatives. According to our latest decennial census, the total 2020 census apportionment population for the 50 States was 331,108,434 people. This represents a 7.1% increase from the 2010 census apportionment population total. Remember, this does not include the District of Columbia or Puerto Rico. The average apportionment population per congressional representative as of the 2020 census is 761,169, an increase of 50,402 from the 2010 average. This static PDF map displays the apportionment seats in the US House of Representatives based on the 2020 census and the change in seats from 2010. 
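For readers who want to see the arithmetic behind apportionment, the sketch below implements the method of equal proportions (the Huntington-Hill procedure used to allocate House seats after each census): every state starts with one seat, and each remaining seat goes to the state with the highest priority value P / sqrt(n(n + 1)), where P is the apportionment population and n is the number of seats the state already holds. The three-state populations and the 12-seat house are invented for illustration; they are not 2020 figures.

```python
import heapq
import math

def apportion(populations, total_seats=435):
    """Allocate seats by the method of equal proportions (Huntington-Hill)."""
    seats = {state: 1 for state in populations}            # every state gets one seat
    # Max-heap (negated priorities) of each state's priority for its next seat.
    heap = [(-pop / math.sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(total_seats - len(populations)):         # hand out the remaining seats
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return seats

# Hypothetical three-state example: prints {'A': 8, 'B': 3, 'C': 1}.
print(apportion({"A": 8_000_000, "B": 3_000_000, "C": 1_000_000}, total_seats=12))
```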
The number of seats in the US House of Representatives is shown in each state. As you'll see, the majority of states did not see a change in their number of seats, but there were seven House seats that shifted among 13 states, which was less shifting than we saw in the 2010 census. You can see in green that six states gained seats: Texas gained two, and Colorado, Florida, Montana, North Carolina and Oregon each gained one. You can also see in purple that seven states lost seats: California, Illinois, Michigan, New York, Ohio, Pennsylvania and West Virginia all lost one. California's loss is historic, as this is the first time that has happened in California's history. We're also using interactive map applications to disseminate the 2020 census data. This demographic data map viewer includes state, county and census tract level data from the 2020 census. The map includes data related to population, race, Hispanic origin, housing and group quarters. The map automatically switches from state data to county data to census tract data as you zoom in to more detailed scales. Since this is data we shared as part of our redistricting data release, we do include the population of the District of Columbia and Puerto Rico here, bringing the total United States population up to 331,449,281 people. The population of the District of Columbia increased by 14.6% between 2010 and 2020. Another way to explore the 2020 census data is at data.census.gov. We've redesigned our search functionality, and have included easy to access data profiles, maps and visualizations, and code specific to industry and makes. The 2020 census was a monumental activity, but incorporating the use of geographic information and systems throughout the design brought greater efficiency and effectiveness to census operations than ever before. While we worked hard to ensure a successful census, we also had a lot of fun along the way. I'd like to conclude by thanking you again for the invitation and by sharing a video. This video demonstrates the excitement of GIS professionals, as they explain their work with the Census Bureau and provide concrete examples of how we implemented and completed the 2020 census. Thank you. [ Music ] [ Music ] >> Richard Leadbeater: Good afternoon. My name is Richard Leadbeater and I am the State Government Industry Manager for a company called Esri. Esri is a geographic information system technology provider. And one of the solutions that GIS brings you into is redistricting. As you know, redistricting is currently going on. It's definitely the current event topic of the year. The interesting part about redistricting is that it's so data heavy, both data and application heavy. And Mackenzie and I will be talking today about how we can open this up. We heard from our other speakers today about data, of course, the census PL data, but other data sources that are brought to bear for redistricting. We're going to go through a rather fast example from a conceptual level, but then Mackenzie is going to jump in and just show you how the solutions can be built. They can be built pretty easily, and we'll explain this in my slides as I go along. As I said, my name is Richard Leadbeater. Mackenzie O'Brien will be filling in for the technical part and providing a demonstration for us today. Our contact information is available. If you're interested, please feel free to call us. We'd be more than happy to talk and have a dialogue about what we're talking about today. I want to provide a little background.
This may be a little bit of a repeat, but there are some particulars about this history. Redistricting, even though it's part of the Constitution -- technology use in redistricting has, at least in my perspective, a relatively short history. It started with big IBM systems that probably cost millions of dollars to have that system that's in that small picture in the top right. And maybe you got one or two plans out of the whole analysis. It was based off the PL data. This was the genesis of the PL files. The Census Bureau had built the DIME files, and then they evolved into the TIGER files, and from these geographic files -- this is where we're getting the data that's being used today. The '90s was a little bit easier. We had workstations. That Sun workstation there on the right side probably cost $60,000. The software was probably another $40,000 to not only obtain but customize. 2000 is the epitome of that disruptive technology moment. PCs were available. You could walk to a big box store, buy a PC, a competent PC, get the data off of the Census Bureau's website and do redistricting. One of my aha moments in the year 2000 was Loudoun County, Virginia. The state of Virginia had released the data and then instructed the state and the counties and cities to start doing their districting. That was late on a Sunday evening. By Tuesday evening, Tuesday evening's board meeting, they had just under a dozen plans submitted by citizens. What does that mean? Citizens had been aware enough to know that the data was released on a Sunday, had Monday to work on it, had software, had the data available to them, and they created plans for the county to consider. And that was my aha moment in that hey, it doesn't have to be hard. That set in motion a challenge that I took on. So in 2010, we had almost a consumer-level ability to do redistricting. Although, because it was still very data centric, it was very cumbersome. Redistricting as a constitutional process is hard, is difficult. But the point is you could deliver. 2020, we have a lot of people acting on this, and it's a little bit too early to tell, but stay tuned and we'll talk about this. Stepping half a decade back into 2005, I was asked to speak in front of the NCSL Redistricting Committee. NCSL is the National Conference of State Legislatures. They wanted to know about technology, because as I said, 2000 was that disruptive moment. They were kind of shell shocked, if I could use that word, about technology and what it did and how it changed the dynamic. And the small drawing on the left was my representation from the talk that day of what the current system, what the state of the art was then. Basically it was big software, it was expensive software. It was shared between the professional staff, professional consultants, the legislature. The public had really no entree into the process. And this is the drawing that I drew back then at that meeting, that this was the realm of the possible. That technology was to the point where, as large as the census PL data is, as heavy an analytical process as redistricting is, that it could be delivered through a web browser. And this would change the dynamic of the process. It could be made available, a la that Loudoun County example that I gave. It could be presented to the public in an intuitive way so that they could be involved. And there's a whole bunch of reasons why. Why put it in the cloud? Why make it available to the public there?
There's savings both from an IT point of view and from a public and openness point of view as well. Right after that 2005 meeting, this is some text out of a manifesto that I wrote that this could be possible, that redistricting could be -- the algorithms could be sewn together, the data presented so that it could provide openness and transparency, as well as citizen engagement into the redistricting process. I'm providing this image because I want to make the point that the data used in redistricting, the boundaries that are created, the districts, the legislative districts that are created from the census block data, all of that is actually used throughout the decade. Ultimately, those legislative -- from local government up to state government, up to federal government, those boundaries represent a very important data element to the election cycle. So don't think redistricting is one and done. It's a process, because once the federal and the state and the local districts are created and agreed upon, it comes back to the local election clerk. And that clerk has the duty to realign the voter registrations, realign the voters into the right districts. So this is a continual process. And one of the challenges for the next decade is going to be how to make this data useful and how to make it a live product, that it can be done in multiple ways and used for multiple things as well. When we're talking about open applications, applications that include data that are meant to be opened and available to the public, there are three pillars. The data sources, of course. When we talk about redistricting, that's the PL data. But there's other data that needs to be brought to bear. Legislative districts usually have to honor physical boundaries, social boundaries, political boundaries. All of these things are tertiary datasets that may not be in and probably are not in the PL data. Taking a river, taking a highway -- if that represents a cultural or ethnic boundary, being able to use that in your analysis to create districts of like-minded voters. Plan management, in other words, the software, the ability to put together the workflows that make it easy and addressable. And then there's the collaboration. Those workflows need to be delivered to Joe and Jane Q. Public. Is it intuitive enough so that anybody can use it? My goal was to deliver this to my mother and have my mother use this. We -- sorry about that -- we talked about the PL, the Public Law data. And fundamentally, I provide this diagram because I want you to see that it's built very deliberately. It's built in a hierarchy from the census block data. And that's kind of the ultimate Lego bit, that you then take all of those bits to build your federal, your state, your local districts from. And these include block groups, census tracts, places, counties, all of these cascade up into the plans. I give you this just because this is what Mackenzie will be diving into. She'll be showing you the user interface and that's because, as I've said multiple times already, this is an important aspect of any kind of solution where openness and transparency is the goal. It has to be intuitive, it has to be learnable without the burden of process. In 2000, the state of Utah was -- well, I'll call them my poster child. They bought into the idea of citizen engagement and transparency. And they used this application to great effect. They had well over 1,000 citizens log in and make plans.
Of those, 323 actually submitted plans that met the criteria the state required, through their constitution, for a legitimate plan. It facilitated the media in the state -- they were quite pleased in that they could talk with citizens, engage citizens, get people to look at plans as they were being developed. There are lots of case studies right now as redistricting at the state level is going on. Here are four -- I'm sorry, it's three: Arizona, Utah again, and the state of Florida all have active citizen portals that you can view today. With that I'm going to transfer it over to Mackenzie for a short demonstration. >> Mackenzie O'Brien: Okay, thank you so much. So here we have the front door to the redistricting solution. I'm going to be signed in automatically. I'll just resize my window real quick. In here I have the open plan dialog. So from here, I can either access any of my recent plans, a list of all of my plans, or any plans that have been shared with me. For now I'm just going to start off with a blank slate. And the first option I have is to change the nomenclature of the district. By default, it goes directly to district, but I can change it to any of these terms. Here I can adjust the number of districts. For simplicity's sake, I'll be working with two in this demo. Over here, I can adjust the maximum deviation. So that's how much the districts are allowed to deviate from each other in population. And again, for simplicity's sake, I'll leave this at 10% for now. I have the option to choose between using 2020, 2010 and 2000 data. Since this is the newest and greatest data, I'll be using the 2020 data. And I'm just going to make a state plan of Utah. Here I can choose to select two different data hierarchies, either county, tract and block, or county, VTD and block. And this is simply a choice of which geographies will be displayed on the map. And I'm just going to stick with that default here. Once I hit next, I can choose to change the color of my districts. So I'll just change this to purple. And on this next page, I have the option to add demographic variables that will update as I add different geographies to my district. So from here, I can either add individual variables or an entire category. And I can also add percentage variables as well. Once I hit Done, those demographic variables will update down here in what we call the district window, and I'll be prompted to save my plan. The software is based on the Microsoft paradigm. So if you've worked with any Microsoft products, this format will look very familiar to you. We've got all these tabs up here at the top. And our idea is that as long as someone moves left to right through the application, through all of these tabs, they should be able to create a completely legitimate plan. The first button will allow you to create a new plan, open a plan, save the plan, save it under a different name, create an archive of a plan to be saved to a zip file, save the plan locally to your computer, and then open it back up using the software. I can go in and print the plan or create a plan book with an individual page for each district. And if I have existing boundaries that I'd like to bring in and work off of, I can use this import button here. Under the Learn tab, I can walk through each of the different links here. And this will instruct me as to how to use the software. We found that this has been very effective in teaching people how to use the software.
We've had people come in and request in-person training. But then once they've gone through these steps and links, they felt more confident in their ability to use the software. Under the View tab, I can adjust the number of districts and that will update on the fly down here in this column. Over here, I can choose to show the target value, so that will just give the ideal number of people in each district if they're to be perfectly balanced. I can adjust the display variables as needed. I can also choose to engage what is called deviation coloring. So this will show me quickly at a glance which of my districts are above or below the maximum deviation. I can spot the locations of districts, come in and modify the districts, merge districts together. And once I pop out this district window button here, it just goes into its own screen. And I can bring it into another monitor if I have that available. And once I close that, it just goes right back to where it was before. Under the Create tab, this is where I'll actually start building out my districts on the map. So the first thing I'll do is choose to assign geographies to district one. And I can use four different drawing tools, all of which are available right here. The first one I'll demonstrate is the point. I can also use the rectangle. And I'm just going to zoom in a little bit to demonstrate the polygon, or polyline. I can also use what's called Select by Attribute. So if I'm looking for a certain area that has, let's say, a high American Indian population, I could come in here and write out an expression where I'd be able to add all of the geographies meeting that criteria to my district. I can also add an entire city to my district at once. I just come over here to the left-hand panel, go over to identify. And then for this, I'll just type in Sandy, looking for Sandy, Utah. And once I hit show, you see that it's highlighted on the map. And once I scroll down and hit assign to, it will pull all of the census blocks from within those boundaries and add them to my district. And you'll notice that as it did that, these numbers down here will update on the fly, so I don't have to worry about calculating those manually. Now I can also bring in some census data layers or some data layers that might be relevant to me, such as a school districts layer. I'll just come over here to the search tab and look for the Utah school districts layer. And once I scroll down and find one that looks good to me, I can just hit add. And you'll see that they populate on the map. And from here I can do a similar functionality where it will grab all of the blocks from within those boundaries just by going back over to this identify tab. Choosing to identify from that school districts layer, clicking on the school district I'm interested in on the map and hitting assign to once again. [ Silence ] And I'll just clear that highlight. I can also change different base maps. So over here I can choose from a number of different ones that are also available in ArcGIS Online. So my personal favorite is this light gray canvas. It just looks very minimalistic and clean. Some people like to use the imagery base map so they can get a bird's eye view of the neighborhoods that they're working in. Of course, you can always come back to street, which is our default and a perfectly good base map to use. Under themes, I can see the number of people in each of the census geographies.
When I hit update, you'll notice that a choropleth map will appear in the background and it will show the population of each of the census geographies and that will adjust as I zoom in and zoom out to different levels. Under the review tab, I can get a demographic overview of each of my districts. So over here, once I had the plan histogram, I can see the different demographics. Oops. I accidentally switched to that. So I'm just going to redo that. Over here under the review tab, I can see the demographic overview of each of my districts. So under the plan histogram, I can get an overview of how a different demographic group is split out amongst all of my districts. Plan distributions shows similar, just in the format of a pie chart. And then the district distribution chart will show how one district is broken out over multiple demographics. Under the compactness test, I can run a number of different compactness tests approved by the DOJ. And if you need a little more of a guide to any of these, you can just hover over the name, and it shows you a brief explanation. I can also run what are called integrity tests. So these are six key checks to ensure that your plan is complete and legitimate. And if you get any errors, which I have right here, it will simply zoom you to the location of that error. And from there, I can just use my drawing tools to quickly go and repair that. And I also have one here where I've got multiple parts to my district. And once I run those again, it will ensure that I have a complete and legitimate plan now. From here, I can also create what are called map notes directly on the map. So let's say I am very concerned about a certain area and I just want to really emphasize that it should be in this particular district. I'll just make a little drawing around this particular area by using that polygon tool. And then I can also leave a note. I'll just change the color of that so it's more readable. I can also compare plans against one another. So just for example, I've made two six-district plans of Utah. So once I open that up, I will immediately be able to see the differences between the two plans by this hatch symbology over here. I can also come down to the Comparison tab and get a demographic overview of how those two plans differ from one another. Under the Share tab, I have the option to share my plan out to others and add more information to my plan. If I want to attach a file such as a Word document explaining why I've made certain decisions, I can do that over here. I can also leave a comment on my own plan. Or if I go to someone else's, I can leave a comment on theirs. I can also share my plan out to any groups that I'm a member of, or the option to go in and create a group. So once you're done with the plan, you can easily submit this to your representatives or anyone who is in charge of the redistricting process in your municipality for consideration. >> Richard Leadbeater: All right. Thank you, Mackenzie. And with that, I just want to reiterate that redistricting is one of the more data robust applications a state or local government does. It actually has quite a bit of data. It's complex constitutionally. But the point is, with technology today, these things, these barriers can be minimized with providing local government the opportunity for openness, transparency and inclusion into these often noteworthy processes. With that, I'd like to say thank you for Mackenzie and myself. Have a good afternoon. Thank you. >> Mackenzie O'Brien: Thank you. 
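To make the target value, deviation coloring, and compactness checks from the demonstration concrete, here is a minimal sketch of how such metrics can be computed. The populations and geometry are invented, and the product shown above may define its tests differently; Polsby-Popper is simply one widely used compactness measure.

```python
import math

def deviation_report(district_pops, max_deviation=0.10):
    """Compare each district to the ideal (target) population and flag any
    district whose absolute deviation exceeds the allowed maximum."""
    target = sum(district_pops.values()) / len(district_pops)
    report = {}
    for district, pop in district_pops.items():
        dev = (pop - target) / target            # signed fractional deviation
        report[district] = {"population": pop,
                            "deviation": round(dev, 4),
                            "out_of_bounds": abs(dev) > max_deviation}
    return target, report

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness, 4*pi*A / P**2: 1.0 for a circle,
    values near 0 for long, sprawling district shapes."""
    return 4 * math.pi * area / perimeter ** 2

# Hypothetical two-district plan (illustrative numbers, not real Utah data).
target, report = deviation_report({"District 1": 1_650_000, "District 2": 1_620_000})
print(f"Target population per district: {target:,.0f}")
print(report)
print(f"Compactness of a made-up district: {polsby_popper(area=2500.0, perimeter=260.0):.2f}")
```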
>> David Van Riper: Hi, everyone. My name is David Van Riper, and I'm the director of spatial analysis at IPUMS, which is a research center at the University of Minnesota. I want to thank the Library of Congress for the invitation to present as part of GIS Day 2021. And I'm excited to talk about the history of small area census data from print to IPUMS NHGIS, the National Historical Geographic Information System. My presentation will cover three main topics. I'll begin with a brief history of small area census data from 1790 to the present, and then discuss how users have accessed these data over that same time period. The final topic describes the work we've done during the last 20 years to build NHGIS, which provides access to these small area census data and associated GIS mapping files. I've divided the small area census data into three time periods. During the 1790 to 1900 time period, most small area data were available for three geographic entities. We had states, including this publication from the 1790 census that was published under the imprimatur of Thomas Jefferson. Counties, such as the Massachusetts counties from the 1850 decennial census shown here. And cities or towns as we see here, with a table denoting the population by race, sex and nativity for cities and towns of the 4,000 people or more in Alabama, Arizona Territory, Arkansas, and California. Things changed, though, in the 1910 to 1930 time period. And I would argue that this is a very important time period for small area census data in the United States context. In 1906, a man named Walter Laidlaw, who worked for the Federation of Churches and Christian Organizations in New York City, published a book calling for the creation of a scientific city map system. And what he wanted was for the Census Bureau or some other agency to create relatively permanent small area units that could be used to track neighborhood changes over time. And Laidlaw's idea was eventually institutionalized as the census tract. But during the 1910 to 1930 time period, Laidlaw and a man named Howard Whipple Green are responsible for evangelizing the concept of the census tract and going out and selling this idea to local municipal organizations throughout the US and getting more and more cities to request the delineation and tabulation of census tract data from the United States Census Bureau. Here we see a map from a publication that Howard Whipple Green published in 1927 that shows the change in population from 1910 to 1920 for neighborhoods in Cleveland and its principal suburbs. During the 1910 to 1930 time period, more and more local organizations were requesting that census tracts be delineated and data be tabulated and published for these. And this became so popular that in 1940, the Census Bureau actually institutionalized the concept of the census tract within the decennial census program. This leads us to the final time period with respect to small area census data, 1940 to the present. Starting in 1940, the Census Bureau took over the delineation and tabulation of census tract data. They institutionalized the rules that cities and local areas had for delineating the census tract boundaries. And just as importantly, the Census Bureau standardized the data that they would tabulate and published for all of the census tract units. Prior to 1940, each local city or municipal organization would request a different set of statistics, the ones that they felt were most important for their particular area. 
But that meant that the data that were published for the city of Chicago were not the same as the data that were published for the city of New York or for the city of Philadelphia. So you couldn't do true city-to-city comparative work. With the Census Bureau institutionalizing that tabulation program within its decennial census office, now we would get data that was consistent across cities within the United States. Here we see the census tract map for 1940 for the city of St. Paul, which is where I live. And you can see here, kind of a very high-quality map that shows each census tract and the city limits or the streets that make up those boundaries. In 1940, we also saw the introduction of the census block, which is the smallest geographic unit for which the Census Bureau publishes data. Here we can see a small snippet of a map for census blocks again in the city of St. Paul. And you can see here that each census block has a number on it. And again, this high-quality map shows the actual boundaries of all of the census blocks in the city of St. Paul. This was combined with some new statistics that were available at the block level. Here we see a snippet of the housing blocks statistics publication for St. Paul. Each column represents a particular statistic, occupied dwelling units, the total number of structures, the average monthly rent for each census block within the city of St. Paul. And these were first introduced in 1940 and have been published consistently for each decennial census over that last 80-year time period. Throughout the remainder of the 20th century, the Census Bureau continued to delineate and publish census tract boundaries and data. This is just the 1970 census tract map for the city of St. Paul. You can see that these are incredibly high-quality paper maps that really do a great job showing the boundaries of all of these important census units. Starting in the last part of the 20th century, the Census Bureau started tabulating and publishing data for dozens of geographic units. So we went from the states, counties, cities census tracts to, by the 1990 and 2000 decennial censuses, the bureau tabulating and publishing data for places and block groups and school districts and congressional districts in urban areas. And for those of us who work in this field, this census geographic hierarchy is something we've all come to know and love. And now today, this is really the set of units that the Census Bureau continues to tabulate and publish data for in both the decennial census publications as well as the American Community Survey. Now, that's a brief overview of the types of small areas that the Census Bureau was publishing data for. But in parallel with the types of data that were being published, how users would access the census data changed greatly over the last 230 years. So now I want to talk a little bit about how that data access has changed. And that will set me up for the final topic, talking about the work that we've done here to develop the NHGIS project, which is another data access system that provides users with a large volume of historical and contemporary census data. So we begin in the 1790 to 1950 time period, and if you were a data user at that time, you were really restricted to one access modality. You had print volumes. You would have to go to the print volumes to find the statistics that you needed for the geographic areas of interests. So we've got data from 1790, data from 1850, and data from 1940. 
And really up through the 1950 decennial census, all of the statistical publications were available in print form alone. From 1960 to 2000, we saw the Census Bureau use two tracks, two ways to access data. We saw the switch to both print along with digital data being more widely available, kind of during this 40-year time period. So you still had massive volumes of print, massive numbers of print volumes coming out. These are the 1970 census tract volumes that were published following that decennial census. Here are the many linear feet of 1980 census tract volumes that were published. These are all available in the basement of Wilson Library, which is our federal depository library here for the state of Minnesota. But you could go look at these volumes, you could find the statistics within them and you could also find the paper maps that delineated the census tract boundaries. Now starting in 1960 and 1970, the Census Bureau, which was an early adopter of computers for tabulating results, started publishing the statistics both in print but also on computer summary tapes. So they would publish these summary tapes and send them out to institutions of higher education, along with other government agencies, and they had computing systems set up. Researchers could write punch cards or computer code that would read data off of these summary tapes and allow for more nuanced analysis of the data, more than you could do just by working by hand, transcribing the numbers from print volumes onto a piece of paper. That was really the way that the Census Bureau provided access to data from 1960 to 2000. From 2010 to the present, the Census Bureau has gone to all digital access to its data products, but they have increased the number of ways, the number of modalities with which people can access these data. So we can now get data via FTP from their FTP sites. We can get data from web-based access systems such as data.census.gov. Or more recently, the Census Bureau has released what are called application programming interfaces. These are kind of computer programs that allow you to write code within your favorite programming language -- R or Python or Java or JavaScript -- and access data programmatically within your software code. And the Census Bureau has been developing these over the last 10 years to provide more flexible ways to access their data. But they've gone away from print, and for the foreseeable future digital access is the way that we'll get at all of these small area statistics. Now, before I talk about NHGIS, I want to talk about what I call the progenitors of IPUMS NHGIS. So in the last two sections, I just laid out the history of the small area census data that the Bureau has published. I described the ways that people accessed these small area data over the last 230 years. But during kind of the last 40 years, we've had a few different projects take place that helped lay the groundwork for what we put together here at IPUMS NHGIS. And I want to spend a little bit of time describing those products, because without them, we would not have been able to do the work that we've done over the last 20 years here at IPUMS. The historical demographic, economic and social data product, which is also known as ICPSR 3, was put together by the Inter-university Consortium for Political and Social Research, which is headquartered at the University of Michigan.
And they worked with social scientists all over the country to essentially type in state and county statistics from census volumes from the 1790 to 1970 decennial census publications. These scientists worked with their students to type all of these data in and they then deposited each decade's data at ICPSR. And users who belonged to institutions that are ICPSR members could log in and download these data files from the ICPSR website. The state and county data formed one of the core collections of IPUMS NHGIS, and without this early work that was done, we would not have the volume of state and county data that we have in our system. Dr. Donald Bogue and his wife, Elizabeth Mullen Bogue, who worked at the University of Chicago, put together an amazing data set in the 1970s. They worked to transcribe the census tract data from 1940, '50, '60 and '70, transcribed the published statistics for census tracts into digital data files. And they put them on punch cards, which were then converted to data files. And then they were deposited at the National Archives along with ICPSR. And again, these data sets formed one of the key -- one of the core components of the original NHGIS data series. And having access to these digital data sets really helped us push forward with the work that we were able to do here. So we owe both the ICPSR 3 group as well as the Bogues a debt of gratitude for all the work that they did to put these files together. The last group I want to talk about is Carville Earle and collaborators at Louisiana State University. In the '90s, they put together a set of historical US county boundary GIS files that depicted the county boundary footprints for each decennial census from 1790 up through 1990. Earle and colleagues published these data in the late 1990s. And we were able to use some of their mapping files as we were starting to build our historical state, territory and county boundary files for NHGIS. And again, the groundwork that they put together kind of helped us design our path to developing the datasets that we disseminate through IPUMS NHGIS. So that leads me up to the summer of 2001, in particular June and July of that year, 20 years ago. Some scientists here at the University of Minnesota applied for a grant from the National Science Foundation to set up the National Historical Geographic Information System. And as I've just described, we had a few different data products that were out there. They were housed in various locations, National Archives, or ICPSR, or available on CD-ROM from either the Census Bureau or from Louisiana State University. We had more and more digital data files coming online. But they were all in different formats. Some of them were more difficult to access, and there was no single entity that users could go to to find the historical small area data they would need to describe the social and demographic changes that have occurred over the history of the United States. This grant was funded by the National Science Foundation and we started building NHGIS, and I'm proud to say that myself along with my colleague, Jonathan Trader, were the first two graduate RAs who worked on NHGIS. And both of us are still here working on it 20 years later. So you can see that we were very enthusiastic about that project and continue to be so today.
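As a concrete example of the programmatic access described a moment ago, the short sketch below pulls total-population counts for every state from the Census Bureau's public data API, using the 2020 redistricting (P.L. 94-171) endpoint and the total-population variable P1_001N. Light use does not require an API key, but the endpoint path and variable code should be double-checked against api.census.gov before adapting this.

```python
import requests

# 2020 decennial redistricting (P.L. 94-171) endpoint; P1_001N is total population.
url = "https://api.census.gov/data/2020/dec/pl"
params = {"get": "NAME,P1_001N", "for": "state:*"}

rows = requests.get(url, params=params).json()   # first row is the header
header, data = rows[0], rows[1:]

# Print the five most populous states.
for name, pop, fips in sorted(data, key=lambda r: int(r[1]), reverse=True)[:5]:
    print(f"{name}: {int(pop):,}")
```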
NHGIS provides easy access to summary tables and time series of population housing agricultural data, along with the GIS compatible boundary files for the 1790 to the present time period for all levels of US census geography, including states, counties, tracts and blocks. We set up NHGIS to be a little bit of a one-stop shop for people interested in small area census data from 1790 up to the present. And now I'm going to take you through a little bit of the history of NHGIS and what we've done over that 20-year time period to construct and disseminate a lot of these small areas statistics. Version 1.0 of NHGIS had three main deliverables. The first main deliverable was the construction of GIS mapping files for Census Tracts from 1910 to 2000, for counties from 1790 to 2000s, and states and territories from 1790 to 2000. We wanted to build these mapping files that would allow us to spatially examine changes in race and ethnicity, age and sex for small areas over this long time period. The Bogue files and the ICPSR files that were out there were the counts for these areas. But there was no easy way to construct digital maps showing what the patterns look like for cities in the US. So the inputs to our GIS files were the 2000 TIGER/Line files. We used scanned census tract maps from the print volumes that the bureau had been publishing. We used scanned county maps from a variety of data sources including the Census Bureau's maps, a document I'll talk about in a second from William Thorndale and William Dollarhide. Along with other mapping sources, such as John Long's Atlas of Historical County Boundaries published by the Newberry Library. We built the tract data using ArcInfo 7, and we built our county boundaries using ArcGIS 8.1. The process of building our historical tract boundaries started in 1990 and 2000, and we work backwards in time. Here you can see a small snippet of the scanned 1980 census tract map for Austin, Texas. And over the top of it, you can see the 1990 tract boundaries in red. We edited the red tract boundaries to line up as best as possible with the scanned tract maps. And we did that back in time until the first year in which a particular county or city had census tracts delineated for it. These census tract maps were quite fun and easy to work with. The Census Bureau has always done an amazing job with their cartography, and these maps were incredibly helpful going back to 1940. The 1910 to 1930 time period was more challenging, however. From 1910 to 1930, the census tract definitions tended to live as written descriptions that were published on microfiche. So here we can see the census tract definitions for Boston for the 1910 census. And essentially what we had to work from were the street names that served as the boundaries for census tracts. And we had to go in and essentially select those streets or digitize those streets in to serve as the tract boundaries for this 1910 to 1930 time period. And a number of grad students who have worked here at IPUMS with me were involved in that process. It was a lot of detective work to build some of these historical tract boundaries. For the historical county boundaries, we relied heavily on this publication, the Map Guide to the US Federal Censuses, 1790 to 1920, which is published by Thorndale and Dollarhide. 
And it was really a volume that was meant to help genealogists determine where different counties were as they were trying to track down their ancestors on the decennial census schedules that the bureau had made publicly available. Thorndale and Dollarhide did an amazing amount of work to track down the county boundaries for each decennial census. And here I've included a screenshot of the county boundaries that existed in Florida as of the 1890 decennial census. The black boundaries show the 1890 counties as they existed at the time of that census. The white boundaries depict the census tracts as of 1980, essentially. And you can see here that in Florida, and in particular, in South Florida, we've had a lot of tracts that were added in from the 1890 time period to the present. But we were able to work from these maps to construct the Dade County boundary or the Brevard County boundary as it existed at the time of the 1890 census. The second part of version 1.0 of NHGIS was the acquisition and digitization of the census tract statistics from 1910 to 1930. Now the Bogues had done the work for 1940 to 1970. But no one had ever digitized the 1910 to 1930 statistics. This project was led by Andy Beveridge, from Queens College and Social Explorer, and he and his team tracked down the census tract volumes for this time period. Now, as I mentioned before, most of this tract data was actually kind of held by local municipal archives because the data were tabulated by the Census Bureau and then provided directly to those organizations. Andy and his team found those volumes and typed in all the census tract data for those three decennial censuses. The final thing that we delivered for version 1.0 was a web-based data access system that supported customized data requests. So we brought together the existing digital datasets from ICPSR File 3, the Bogue files, the data that Andy and his team put together, along with all of our GIS data, into a single coherent platform that users could come to for data. And in 2006-2007, we released that original data access system to the public. And this is what our original website looked like. And users would go in and they could select the data tables and shape files that they wanted to download. Now, the work that we did allowed us to put together maps such as these. These maps show the census tract level population density for the East Coast. These are the census tracts that existed in 1950, 1960, 1970 and 1980. You can see over time, we can look at the difference in density as well as the changes in the extent for which tracts were available. It's not until 1990 that the Census Bureau delineated census tracts for the entire US. But the work that we did on version 1.0 supported this type of work and really set the groundwork for what's come since. Immediately after publication of our version 1.0, we realized we probably needed to do something better. The initial version was a little brittle. It didn't always stay up. I often had to restart it if users had requested too much data, and we immediately pivoted to designing a more robust data access system. Version two had a new user interface that supported data requests from multiple data sets, tables and geographic levels, along with filtering by time, topic, geography and data set. We released version 2.0 of NHGIS 10 years ago, in October 2011, with a revised user interface and a revised front page, which I show here.
In addition to the revised user interface, we added additional datasets from the 2010 decennial census, the American Community Survey, historical agricultural censuses and vital statistics, along with the GIS mapping files that supported mapping all of these different datasets. The final thing we did in version 2.0 that was a real value add is we developed and released harmonized data tables that account for changes in categories that the Census Bureau reports data for over time, along with accounting for boundary changes over time. From decade to decade, the bureau would modify the boundaries of census tracts or census blocks or census block groups, making it difficult to analyze change over time for a fixed geographic footprint. We developed methods for handling those boundary changes, which allows us to create maps like this. Here we see the change in school-aged population from 2000 to 2010 for block groups in the Minneapolis/St. Paul area. The blue graduated circles indicate areas of declines in the school-aged population over that 10-year time period. And the orange graduated circles show areas of increase over that 10-year time period. Without our ability to standardize these data, it would be difficult to know whether or not changes that are observed in the data are due to actual changes in the demographic composition, or changes in those block group boundaries. And after we released 2.0, we started moving on. We started working on 3.0 of NHGIS, and that's what we're currently working on now. We've added an additional access modality. A couple of years ago, we released our own application programming interface for NHGIS. Our API allows you to design extract requests that you can submit programmatically in Python or in R or in the programming language of your choice, so that you can create reproducible research and embed that data extraction directly into your code. You don't have to rely on our web-based data access system to acquire those shape files, GIS files and datasets. We got funding to construct 1970 and 1980 census block GIS files. The Census Bureau published digital versions of block statistics for 1970 and 1980. But no one had ever put together a comprehensive set of GIS mapping files that would allow you to be able to map all of those block data. So we have a set of undergraduates and staff members working to build these today. And so what we've done is we're acquiring high resolution scans of census block maps such as this one from 1980 in the Atlanta area. This shows the census blocks for tracts 25 and 26 in Atlanta. For the paper maps, we're getting those from various libraries throughout the country, so that we can construct the polygons that go along with the census blocks, so that you can map the housing and population data for these super fine-grained geographic units. And it's been really fun to work with the map librarians throughout the country to acquire these paper maps. The final thing that we're doing here at NHGIS in version 3.0 is extending our harmonized data to 2020 and beyond. The recent release of the redistricting files from the 2020 decennial census, along with the continued release of the American Community Survey, means that we always need to extend our harmonized data to allow users to continue to look at a fixed set of geographic units and analyze changes in those geographic units over broader and broader time periods. So we're currently working to do that as well. All our data are available free of charge at NHGIS.org.
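For those who want to try the NHGIS API described above, here is a minimal sketch of submitting an extract request with Python. The base URL, query parameters, payload field names, and the dataset and table codes are illustrative assumptions to verify against the current IPUMS NHGIS API documentation, and MY_IPUMS_API_KEY is a placeholder for a key issued with an IPUMS account.

```python
import requests

API_KEY = "MY_IPUMS_API_KEY"                      # placeholder; use your own IPUMS key
url = "https://api.ipums.org/extracts/?collection=nhgis&version=2"   # assumed endpoint

extract_definition = {
    "description": "County-level total population example",
    "datasets": {
        "2020_PL94171": {                         # hypothetical dataset code
            "data_tables": ["P1"],                # hypothetical table code
            "geog_levels": ["county"],
        }
    },
    "data_format": "csv_no_header",
}

response = requests.post(url, headers={"Authorization": API_KEY}, json=extract_definition)
response.raise_for_status()
print(response.json())   # the reply should include an extract number to poll and download
```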
And if you're interested in the historical kind of social and demographic changes of the US, NHGIS has done a lot of work to make acquiring and finding those data as easy as possible. And, again, I want to thank the Library of Congress for having me, and thank you very much. >> John Hessler: Hello, everyone. Welcome to GIS Day 2021. My name is John Hessler, and I'm a specialist in geographic information science at the Library of Congress in Washington, DC. And my talk today is going to be a little bit different than some of the others. There's been a lot of talk in the press recently about something called differential privacy and database reconstruction theorems, and how they're really affecting the data of the 2020 census. And although a lot of the people watching are, of course, GIS analysts, and have lots of experience working with census data, this has really been a point of contention and really a point of complication, where a lot of us don't really understand how this is affecting the data and what this actually means. So today, what I thought I would do is I would actually go through and explain to everyone as best I can what database reconstruction theorems are, what differential privacy is and how it affects the 2020 census. Now over the last couple of months, I've done a great many congressional briefings on various subjects, everything from internet access to healthcare, to child poverty, all around the country. And these have, of course, used a lot of the census data and a lot of the data from the American Community Survey. But usually when I'm doing these projects, I don't have to think about things like differential privacy and actually how the data has been changed. This is something that is new in the 2020 census. The 2020 census, of course, is injecting some noise into some of the data in order to prevent the reconstruction of identities and the actual specifics of individuals who make up the census data. Obviously, each data point in the census, each piece of the statistics, is made up of a group of people or an individual. And the technology has become available now where people can actually, using third-party data and using the statistics that the census has provided, basically reconstruct individual identities. This is something that the census has been aware of for quite some time, and in 2020 decided that they were going to employ some different types of algorithms and some different types of modifications to the data in order to prevent this from happening. As I said, it's been a controversial thing. And I'm going to really talk through what this actually means. What is a database reconstruction attack? What does it mean to take a census database and reconstruct individuals? Why is that a problem, and why has the noise that's injected into the data changed the way we can look at results and how we have to actually be careful with how we're using some of the census data? Now, this all started many years ago. There was an important paper by Dinur and Nissim, which basically constructed something called database reconstruction theorems. And what these two mathematicians and statisticians proved was there's a balance. If you release too much data, even if it's statistical, even if it doesn't reflect individuals, it's quite possible to reconstruct individuals out of that data. So the more data one releases, the less private it is. And in the case of the census, the census releases many, many different tables.
And those tables can be put together in various mathematical ways and reconstructed in order to pick out individuals and individual data. So there's this balance between the private information we want to hide and the information we want to reveal. We want to hide some of the private data, but we also want to make the data useful enough that it can be used for the things census data are used for: not only apportionment, but for allocating money and government contracts, healthcare distribution, geographical studies and all kinds of demographic studies that influence people's lives. So it's really critical that we understand what is happening to the data and why the data is being changed. What's the importance of this? As I said, there's this balance between privacy and accuracy. If we want the data to be completely private, it won't have any accuracy. If we want it to be completely accurate, in other words, if we put all of the data in the tables, there won't be any privacy. So there's this balance, and we have to figure out where on this curve of accuracy and privacy we sit: where we can protect the privacy of the people who've contributed to the census, and where we still have enough accuracy to use the data for all of the projects we want to do. The database reconstruction theorem was a very complex mathematical proof that showed just how difficult this is: any reasonable definition of privacy cannot really be maintained if we also want to make the data accurate. And this is what happened in 2003, when the paper was published. The census had already been looking at this question of the accuracy/privacy balance, but this proved how very difficult it is, if you are releasing statistical data in the way the Census does, to keep the microdata, the data of the individuals contributing to the census, private. And I'm going to show an example of what that means. If we look here at this particular table, this is a typical census table; it's an example that has been published and used as an example in many places. Here we have seven people. The total population is seven in this particular census block, and we can see that the median age is 30 and the mean age is 38. And then we have the various attributes of the seven-person population. We can see that we don't have any individual data in this table at all. This is all statistical data, all means and medians, all numbers that are three or above; in other words, three males and four females for the total population. But we can also see a huge amount of data that has been suppressed: how many white males there are, how many persons under five years there are, and so on. So even using this data, with all of that information suppressed, is it possible to reconstruct the information about each of the individuals who contributed to this table? In other words, is it possible to reconstruct the individual records as opposed to the statistical records? And basically, what the database reconstruction theorems tell us is, yes, such a reconstruction does exist.
There is a unique reconstruction of this data that is possible if we don't do anything to the data. Now, what does that mean? The database can be reconstructed in a number of ways, but the basic idea is to treat the attributes of the persons living on a block as a collection of variables in a mathematical model. A set of constraints is then extracted from the published tables, and a solver finds a set of attribute values that is consistent with those constraints. If the statistics are highly constraining, as census data are, with all of the various statistics relating back to individual people, then there will be a single possible reconstruction, and the reconstructed microdata will be the same as the underlying microdata used to create the original publication. So to mount a full reconstruction attack, the attacker extracts all of these constraints and then creates a single mathematical model that embodies them all. Solving that model recreates the microdata and gives a unique mapping back to the original data. If we look at the table here on the right-hand side, what we're seeing is a table that shows what we really want to reconstruct. If I'm an attacker taking into account the table we looked at earlier, I want to reconstruct the age, the sex, the race and the marital status of all seven people, and I want that to be individualized to each person. That data does not appear in the table at all, but I'm going to use the data in the table to reconstruct each of the individual records. Now, I'm not going to go into a lot of detail, but basically this is done with something called a satisfiability solver. We translate the problem into a Boolean formula consisting of 6,755 variables, the number of variables this particular problem contains, and arrange it into clauses in what is called conjunctive normal form. When we do this, the database reconstruction theorem tells us that there exists a solution universe, the set of all possible solutions to this set of constraints. So if we say to ourselves that we want to reconstruct those seven people, there may be many different solutions to that problem, but there does exist a solution consistent with those constraints. If the universe contains a single solution, then the published statistics completely reveal the underlying confidential data. And this can be done provided that no noise is added to the data at all. So if we take our table of seven people and convert it into those 6,755 variables arranged into 252,575 clauses, it is actually possible to reconstruct the individual data behind that table. When we look at this a little more closely, we can see this is done in a number of ways. What we want to do is reconstruct the age, the sex, the race and the marital status of each particular person. If we do this with the data unmodified, this particular problem has exactly one solution, and that solution is over there on the right-hand side. It gives the age, the sex, the race and the marital status of each of the individual persons in this database. So there are seven people here and there is no individual data in the published table, but from only this table the solver was able to reconstruct the identities of all seven of those people.
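To make the reconstruction idea concrete, here is a toy sketch in Python. It is not the seven-person example from the talk and it does not use a satisfiability solver; instead it takes a hypothetical three-person block with made-up published statistics and brute-forces every candidate set of records, keeping only those consistent with the constraints. Real attacks encode the same constraints in conjunctive normal form for a SAT solver, but the logic is the same.

```python
# Toy database-reconstruction sketch (illustrative only): given nothing but
# published block statistics, enumerate every candidate set of person records
# and keep the ones consistent with all of the constraints.
from itertools import combinations, combinations_with_replacement
from statistics import mean, median

# Hypothetical published statistics for a 3-person block (made up).
TOTAL = 3
MEAN_AGE = 44          # mean age of all residents
MEDIAN_AGE = 30        # median age of all residents
FEMALE_COUNT = 2       # number of female residents
FEMALE_MEAN_AGE = 52   # mean age of female residents

solutions = []
# Ages are limited to 0..84 here only to keep the search space small.
for ages in combinations_with_replacement(range(85), TOTAL):
    if mean(ages) != MEAN_AGE or median(ages) != MEDIAN_AGE:
        continue
    # Try every way of choosing which residents are female.
    for female_idx in combinations(range(TOTAL), FEMALE_COUNT):
        female_ages = [ages[i] for i in female_idx]
        if mean(female_ages) != FEMALE_MEAN_AGE:
            continue
        records = [(age, "F" if i in female_idx else "M")
                   for i, age in enumerate(ages)]
        if records not in solutions:
            solutions.append(records)

print(f"{len(solutions)} consistent reconstruction(s):")
for s in solutions:
    print(s)
# With these four published statistics the solution universe collapses to a
# single set of individual (age, sex) records: exactly the risk described above.
```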
If I combine this with a third-party database, say a voter registration database, I would be able to individuate both the addresses and the names of this group of people, because we know this is in a particular census block; in other words, this is in a particular geospatial region. So using third-party data, it's then possible to reconstruct who these people are. And this is really what the census is trying to avoid. Now, we can prevent that in a number of ways. Let's say we decide, okay, we have this unique solution, so we have to perturb the data in some way. Say I drop columns 2A and 2B; in other words, I don't tell you whether each person is male or female. If I then redo this, the solution universe contains eight different possible solutions, not a single solution, but eight. But all of them contain four of the microdata records, so we can still infer four of the microdata records even after deleting those two columns. So the question now becomes: how do I delete columns or add noise to the data in order to protect people's identities? If the reconstruction theorem tells me that all of these solutions exist, it also tells me that there's a point at which there's not enough information to reconstruct people's identities. The question is, what is the balance there? If I drop too much data, the data table becomes absolutely useless to anyone who wants to use it. But what is the balance? Is there a balance between privacy and the data that can be released? Now, again in this toy example, if we remove the data constraints in column 4A, which is basically whether a person is a Black or African American female, then I get two solutions; the universe of solutions becomes two. And what's important is that when we look at these two solutions, none of the records in this column are the same. So I have two completely different solutions, and no one would be able to decide which solution is right, or whether either solution is right. So I've done enough to create two different solutions. Now, in this small model, of course, I could still use even these two solutions to try to figure out whether I've actually carried out this attack successfully, in other words, whether I can identify any of these individuals. But at least we know the principle from this toy example, and the Census Bureau has done this kind of exercise with the 2010 data as a test model, to see whether one could actually reconstruct records depending on how much noise was added and how many of these pieces of data were eliminated. Now, they've done this using a particular algorithm called TopDown. We're not going to go into the algorithm in any detail, but it basically allows the census to inject noise into the microdata, in other words into the individual-level data, so that there is enough noise that these reconstruction attacks are not possible: one cannot solve for a unique solution to the census data. There are three possible ways one could go about doing this: you can publish less data, you can put noise in as the data is being tabulated, or you can inject noise once the data is tabulated. And of course, the TopDown algorithm is really difficult to study analytically; it's a very complex algorithm. But one can use models.
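As a simple illustration of the noise-injection idea, here is a sketch that applies the classic Laplace mechanism to tabulated counts. This is not the Census Bureau's TopDown algorithm, which uses discrete noise distributions and a top-down post-processing pass that keeps the published tables non-negative and internally consistent; the sketch only shows the core idea that published counts become true counts plus calibrated random noise, and how the privacy parameter epsilon trades accuracy against privacy.

```python
# Minimal sketch of noise injection on tabulated counts using the classic
# Laplace mechanism (not the Census Bureau's TopDown algorithm).
import numpy as np

rng = np.random.default_rng(seed=42)

def laplace_count(true_count: int, epsilon: float) -> float:
    """Noisy count for a counting query of sensitivity 1 under
    epsilon-differential privacy."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical block-level counts (the seven-person example's totals).
true_counts = {"total": 7, "male": 3, "female": 4}

for eps in (0.1, 1.0, 10.0):
    noisy = {k: round(laplace_count(v, eps), 1) for k, v in true_counts.items()}
    print(f"epsilon={eps:>4}: {noisy}")
# A small epsilon adds a lot of noise (more privacy, less accuracy); a large
# epsilon adds very little. Because the published values no longer satisfy the
# exact constraints of the true data, a reconstruction attack can no longer
# pin down a unique set of underlying records.
```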
And so far, there have been several scholarly studies of the effect on several important uses of the census data. One of these, called Census TopDown, was done by a team at Tufts including Moon Duchin. Based on a close look at reconstructed Texas data, they found reassuring evidence that TopDown would not threaten the ability to produce districts with tolerable population balance or to detect signals of racial polarization for Voting Rights Act enforcement. So there has been at least one independent study showing that it's not really going to affect the redistricting problem. There are other studies, however, that have called into question some of the places where differential privacy and this noise have been injected. One is a recently published study on the census data and how it will distort COVID-19 rates: using empirical COVID-19 mortality curves, the authors found that differential privacy will introduce substantial distortion into COVID-19 mortality rates. And this is an interesting thing that is going to be going on for probably the next few years: as geographers and GIS analysts begin to get into this data and look at how these noise injections from TopDown have affected it, there are going to be various conclusions and various estimates of how much error has really been injected into the databases, and how much it matters for some of the things we need to know and need to study. The census is fairly confident that they've done a very good job with this. But until people begin really tearing the data apart and looking at it in particular ways, we're not going to see how the scholarly community outside the census reacts to this type of data. If one looks at a typical map now, one has to be very careful. This is a census dot map of the population of Baltimore, based on race, and I'm just going to pull up some of the data from a particular region. I picked a census tract at random here in order to take a look. What you're going to see is that in any of these census data, where one dot equals two people, when we're going down to this type of resolution, things like differential privacy are going to affect the data. They're going to affect all of the totals that come out in each of these census tracts. And so we have to be very careful when we're presenting this data. This is a particular map that I presented at a congressional briefing on poverty in Baltimore, and what I did was annotate the map to say that, due to differential privacy, some of the totals might not add up. So this is something we're going to have to be very careful about when we're going down to these smaller scales. Do the numbers add up? Has differential privacy affected them in various ways? Most of the things you're going to be seeing that use some of the 2020 data are going to carry this little annotation about differential privacy and how it is affecting the data. There's been a lot, as I said, going on out in the field. The American Association of Geographers' annual meeting coming up next year has a series of papers that are going to deal with this.
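Before moving on, the small-area concern raised with the Baltimore map can be illustrated with a short sketch: the same absolute amount of noise in a published population count produces a much larger relative distortion of a per-capita rate in a small tract than in a large one. The tract populations, death counts and noise scale below are made up purely for illustration.

```python
# Sketch of how identical noise in published population counts distorts a
# per-capita rate far more in small areas than in large ones. All numbers
# below are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=7)
NOISE_SCALE = 5.0  # hypothetical noise magnitude added to each tract population

tracts = [
    ("small rural tract", 150, 3),       # (name, true population, deaths)
    ("mid-size tract", 2_500, 50),
    ("large urban tract", 40_000, 800),
]

for name, pop, deaths in tracts:
    noisy_pop = max(1.0, pop + rng.laplace(scale=NOISE_SCALE))
    true_rate = deaths / pop * 1_000
    noisy_rate = deaths / noisy_pop * 1_000
    rel_err = abs(noisy_rate - true_rate) / true_rate * 100
    print(f"{name:18s} true {true_rate:5.1f}/1,000  "
          f"noisy {noisy_rate:5.1f}/1,000  relative error {rel_err:4.1f}%")
# The absolute noise is the same everywhere, but the relative distortion of the
# rate is largest where the denominator is smallest, which is why small-area
# analyses need caveats like the one added to the Baltimore briefing map.
```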
That session is going to look strictly at the errors that have been introduced and how they are affecting actual geographical analysis and some of the things we want to know. So there's a lot going on right now. This is an extremely new thing, and we are all just working through what some of these algorithms mean and what the data affected by them look like. But it's really an exciting time to watch this balance between privacy and accuracy being worked out. This is something that has only come into play at this level, of course, for the 2020 census: the high-performance computing and the algorithms we now have available to do these kinds of reconstructions are much better than they were in the past. But it is something the Census has been dealing with for a very long time. In the 2010 census, there was a technique called data swapping that was used to prevent this kind of thing, but it has become a lot more important now. Even though some of these reconstructions are very difficult computationally, even brute-force algorithms have a lot to say about these statistical reconstructions. I want to thank you all for listening, and I want to thank you all for participating in this year's 2021 GIS Day, which is on the Census. Happy mapping, everyone. And thank you again for watching.