>>From the Library of Congress in Washington, DC. [ Silence ]

>>Bill: Hello, and welcome to the latest in our series of NDIIPP brown bag presentations and lectures. We're very happy today to have Kate Zwaard from the Government Printing Office, where she is a Lead Digital Preservation Program Planner working on the Federal Digital System, perhaps more affectionately known as FDsys. It's an ambitious system that aims to manage digital information from all three branches of government. It's now in public beta, I understand, and I expect it to be completed by the end of this year. The system does all kinds of good things. It will enable digital publishing by government agencies, search by members of the public and others, digital preservation, of course, something near and dear to those of us in NDIIPP, and also things like version control, so if there are multiple versions of a particular publication, it will have each instance of it. And with that, I'll turn it over to Kate, who will tell us more. [ Silence and Clapping ]

>>Kate Zwaard: Thanks, Bill. It's a pleasure and an honor to be asked to speak here today, and I'd like to thank y'all for coming. As Bill mentioned, my name is Kate Zwaard, I'm from the Government Printing Office, and the title of my talk today is Meeting the Mission: Preserving and Providing Access to Electronic Government Publications. I'm going to start off talking a little bit about the agency, our mission, and our goals, then move to some of the more technical details about the system, and finish with our development roadmap.

So, GPO. We opened for business nearly 150 years ago; next year we're celebrating our sesquicentennial. GPO was originally created to streamline printing and publishing for the Federal Government, which until then was scattered across a lot of private companies. To save the government money, they opened a Government Printing Office, and a few years later we assumed responsibility for the Federal Depository Library Program. That's a system of libraries across the country that, in exchange for receiving federal government publications from us at no fee, agree to provide access to anybody who asks for them. The way it works is this: when an agency comes to GPO and orders print copies, books or pamphlets, we have specialists at the agency who take a look at those orders and decide whether they'd make good candidates for our public access programs, and if they decide they do, they order additional copies. Those copies get boxed up and sent out. The libraries host the material, preserve it, provide access to it, and help people find it.

But with the advent of electronic publishing, a lot of agencies were putting those publications on their websites instead of printing them, which made it really hard for people to find things. If you were looking for a particular brochure or piece of information, you really had to know what organization was responsible for producing it, often down to the third or fourth level, down to a specific department, and go to that exact website. Also, there was no organization in the government that was responsible for making sure that material was preserved.
Stuff would go up on websites and come back down the next day, and no one was making sure that it wasn't corrupted or that it was still accessible as technology changed. So, in 2004, our Public Printer released a strategic vision for the agency. He looked at what some people considered a disruptive technology and thought of it as an opportunity. So, in project management terms: what are the requirements? Let's go back to basics. We looked at the mission. By law and tradition, GPO has three missions: to provide printing and publishing services to federal agencies and Congress, to sell publications on a cost recovery basis, and to provide permanent public access to federal government publications through the Federal Depository Library Program. Taking those missions and thinking about what our role is in a more digital world, the Public Printer proposed that we create a flexible platform for federal digital publications that would allow us to provide access to them and preserve them. So we began on FDsys.

What is FDsys? The easiest way to look at it is as three subsystems. First, it's a preservation repository in the traditional sense; it conforms to the OAIS reference model. Second, it's a content management system, and that's what sets GPO's system apart from a lot of the preservation repositories you hear about, because it's really the electronic backbone of the agency. As print orders come through the agency and go through our workflow, FDsys manages all those hand-offs, which is where a lot of things can slip through the cracks, and you want to be able to track things as they go through the proofing and production processes. And third, it's an advanced search engine. We do a lot of metadata creation, that's part of our expertise at the agency, and we try to make as much use of it as possible. We have a lot of years of experience in helping users find government information, so we want to put that to use electronically and think about ways to help people navigate toward what they're looking for, and about what kind of metadata users need to find that information.

This is a diagram, and I like to think of these as diagrams made by engineers for engineers, so it's got a lot of boxes and arrows, but it's just a general overview of how FDsys works, and I'm going to break it down and talk about each box individually. [ Silence ] The first thing that happens to your content, once a federal content originator decides that it's the final published version of something, is that it goes through INGEST, and I'll talk through each one of these bullets separately. INGEST is a series of steps that ensures that GPO can take full responsibility for the asset, but it also enhances the asset with metadata, preservation metadata and descriptive metadata, that help us preserve it and provide access to it. For those of you who are familiar with the OAIS reference model (I always trip over saying OAIS), this will look familiar. Things start out as submission information packages when they're a final published version from an agency content originator.
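[Editor's note: a minimal sketch, in Python, of the OAIS-style package flow described above. Class and field names are hypothetical illustrations, not FDsys's actual data model.]

```python
# Hypothetical sketch of OAIS-style packages as described in the talk.
# Class and field names are illustrative, not FDsys's actual data model.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubmissionInformationPackage:
    """What a content originator deposits (SIP): files, with metadata optional."""
    content_files: list[str]                     # paths or identifiers of deposited files
    descriptive_metadata: Optional[dict] = None  # a SIP may also be just a file


@dataclass
class ArchivalInformationPackage:
    """What ingest produces for long-term storage (AIP)."""
    package_id: str
    representations: dict[str, list[str]]        # e.g. {"pdf": [...], "text": [...]}
    preservation_metadata: dict = field(default_factory=dict)


def ingest(sip: SubmissionInformationPackage, package_id: str) -> ArchivalInformationPackage:
    """Toy ingest step: wrap the SIP's files as a single-representation AIP."""
    return ArchivalInformationPackage(
        package_id=package_id,
        representations={"submitted": list(sip.content_files)},
    )
```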
A submission information package can either be content and metadata, or it can just be a file; we're very flexible that way. Once it goes into the system, that line there is INGEST, and it becomes an archival information package and goes to archival storage. In an extension of the OAIS reference model, GPO also creates access content packages, and that's a little bit different from the way most archives do it. Because we're not just an archive, we're also a production facility: these items will be reused, repurposed, reprinted from, perhaps printed on demand later, we do a lot of metadata enhancement, and browse pages get created. So we create a content package in higher-access storage that can be used and worked with. The access content package can be recreated from the AIP at any time, which gives us a little bit more flexibility. And then a [inaudible] package is available to users.

This is an example of a content package in the system. FDsys is a package-based repository, which means that we package our items together with metadata. We try to keep each package roughly equivalent to one bound volume, so it's a discrete unit, and we collect metadata in [inaudible] format. And you can see at the bottom here that we have multiple file formats. Almost all the items in our repository have multiple file formats, we call them representations, and that's for a few reasons. One is that sometimes we just get them in multiple file formats, so a scanned book might come with TIFF and OCR text. Another is that it reduces the risk that we won't be able to render one of those file formats in the future; having multiple file formats is a belt and suspenders approach. And the third reason is that some file formats are more complex to preserve, PostScript is an example, but they are very full featured, very rich, while others are much simpler to preserve, like the text format, but they strip out some of the look and feel. Our first priority in preserving items is to make sure that we're capturing the content itself, but we'd like, if possible, to preserve the look and feel for our users, and that's why we capture multiple file formats. [ Silence ]

This is a high-level information flow, another diagram for engineers by engineers. You can see that when content comes through, we can create rules to group it into packages, and then we try to extract metadata (I'll go through that a little bit more in a second), and then we can use that content and metadata to do various things. We can use it to power the browse, we can provide it to users if they want to ingest it into their archives or into their catalogs, and we automatically create browse pages. One of the really cool things that I think FDsys does is that we try to extract as much metadata as we can. The idea is that we're going to automate as much of the content lifecycle as we possibly can. We have a lot of really smart people at the agency, and we want to make sure that we're using their brain power as efficiently as we can, so that we can point them to higher-value targets. So we take care of the easy stuff on the front end. We've written a set of rules where we look for certain metadata elements in content. An example is that we might run a set of code against a Congressional bill looking for public law citations.
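[Editor's note: the kind of pattern matching she describes might look roughly like the sketch below. The regular expression is an illustration based on the common "Public Law 111-148" citation form, not GPO's actual extraction rules.]

```python
# Illustrative only: look for public law citations such as "Public Law 111-148"
# or "Pub. L. No. 104-13" in the text of a bill. Not GPO's actual rules.
import re

PUBLIC_LAW = re.compile(
    r"Pub(?:lic)?\.?\s+L(?:aw)?\.?\s*(?:No\.?\s*)?(\d{1,3})\s*[-\u2013]\s*(\d+)",
    re.IGNORECASE,
)

def find_public_law_citations(text: str) -> list[tuple[str, str]]:
    """Return (Congress, law number) pairs found in the text."""
    return PUBLIC_LAW.findall(text)

sample = "This Act amends Public Law 111-148 and Pub. L. No. 104-13."
print(find_public_law_citations(sample))   # [('111', '148'), ('104', '13')]
```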
And so we populate much of our metadata this way; for the usual serials that come into the agency, we probably get most of our metadata like this, and actually the quality of the parsing has been really good. But we do allow for metadata editing by experts, to make sure that everything is correct and so that they can enhance records if they want to. This slide, I realize, is a little bit too small to see, but I think our metadata is so pretty, I just need to show it off. What you can see here is that we have some classifications that we take from cataloging records, and we can template them across issues of a publication. We take full advantage of the recursive nature of MODS, so we have a record at the issue level of a journal, but we also have a record for each one of the articles, and they are stacked together. And on the bottom there, you can see where we put the URLs into the metadata record; that way, if the record itself is downloaded and used somewhere else, the context is preserved. [ Silence ]

Another thing we do during the INGEST process is take files that are too big to be usable and break them down into smaller chunks. An example is that nobody ever really wants a whole issue of the Federal Register; they are looking for a specific piece of information, most likely an article. And even with just big books, nobody needs to download, you know, four thousand megabytes, so we automatically break them up into smaller chunks. If there are natural breaks in the publication, we try to make use of those. Here's an example from the Federal Register, where the software is looking for certain items that will help it break the issue down to the article level.

During the INGEST process, we also create our preservation metadata. The system [inaudible] some information about how the [inaudible] is represented; for example, we use the DROID tool from The National Archives of the UK to discern file format, and we link to their [inaudible]. And then there are our preservation specialists: we try to get as much as we can automatically, but there's going to be some stuff that requires humans to take a look, so an example might be identifying the creating application or identifying significant properties of an item. Then the system will automatically create structural metadata. We try to create a layer of separation between our management software and the content itself, so you don't necessarily need to rely on the management software to understand what the content is or how it's related to other pieces; we create metadata to describe that.

Okay, so the next piece on the chart is the data management layer. The data management layer is responsible for maintaining authenticity and for preservation, and I'll talk about each one of those individually. [ Silence ] In terms of authentication: authenticity is really central to our digital programs at GPO, because we're providing official government information. So it's really important to our users, especially the users working with regulatory and legislative information.
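[Editor's note: stepping back to the granule creation described earlier in this passage, here is a toy sketch of splitting a large text into article-level chunks at natural break markers. The break pattern is a hypothetical placeholder, not the actual structure of the Federal Register.]

```python
# Toy sketch: split a large document into article-level "granules" at natural
# break markers. The marker below is hypothetical; the real software looks for
# structural cues specific to each publication.
import re

BREAK = re.compile(r"^=== ARTICLE BREAK ===$", re.MULTILINE)  # hypothetical marker

def split_into_granules(issue_text: str) -> list[str]:
    """Return the article-level chunks of a large issue."""
    return [chunk.strip() for chunk in BREAK.split(issue_text) if chunk.strip()]

issue = """First notice text...
=== ARTICLE BREAK ===
Second notice text...
=== ARTICLE BREAK ===
Third notice text..."""
print(len(split_into_granules(issue)))   # 3
```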
Our goal with authentication is to provide the tools and evidence that users need to make absolutely certain for themselves that content hasn't been maliciously or accidentally altered, that what's in there is what we've approved, that nothing has been added and nothing has been removed, and that it all comes from an official source. [ Silence ]

In terms of the tools and evidence we provide so that users can be absolutely confident that what they're using is authentic government information: we take a checksum of each file that comes into the system. A checksum is an algorithm that reduces the bits and bytes of a file to an alphanumeric string, and if any single bit in that file gets changed, the algorithm will generate a completely different string. That does two things. One is that it allows us to periodically pull all the content in the archive and make sure that nothing has been accidentally changed, maliciously changed, or even just corrupted over time. And we also provide that information to users, so the technically inclined can install some free software and check any file in the repository. We also create a very detailed events list, using PREMIS metadata, that records the date, time, and the user or system component that performed an action, and users can use that information to check the provenance of an item in the repository. And we digitally sign all PDFs, as we get permission from the content originators to do so.

These are some of the events we record in provenance information, things that software does and things that humans do. Anything that creates a new asset or fundamentally changes an asset, we try to record. That allows us to troubleshoot: if we have a user doing something they shouldn't be, we can track everything that person touched and have somebody evaluate it, or if we have a software component that's maybe creating PDF files that aren't quite right, we can track that from a certain date and time and make sure we evaluate those files. You can also see events like AIP approved, nominated for deletion, and approved for deletion. We have preservation specialists who can take our archival packages and nominate them as candidates for deletion, but it requires three other users to approve that deletion, and we create a record of the deletion in the archive, so that it's traceable and legitimate.

In terms of preservation, preservation processes safeguard the content and the metadata. We try to reduce reliance on hardware and software as much as possible. That means, as I talked about, creating layers of protection between the content management software and the content itself, but also, as much as we can, encouraging our content originators to submit file formats that are more friendly toward preservation, or creating derivatives where we need to, so that we can meaningfully render that content despite changes in technology. [ Silence ]

Archival storage: this is the system component that's responsible for the integrity of the content and metadata itself. As I mentioned before, we create two separate copies of everything that comes into the system, the access content package and the archival information package. The access content package is the one that's most accessed and looked at by internal users.
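[Editor's note: a minimal sketch of the kind of provenance event record described above, loosely modeled on PREMIS event semantics. Field names and values are illustrative, not GPO's schema or workflow.]

```python
# Illustrative sketch of recording provenance events (date, time, agent, action)
# for items in a repository, loosely modeled on PREMIS event semantics.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    event_type: str        # e.g. "ingest", "AIP approved", "nominated for deletion"
    agent: str             # the user or system component that performed the action
    target: str            # identifier of the package or file acted on
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


event_log: list[ProvenanceEvent] = []

def record_event(event_type: str, agent: str, target: str) -> ProvenanceEvent:
    """Append an event to the log so an item's history can be audited later."""
    event = ProvenanceEvent(event_type, agent, target)
    event_log.append(event)
    return event


record_event("nominated for deletion", "preservation_specialist_1", "PKG-000123")
# Deletion would still require further approvals; each approval is itself an event.
record_event("approved for deletion", "reviewer_2", "PKG-000123")
```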
The archival information package in archival storage is only accessible by a select few preservation specialists in the agency, people who are specifically tasked with preservation processes and evaluation; the general GPO user doesn't have access to archival storage. We do weekly system backups that are stored offsite, and all of the items in the repository can be downloaded in full, with all their content and metadata.

The access component is generally everybody's favorite; it's very fun. We thought a lot about how users find government information. Finding government information is actually a special challenge. It's not like finding general information: if a user were looking for information on monarch butterflies, a general catalog search or a web search might suit them, but when users need government information, they are usually looking for a very specific piece of information. They are looking for a specific chunk in a specific publication, a particular bill or law or Congressional Record citation. The other challenge is that government information tends to be repetitive: if you are looking for a regulation, similar regulations will have appeared in previous Congressional Records, and in draft form in the Federal Register, so a general search on hot dogs, for example, is not going to be very useful for you. What we've tried to do is extract as much metadata from the publications as possible and allow people to narrow their results down, and I'll show you an example of that a little bit later. We also provide advanced search, so that you can search on any descriptive metadata we have, and we have a lot. And we allow for citation searching, so if you know a specific citation in the Congressional Record, you can go to that page immediately, without having to go through a couple of screens.

We also make a lot of use of the relatedItem element in MODS. A lot of government information is interrelated, so what we've tried to do is extract references to other publications wherever we could. That way, later on, we'll be able to build navigation between objects automatically, and hopefully that will make government information more usable to the non-expert. We have a continuity of access instance at an offsite location, so if there were a disaster, some way we couldn't get into the building at headquarters, we would still be able to provide access to this information that's critical to our democracy. And we provide lots of options for download, which I'll show you a little bit later.

Here's an example of the faceted searching that we provide in FDsys; I think it's one of the coolest things we have. You can see here that a user searched on the term health care, and the system shows, in the upper left-hand corner, the various collections, the various titles, in which the term health care appears. The user selected Congressional bills, so the system is only showing Congressional bills that contain the term health care, and then the user also selected a date published of 2010.
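[Editor's note: a toy sketch of the faceted narrowing being walked through here, counting matches per collection and filtering an in-memory result set. The records are invented placeholders; FDsys itself computes facets with its search engine, not code like this.]

```python
# Toy faceted narrowing over an in-memory result set; illustrative only.
# A real system computes facet counts at query time in the search engine.
from collections import Counter

results = [
    {"title": "Example bill A", "collection": "Congressional Bills", "year": 2010},
    {"title": "Example bill B", "collection": "Congressional Bills", "year": 2009},
    {"title": "Example rule C", "collection": "Federal Register", "year": 2010},
]

def facet_counts(records, facet_field):
    """Count how many results fall under each value of a facet."""
    return Counter(r[facet_field] for r in records)

def apply_facets(records, **selected):
    """Keep only records matching every selected facet value."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]

print(facet_counts(results, "collection"))
# Counter({'Congressional Bills': 2, 'Federal Register': 1})
narrowed = apply_facets(results, collection="Congressional Bills", year=2010)
print([r["title"] for r in narrowed])   # ['Example bill A']
```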
From here, they can choose whether it's the House or Senate side, and they can choose the member of Congress named in the legislation. That lets you narrow it down, because you don't always know what you're looking for until you have the list of options, and that helps users find what they want. Here's an example of metadata display. What we've learned is that users really hate looking at XML. They really do. So we've created a set of transformations that take the XML and make it a little more user friendly for display. These are some of the select metadata elements that we display, and we can also do that by collection: there's a set of metadata that we collect for everything that comes in, but we also collect specific metadata for each collection, so there are certain elements that we collect only for the Federal Register or only for Congressional Calendars. That gives us a little more flexibility to display what users are looking for.

So, content download from FDsys.gov. Like I said, there are a lot of options for downloading content on FDsys. It's kind of like an all-you-can-eat buffet, but all the options are healthy for you. As I mentioned, we make granules of big files, so to take the example of the Federal Register, where the granule is at the article level, you can download an article in several different file formats, along with the metadata record for the article. Or there's a zip package one level up, for the whole issue, with all the granules, all the file formats, and the METS, MODS, and PREMIS for the whole thing. So if you wanted to play the FDsys game at home, you could create your own archive, or supplement the archive you were already building with our data, and it's already described for you. [ Silence ]

So, that's a brief overview of the system. I just want to talk a little bit about some other projects we've had going that are a little bit outside of development. In partnership with the Office of the Federal Register, we released the Daily Compilation of Presidential Documents, which replaced the Weekly Compilation of Presidential Documents. It was a great example of a partnership with the archives, because they saw what we were doing with FDsys and came to us and said, how do we structure content so that you can pull information out of it, and how do we structure content to enrich our users' experience? So we worked with them to create the daily compilation so that it would work seamlessly with FDsys. We've also created some transforms for the Federal Register and the Code of Federal Regulations, so that those collections are now available in XML, and we are already seeing cases of citizens using that XML to create more dynamic and more specialized websites, which is really exciting. And then there's Federal Register 2.0, which most of you may have heard of. We worked with the Office of the Federal Register on that project; it's based on our XML feed and is more topical, more like a news site. The idea is that, as the official journal of the government, the Federal Register is really the daily newspaper of the United States Government, so it should look like a newspaper and be more usable to citizens, and integrated with that is some social media to keep citizens more active in their government. GPO has been involved in open government since we started.
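[Editor's note: a hedged sketch of the "FDsys game at home" idea a few paragraphs back, opening a downloaded package zip locally and separating its metadata files from its content files. The filename and the assumption that metadata files end in ".xml" are placeholders; a real downloaded package should be inspected for its actual layout.]

```python
# Open a downloaded package zip and list its metadata and content files.
# The filename and the ".xml" layout assumed here are placeholders;
# inspect a real downloaded package to see its actual structure.
import zipfile

def summarize_package(zip_path: str) -> None:
    with zipfile.ZipFile(zip_path) as pkg:
        names = pkg.namelist()
        metadata = [n for n in names if n.lower().endswith(".xml")]
        content = [n for n in names if not n.lower().endswith(".xml")]
        print(f"{len(metadata)} metadata files, {len(content)} content files")
        for name in metadata:
            print("  metadata:", name)

# Hypothetical usage with a locally downloaded package:
# summarize_package("FR-2010-01-04.zip")
```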
We're all about open government. Some of the things we've been working on: FDsys is really built around the concept of interoperability and reuse, so it's as modular as we can make it, and we've made it so that all of our content is available on all the major search engines. We have a bulk data repository now, which we're very excited about, so if you want all of the CFR, you can totally download it and repurpose and reuse it, and we do that with a few of our collections. We have permission statements on the site that make it compatible with LOCKSS, and we already have some private LOCKSS networks downloading the content on FDsys. Making the content more open is helping citizens find more ways to interact with it. And other federal government sites are integrating with FDsys to enrich their users' experience; for example, Regulations.gov uses a lot of our metadata to enrich their navigation, and Science.gov uses FDsys content to provide access to information about science in the federal government.

Talking a little bit about Release 2: we are closing out Release 1 now. We're in public beta, so it's usable, and we don't treat it like a beta site, we treat it like production, but at the end of the year we will be the repository of record for the Government Printing Office. As we figure out what's coming in Release 2, these are some of the things on the table. As I mentioned before, one of our goals with FDsys is to build a comprehensive collection of federal government publications. There are some exceptions: things that are for administrative use only or that have privacy concerns are not part of that, but anything that most people would think of as a publication is in scope for our program. So that's where we'll be focusing for the next release, building up the collection. Some new content collections: the Public Papers of the Presidents in XML, starting with the first Obama book. And external content submission: right now, submission to FDsys is done through internal users, so the content comes into GPO and internal users submit it to FDsys, and we'd like to extend that out to agencies so they can submit directly. One of the things that will do for us is let us collect metadata at the source, rather than having to go back and get it, and we'll also be able to enhance our chain of custody.

We're also looking to expand our collection of harvested content. I really like thinking about harvested content; when people say it, it sounds so simple, but it's really a set of complex challenges. How do you know what to harvest? What's the scope of your content? How can you tell what on those websites is in scope for your program? How do you package it? How do you provide access to it? Those are the kinds of things we're working through right now. We're putting together a harvesting strategic roadmap, and we'll pilot a collection to see how it can be searched and displayed. Enhanced preservation support: as I mentioned, we're using DROID for file format identification, and we'd like to integrate JHOVE for validation. And we are in the process of getting ready for a TRAC audit, so I believe we'll be the first federal agency to be audited for compliance with the TRAC checklist.
So, we're very excited about that, and excited to have someone come and take a look at what we're doing and give us some suggestions for improvement. That's all I had today. I hope you have some questions for me. Thank you so much. [ Clapping ]

[ Inaudible Speaker Question ] >>Kate: So, the question was, what are some of the underlying technologies that form FDsys? We are using Documentum as our content management system, FAST is our search engine, and then there's a lot of custom integration.

[ Inaudible Speaker Question ] >>Kate: Yeah, we're doing a lot of our development in Java. Yes.

[ Inaudible Speaker Question ] >>Kate: That's a good question. The question is, how long did it take to build the system? I would say that a lot of the time went into requirements. We have a diverse set of stakeholders at GPO, and in a way that's exciting, but it's also challenging. We serve the public. We also have, as I mentioned, the Federal Depository Library system, and the government librarians at those libraries are very invested in what we're doing. We also serve agencies, and we have internal stakeholders. So talking to all those people and making sure their needs were captured was about a one-year process, and then about two years of development. [ Silence ] >>Kate: Yeah.

[ Inaudible Speaker Question ] >>Kate: Okay. Yeah, so the question was, can I talk more about authentication and what we do to provide for it. Let me get my notes out. [ Silence ] GPO's view of authentication is that it should be a user-centered activity, so our role as an agency is to provide tools and evidence for users to be certain that what they're viewing is authentic government information. That has two parts. One is that the user needs to be sure that what they are getting is from a trusted source, and that the trusted source vouches for the information. The second part is that the information hasn't changed since it was validated as authentic by that trusted source. So certifying GPO as a trustworthy digital repository is part of our authentication solution: making sure that FDsys is trusted by the public, and that the content we provide is only official federal government information, is one part of it, and that's the front end. The back end is making sure that content hasn't changed, so providing those checksums to the public, creating a checksum for each piece of content, and then digitally signing it kind of rounds out our authentication approach. Does that answer your question?

[ Inaudible Speaker Question ] >>Kate: So, the question is, to what extent are our authentication efforts being driven by user requirements. Yeah, we hear a lot from the law library community. What they're looking for is making sure that Congressional, legislative, and regulatory information is admissible in court. That has two parts as well. One is that we're providing a good representation of the content as it was created by the content originator; so, for example, text might not be a good representation if the pictures or some of the formatting are stripped out, but PDF is, and that's one of the reasons we're signing PDF files. The second part is making sure that that content hasn't been changed.
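[Editor's note: a small sketch of the "back end" check described here, the way a technically inclined user might verify a downloaded file against a digest published by the repository. SHA-256 is shown as an assumption; the algorithm is whichever one the repository actually publishes, and the filename and digest in the usage comment are placeholders.]

```python
# Verify a downloaded file against a checksum published by the repository.
# SHA-256 is an assumption; use whichever algorithm the repository publishes.
import hashlib

def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, published_digest: str) -> bool:
    """True if the local file's digest matches the published value."""
    return file_digest(path) == published_digest.lower()

# Hypothetical usage; the filename and digest below are placeholders, not real values.
# print(verify("downloaded-issue.pdf", "ab12...ef"))
```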
We have heard a lot from users that they're looking for bulk authentication, so that when someone downloads, for example, a whole year's worth of Federal Register issues, they can serve that through their own website and have it authenticated back to us. That's one of the things we're looking at technology solutions for. [ Silence ]

[ Inaudible Speaker Question ] >>Kate: So, we actually have two continuity sites. We have a continuity of access site that is operational right now, so it's just the access component, and we have tested it a couple of times and it's working very well. In December or January, we're looking to make our continuity of operations site operational, and that will cover everything from submission all the way through to the access component; right now it's just the access component, and we will, obviously, test it before we consider it to be live. [ Silence ]

[ Inaudible Speaker Question ] >>Kate: The question is, how does FDsys integrate with other GPO business tools? There are a couple of different components we'd like to integrate. One is our ILS, the system that our catalogers use to catalog government publications. Right now, we exchange metadata between FDsys and the ILS by hand, and we'd like to automate that handoff, so that when content is cataloged, we get that metadata, and when content is submitted into FDsys, they get that notification. Another item we'd like to integrate is Oracle, our ordering system, so that when print is ordered, that order immediately flows between the two systems; we haven't completed that integration. And also our composition system: as a print and production facility, we compose a lot of our publications in-house, and we're moving to an XML-based composition system, which will obviously help make things more preservable and reusable, so that will be tightly integrated with FDsys. [ Silence ] Yes?

[ Inaudible Speaker Question ] >>Kate: So, the question is, how can we help our state and local partners with what we've developed here? We would love to enable some reuse of our code, and we're looking into what it would take to release some of what we've written as open source. We also intend to release all of our design documentation; our security specialists are looking through it right now to make sure there's no reason why we can't release it, or to take out anything that would cause a security concern. And I think that some of the organizational components could be reused by state and local partners, yeah. [ Silence ] I should also say that we're talking with a lot of the state folks in terms of authenticity, so we're hoping that our framework will be useful for them. [ Silence ]

[ Inaudible Speaker Question ] >>Kate: So, the question, I guess, is two parts. One is whether FDsys is a replacement for the catalog, and the answer to that is no, the catalog serves a different purpose. We catalog a lot of print publications that don't have electronic counterparts. We'd like for the two systems to share information, but they'll remain separate. And, I'm sorry, I forget your second question. [ Inaudible Speaker Question ] >>Kate: So, the second question is, are all electronic publications going to be accessible through FDsys?
So, the goal is to build a comprehensive collection of electronic federal government publications, with the exceptions I mentioned. The actual statute is, I won't get it exactly right, but it's all publications produced by or at the expense of the government with public interest or educational value, excluding those with privacy concerns and those that are for administrative use only or official use only. But that's the aim, including past publications, so we'd like to digitize old publications that are in our program, and we're waiting for approval from the Joint Committee on Printing, our oversight committee, to be able to do that. And we actually did a pilot with the Statutes at Large that were digitized by the Library of Congress, where we ingested those and provided access to them, and we're just waiting for approval to be able to put that into production. [ Silence ]

[ Inaudible Speaker Question ] >>Kate: Yep, and not only for current information, like, you know, information about your flu shots, but it's also of interest historically, for researchers in the future too. [ Silence ]

[ Inaudible Speaker Question ] >>Kate: The question is, can we talk a little bit more about our long-term plans to archive websites of federal agencies? FDsys won't archive entire websites; that's more of a NARA kind of issue, where they're looking at the records of government. We are specifically focused on publications, so where there are publications posted on federal websites, we'd like to be able to harvest those, provide access to them, and preserve them. And that's actually one of the big challenges. If you're harvesting full web pages, that has a set of challenges of its own, but you can just take it all, bundle it up, and save it. If you're looking for publications within a specific scope, it's a little bit harder, because you either need to write a set of rules that help you find them, or you need a lot of people to sort through it all. So we're looking at some solutions that will help us do that. We did a pilot a few years ago where we harvested from the EPA's website and tried to write a set of automated rules to help us sort through what's a publication and what's not, with some success, but the technology just isn't quite there yet. One of the things we're looking at is talking to federal agency webmasters and seeing if there are ways they could tag those publications, or put them in special folders, so that we could find them more easily. But the scalability is really interesting. [ Silence ]

>>Kate: Thank you. [Clapping]

>>This has been a presentation of the Library of Congress.