>> Robert Brammer: Thank you for joining us. I'm Robert Brammer, and we're coming to you from the Law Library of Congress in Washington, D.C. In this webinar, titled Lunch and Learn: Historical Legal Reports from the Law Library of Congress, Senior Legal Information Specialist Stephen Mayeaux, Archiving Technician Jonathan Donovan, and Junior Fellow Michael Mellifera will discuss the Law Library's efforts to make our collection of historical Law Library reports accessible online and full-text searchable. You can submit questions at any time through the Q&A box at the bottom of your screen, and our panelists will answer them at the end. With that, I'm going to turn it over to Stephen Mayeaux.
>> Stephen Mayeaux: Thanks, Robert, and thank you all for joining us this morning. I am Stephen Mayeaux from the Digital Resources Division of the Law Library, and I'm lucky to be joined by two very talented colleagues, also from the Digital Resources Division. We are delighted to be here today to provide more of an inside look at a project that highlights the many unique contributions of the Law Library of Congress to the body of research in foreign, comparative, and international law. I should start by asking: what exactly are we talking about today? For those of you who may not know, a major function of the Law Library is to prepare written reports on legal topics, with an emphasis on foreign, comparative, and international law, in response to requests from Congress, the executive and judicial branches of the federal government, and others. In that regard, the Law Library is similar to our sister agency, the Congressional Research Service, except that these reports focus specifically on foreign law. The Law Library has authored thousands of these reports from at least the 1940s to the present and continues to write new reports on contemporary topics every month.
The Digital Resources Division, in collaboration with the Global Legal Research Directorate of the Law Library, recently began a multi-year effort to digitize and publish many of our previously unreleased historical reports in order to make them fully accessible to researchers and other members of the public. As a disclaimer, I should note, as we often do, that these reports are provided for historical reference purposes only. They do not constitute legal advice and do not represent the official opinion of the U.S. Government. The information provided in these reports reflects research undertaken as of the date of writing and has not been updated unless specifically noted. The many different report covers on this slide show how the appearance of our reports has changed over time, as well as the extraordinarily broad range of legal topics our staff have written about. Our more modern publishing format is shown in the reports bearing the current Law Library logo in black and orange. If you were to go to law.gov right now, the most recent reports, from about 2018 to the present, will look like those. The reason for displaying these different formats is to show that many reports in our collection were treated differently depending on the purpose of the report and the preferences of the Law Library at the time. Some of them, like the ones in the center of the screen and on the previous slide, were formatted for publication, bound, cataloged, and officially added to the library's collection, while others may have been delivered only to the requester and never officially added to the collection until very recently. An example of the latter is the report in the bottom left corner about wiretapping and eavesdropping in Near Eastern and African countries. This is where the legal report archive project came in.
This project began conceptually several years ago, but it really gained momentum around 2018 and 2019. We'll spend some time today talking about how we got this project started, where we are now, and where we hope to go in the next few years. I'll turn it over to Jonathan Donovan to talk about the inventory, file processing, and storage phases of this project.
>> Jonathan Donovan: Thank you, Stephen. With this project, we started with two different types of documents: paper reports to be scanned and born-digital reports to be archived. All of the archived reports will be kept in long-term storage, but many of them are also made public. So we end up with four different pathways for documents, based on whether they're paper or digital and whether they are slated for storage only or for public access. When the project began, our focus was on digitizing the paper reports, but when the pandemic started, scanning had to be put on hold. We pivoted to focusing on the born-digital part of the project, a change that was very well suited to telework. The paper part of the project was mostly contained within a small storage closet full of reports dating from the 1940s to the early 2000s, including many older reports printed on thin carbon paper that had not been found in any other printed format. An initial inventory found over 4,000 paper files, which we first organized by requester. After that initial round of organization, we determined that many of these reports were actually individual components of larger, multinational reports on a single topic. Where it was possible to conclude that separate reports belonged together, they were combined into a single document. In some cases, these combined reports may never have received an official title or a final cover sheet, and in those cases we provided descriptive titles.
In other cases, however, we were only able to locate one or more reports that were presumably part of a much larger report at some point. It's possible that the missing reports have been lost to time or that they were never finalized; many of those cases involve reports from the 1960s and 1970s. Some paper reports were also determined to be duplicates. In the end, after combining reports and removing duplicates, we estimate that the total number of unique paper reports is closer to 2,000 than 4,000. As you can see from the slide, 960 of those have been published online, 200 are being digitized now, and approximately 500 are still to be reviewed. In addition to the paper reports, we also have born-digital reports going back to 1987, and when I started working on this project two years ago, I began with the inventory of these born-digital files. The set of folders we inventoried for this project contained over 100,000 files created over the course of 30 years. That included not just completed reports but also duplicate copies, in-progress drafts, request e-mails, memos, and all sorts of other Law Library files that fell outside the scope of our project. So the first goal was to go through those files and identify which ones represented unique, completed Law Library reports. Often this required looking at changes in the text and comparing the dates of each file to identify the completed version that the library sent in response to the original research request. For each of those reports, we created a PDF copy for the archive. In some cases, that just involved copying an existing PDF, but many reports were in other file formats, including lots of Word and WordPerfect documents. Word was able to open most of those files, but many of them required some reformatting for the text to display properly.
So we created new Word versions of those older files and made any formatting changes in that intermediate copy before saving a new PDF for the archive. Charlotte Stichter, the editor of the Law Library, was also instrumental in creating the archiving project and establishing the guidelines for the whole process. If you look at this report on human cloning laws from 1999, you can see an example of the reformatting we did. The original WordPerfect file would open, but it displayed the text in a font that was unusual for the Law Library, so in creating the new PDF for the archive, we converted the text to a more standard font based on Charlotte's guidelines. In a few cases, especially with the older files, we had documents that simply couldn't be opened. We met with Amanda May from the Preservation Reformatting Division and with Kathleen O'Neill in the Manuscript Division, and Kathleen found a few computers in the Manuscript Division that could open most of those older files with the correct formatting. In that case, reaching out to people outside the Law Library was very helpful to the project. While we were creating these archival copies, we also recorded information about the reports themselves. Within the Law Library, each research request is assigned a tracking number, which you can see in the upper left corner of the documents, and we record this number for each report. It has also been helpful to assign each archived report a unique document number specifically for this project, and that number also tells us which of the four pathways the document belongs to: paper or digital, public or long-term storage. Many of the older reports were untitled or titled only with the name of a country, so in those cases I wrote a tentative title in brackets to help us keep track of the subject.
One of the interesting things about looking at several decades of reports has been seeing the huge range of research subjects the Law Library has written about. Some subjects are very common, especially those related to family law, and those might generate multiple research requests each year. Other reports deal with a developing legal trend, so we have series of reports that have been updated or expanded periodically. Some of the subjects that generated several reports over the past few decades include human cloning, computer security, and deregulation of TV and the internet. In other cases, a research request is unique, and the library may have written only one report on that subject. You can see some of those more unusual subjects here as well, including mining in outer space, absinthe laws, and gambling on ships in Canada. Looking through these reports was really like going through old newspapers or magazines, because each one is a snapshot of what was going on at the time it was written. Our goal has been to archive every finished report, whether it's published or not, but before we publish anything, we want to make sure we're protecting the privacy of the requester and anyone else who might be connected with the report. Within the Law Library, supervisors and the research directorate review all the archived reports to identify which are the best candidates for publication. For reports we do want to publish, even though the original requester will not be identified, we do our best to contact that requester and get their permission whenever possible. We want to share as much research with the public as we can while ensuring that we respect the privacy of everyone involved, as well as the preferences of the original requesters. Now I will turn it back to Stephen to talk about the publishing process.
>> Stephen Mayeaux: Thanks, Jonathan.
Once a report has been determined to be suitable for publication, we follow a unique process for publishing reports via the library's digital collections framework. In most cases, these reports don't have a pre-existing bibliographic record, so we start by creating what we call a stub record in our catalog. It contains only minimal information, like title and date, which is just enough to attach to the digital item and make publishing possible. This is where it's important to highlight that the larger publishing process is greatly enhanced through a joint effort between the Law Library and the U.S. Government Publishing Office, or GPO. Once these reports are published with minimal data, they're shared with GPO, which provides a full bibliographic record for each newly available report. The full records are then distributed via OCLC and reimported into our integrated library system. In addition to their availability on loc.gov, these reports are also discoverable through the Catalog of U.S. Government Publications. Overall, the publishing effort was not significantly disrupted by the pandemic in the last year, because we were lucky enough to have already scanned many of the paper reports before our on-site operations were interrupted. Much of the publishing process I described, as well as the review and quality assurance work on our end, could be accomplished via telework, and the same goes for the collaboration between the Law Library and GPO. Thanks to widespread telework, we were able to remain in contact with GPO throughout the pandemic and keep the workflow running. Here's an example of how a final published report looks in the Library's digital collections framework.
You can now access nearly 1,600 of these reports, most of which were published since March 2020. You can get there by visiting loc.gov/collections or through a web search for Publications of the Law Library of Congress. You can see the full URL on the screen, and it will also be placed in the chat. Now that we've spent some time talking about the process of publishing the reports themselves, I'm going to turn it over to Michael to talk about an exciting new crowdsourcing campaign that seeks to make these newly available reports more accessible to researchers.
>> Michael Mellifera: Thank you, Stephen. The two digitized paper reports on the left, Governmental Procurement of Domestic Products Under Japanese law and Divorce Among the Akan of Ghana, are the only known copies of these reports. As we encountered reports like these, it was important for us to ensure these historic documents were not lost to time. A powerful tool in preserving these reports is optical character recognition, or OCR, software, which automates the extraction of text from scanned documents into a machine-readable form. The purpose of OCR is to make digitized paper reports keyword-searchable and usable in machine processes, such as machine translation and text-to-speech for screen readers. You can see on these scanned pages that many of the historic reports were printed on onionskin carbon copy paper and sometimes even include handwritten notes. This presented challenges for OCR software because of bleed-through and legibility issues. Another challenge is that some of the reports contain languages other than English, as you can see in the rightmost report, which presented difficulties due to limited multi-language support for characters outside the Latin alphabet. OCR doesn't always work perfectly, and studies suggest that human transcription is often more reliable.
To address the large number of legal reports that were not suited to machine reading, we proposed a crowdsourcing project with By the People to enlist the help of the public to transcribe these reports and make them more discoverable. By the People is a volunteer engagement program hosted on crowd.loc.gov, where we invite the public to transcribe, review, and tag digitized pages from the Library of Congress' website. Anyone with an internet connection and an interest can contribute to making these rich materials accessible to all, including people who are not fully sighted. The transcription process involves uploading materials onto crowd.loc.gov, where volunteers transcribe and then review them. The completed transcriptions are then returned to the Library of Congress' main website along with the materials. In total, 655 legal reports were included in this By the People transcription phase. We launched the By the People crowdsourcing campaign as part of Law Day in 2021 and organized it into 24 projects based on broad legal topics. The campaign has been a huge success so far, with nearly all 6,100 pages receiving an initial transcription and 80% of the collection now fully completed. We would like to encourage you to help us review the remaining pages, cross the finish line, and make this trove of historical legal research and analysis available to new users. Now I'll turn it back over to Stephen to close out the webinar.
>> Stephen Mayeaux: Thanks, Michael. Now that you have a sense of what we've done over the last couple of years, I'd like to take a minute to preview what we have to look forward to in the near future. As of now, we have approximately 1,500 more born-digital reports that we plan to release over the next two fiscal years, and we also have several hundred more paper reports to digitize, including some that may need to be added to a future phase of the ongoing crowdsourcing campaign that Michael just discussed.
An important thing to note is that this project focuses primarily on reports that were not already part of the library's known collection. Even with the future publishing of the reports I mentioned, there are still many bound, cataloged reports that are already part of the library's print collection but are not currently prioritized in this phase of our digital project. So it's possible that a future phase of the project might involve digitizing those reports, which are available to researchers on site but are not currently accessible to those who can't make it to Capitol Hill. And, of course, we learned over these last few years that there may in fact be historical reports authored by the Law Library that we don't know about. So lastly, I'd like to end with a call to our colleagues at law libraries and other academic institutions around the country. We know that some institutions may have collections that would fill in some of the gaps in our own. If you know that your collection contains historical legal reports authored by the Law Library that may not be available elsewhere, please get in touch with us. We'll provide our contact information on the slide at the end. This slide summarizes where you can go to find more information about this project. The first bullet point is another link to the online collection. You can access any of these resources by visiting the URLs listed, or by doing a web search using the titles in quotes. We've also done a bit of promotion in recent months. You can find our recent blog post by searching "Historical Legal Reports" in our blog, In Custodia Legis, and you can even listen to a recorded interview that we did just last week for the Federal News Network's Federal Drive program. We'll now open it up for questions from the audience. Please feel free to type your questions in the Q&A box.
I see we've got one question from an individual asking about the public crowdsourcing [inaudible]. Yes. If you go to crowd.loc.gov, you'll see the link there, but I'll provide a more direct link in the chat. Thanks.
>> Robert Brammer: Okay, if you have any other questions, please type them in the Q&A box.
>> Stephen Mayeaux: Ah, a question about where the paper documents were being stored before the project started. This is really interesting. They were stored on site in the Madison Building, in a small records closet on the second floor. Over time, they've moved to a couple of new places. They were first organized by requester, and then once they were prepared for digitization, they were sent to the Adams Building to be digitized by our digital scan center. So they moved around a bit throughout the Library of Congress campus.
>> Robert Brammer: Thank you. Any other questions? Okay, Stephen, it looks like you've got another question in the Q&A. Thanks.
>> Stephen Mayeaux: Okay: what kind of advertising do you do for the crowdsourcing campaign, and what type of audience is doing the crowdsourcing? This is a really good question, and I have to admit I'm not as familiar with the different audiences that do crowdsourcing. This is the Law Library's second campaign. The first one was very different: it was a collection of Spanish legal documents from the 15th to the 19th centuries. Really, we've been lucky to be able to tap into a pre-existing group of individuals who were already very active on the By the People platform, so to some degree we benefit from the hard work our colleagues in the library have already done cultivating an audience for crowd.loc.gov. But we've also had a few reports in this campaign that contain foreign-language material, especially a few documents that are foreign-language translations of the U.S. Constitution. For those, we've been asking people with a background in Chinese, Korean, Russian, and Armenian to help us transcribe them, and we've had some luck recruiting volunteers with language backgrounds. If anyone on the call has a background in a foreign language, we are especially in need of transcription and review help with those documents.
>> Robert Brammer: Okay, it looks like you just got six other questions.
>> Stephen Mayeaux: Let's see, a question about the corporate author. I admit I don't know exactly what the corporate author field shows, but I'll provide a link to a bibliographic record in the catalog. It's most likely the Law Library or the Global Legal Research Directorate. I'll put a catalog link in the chat, and hopefully that'll help. Let's see, a question from Lynn: once all the current documents have been digitized and transcribed, do you anticipate making other documents available for transcription? Yes, really good question, and that is something we're working on now. As I think the earlier slide said, we have about 500 more that we need to review and about 200 that are being digitized currently. It's quite likely that many of those documents will have some of the same OCR issues that Michael showed, so we do anticipate making them available in the future, perhaps in another phase of crowdsourcing. Is there a background note on the authors of the reports? No. Many of these reports were written by authors who worked at the library many decades ago, so we don't necessarily know much about them. Thanks to the bibliographic records, though, you can search for all reports written by a particular author. At this time we don't have much background information on the authors, but there is a little bit of information about current authors on our website.
If you go to law.gov, there is some information about many of our current foreign law specialists. Let's see: are you transcribing foreign-language documents that the Law Library published? Yes. In this round of crowdsourcing, there are a few dozen reports and translations that contain foreign-language material, and we are transcribing and reviewing those.
>> Jonathan Donovan: Hey, this is Jonathan. I'll just jump in and say that of the documents in the crowdsourcing project so far that are entirely in a language other than English, many are translations of the U.S. Constitution. We have other reports that contain some text in a different language, but I would say those constitutional translations are our main set of reports containing non-English text.
>> Stephen Mayeaux: Oh, and I see someone very helpfully provided an example of a record. Yes, that is a record for a 1976 report that we digitized and published. I believe this is one that is in our collection, but I'll provide another example. Let's see. Here's another example in the chat. For the documents that are in English only, do you plan to, at some point in the future, have someone translate them into other languages, such as Spanish? No. The crowdsourcing program is currently a transcription program, so there isn't a translation aspect to the project, but that's only because that's the scope of the transcription platform. It's a good question, but it's not currently part of our plan. A question about the privacy procedure for notifying original requesters: will you do this for all documents? Yes. For every report where we are able to notify the original requester, we do.
We try to reach them through the current officeholder, or we collaborate with other organizations that can help us get in contact. For requesters we can't reach, the decision is ultimately up to the research directorate: when we cannot locate the original requester, it falls to our research directorate to make a final determination. And if permission is given, how do you mask the requester? That's a good question. We can usually do that simply by removing any information on the cover page that may have indicated who the requester was, and in most cases the reports are actually written without any reference to the requester. So there's not a whole lot of work for us to do, but we do a round of review just to make sure that's the case. What about links in documents? In many cases, the links may have expired. In more contemporary reports we use Perma.cc links, but for reports from the early 2000s, especially the born-digital reports, some of those links are no longer active. That's a good question.
>> Robert Brammer: Okay, thank you for joining us. I want to mention that next month, on July 15th at 11 a.m., we will be hosting a webinar to discuss the Constitution Annotated. This is a document by the Congressional Research Service that summarizes the provisions of the U.S. Constitution and the leading U.S. Supreme Court decisions that interpret them. Please visit the Law Library of Congress' Legal Research Institute at law.gov to register for this and other upcoming webinars, and thanks again for joining us.