>> From the Library of Congress in Washington D.C. >> You are here to hear about digital preservation and the Library of Congress. More specifically, what we're going to be hearing about today is an introduction to the Library of Congress levels of digital preservation. And we have two super speakers lined up for you this morning. So I'll introduce them right now. I'll introduce them both now. And then they'll take turns speaking, okay. Our first speaker will be Moryma Aydelott. >> Aydelott. >> Aydelott. I kill every name. We'll just refer to her as Moryma in the future. Moryma came to the Library of Congress in 2002. She began work in the Office of Network Development and is now the Special Assistant to the Director for Preservation. But as she was quick to remind me, she has also done stints in other parts of the Library as well. She's a really lovely person. We also have Erin Engle with us. And Erin is in the Office of Strategic Initiatives. She's with the NDIIPP Program, which is the National Digital Information and Infrastructure Preservation Program. Yes. I take the rest of the day off with that one. Erin came to the Library in 2007, and she's been working diligently on our behalf since then. >> So thanks for the nice introduction and thanks to the Digital Futures and You Program, especially Angela and Judith for having us here. As Blane said when I started in 2007, I'd never worked at an institution this large before, so I thought it was really cool that there were all these programs during the day for staff to come to. And one of the first ones I went to was the Digital Futures and You Program about LibraryThing. So I always remember that being my first one. So when we got this invitation in our in boxes, I was really excited to be here. So thanks a lot. So we'll just get started. So Moryma and I are going to set some context and first talk about what digital materials are here at LC and how it relates to the information we're going to be giving you. 
We'll talk about why digital preservation is important and why you should care about it as it relates to these materials we're going to be talking about. Then we'll go into some background about the Digital Preservation Working Group, who this group is, why it formed, what it's done, including talking about some of the resources this group has created. And we'll be spending a lot of time actually talking about one of those resources as Blane mentioned, the LC levels of digital preservation. What the levels are, how to use them, more specifically we'll walk you through one of those uses as a self-assessment. While most of this presentation is going to be focused on the levels, I'd just like to say that some of the other resources will be highlighted and may be useful in other areas of preserving digital collections. So we're not going to touch on all aspects of the digital lifecycle today, so topics like digitization or access to materials for patrons will not be discussed. >> Great. So let's talk about digital collections here at the Library. While they're still relatively new as a format that we collect, digital materials are a diverse, large and rapidly expanding type of collection that we have and we'll continue to see around the Library. So there are a lot of different ways to slice them. But for the preservation and workflow discussion that we're having today, it's helpful to look at digital collections as belonging to three major categories. Tangible media, reformatted analog collections, and born digital. So these are tangible media here. Digital collection items stored on something you can hold in your hand. They can come to the Library on their own, like these CDs and DVDs you see stacked there on the slide. They can be part of a package like that floppy disk that was included with the book. They can be external hard drives holding terabytes of content, thumb drives holding megabytes. 
Essentially, if you can hold it in your hand, and it holds digital data, it's tangible media. In a survey conducted last year of library curatorial divisions, chiefs reported hundreds of thousands of CDs and DVDs, thousands of floppy disks in all sizes, and a steady stream of hard drives being acquired and received by Library Services divisions. They also talked about older, more exotic media, like laser disks, decades old computers and digital audio tape. So why is this important? Every single one of these tangible media types requires a different kind of hardware and software to access the content on it. Some are easy to find, like the CD drives or the USB ports on your work stations. But some of those other drives require a lot more effort. So next we have digitized analog materials. Historically these are the majority of the Library's digital holdings. They can be preservation surrogates made to provide access to and reduce the need to handle brittle or deteriorating materials. Or they can be access copies made to share content with users who aren't here at the Library to see them in person. A lot of these digitized analog materials are up on the Library's website. So what's important about these? Well, if we digitized them we have some say on the way that the files are created, how they're named, and the kind of metadata that comes with them. This is a huge advantage compared with files that we received from other organizations where we may have to just take what we get and adjust the files or our technical systems in order to interact with them. And then there are born digital materials coming in to the network. These can be attachments to emails, pages pulled down from the Internet or FTP sites, or materials like e-books or e-serials received through formal programs like e-deposit or ECIP. So what's important about these? 
Because we don't have an original analog version, if the digital version received is damaged, and we don't have a backup, well we have a big problem. But it's also important for us when we do receive these materials to have information from the sender on what the files should contain so that we can confirm that what we received was complete. So overall, what does this mean for the Library's collections? That many divisions already have a combination of these three types of materials and are interested in workflows and technologies that would make it easier for them to receive more of them. So this is really an immediate concern. Even though we hope to be getting more of these in the future, dealing with what we have right now is an immediate concern. And so now that we have a baseline of what digital collections are referring to, let's meet a group of people who have been developing preservation resources for the Library. So here are some of those folks who have been developing these preservation resources. >> The Digital Preservation Working Group or the DPWG. We were formed in 2010 to address a digital preservation annual objective and have since then continued our work to research, analyze, document, recommend and share best practices and guidelines. We consist of a core group, that meets mostly weekly, and a larger group that meets for presentations and to evaluate documents and to provide feedback. The core group, which you see listed here, consists primarily of staff from the preservation directorate and OSI with representatives from CRS and the Law Library. The larger group has staff in curatorial divisions from around the Library. So why does this group exist? Well it exists because digital preservation is important. 
As you can see from this definition from the DPWG glossary, digital preservation is a series of managed activities, policies, strategies and actions to ensure the accurate rendering of digital content for as long as necessary, regardless of the challenges of media failure and technological change. As Moryma highlighted, the Library is acquiring and creating this digital content every day, which means that in your division or in your unit you are receiving or have received these digital assets, and they are in your custody. One of the points of today's session is to help you think about your digital assets and assess the steps that your division is currently undertaking to ensure that we can preserve these assets right now for as long as necessary. So let's talk about some of the documentation tools developed by this group. So over the last four years, the DPWG has compiled and reviewed internal and external policies, relevant standards, and current workflow practices around the library to inform the development of guidelines for the preservation of LC's digital materials. I'll talk more in a minute about these guidelines we've developed, but they all share one important point. They address preservation actions for the majority of digital holdings here at LC, whether they are converted, reformatted or created as born digital. While there are some library digital collections that could require additional or alternate preservation actions, the guidance and resources produced by the DPWG are a solid starting point for the vast majority of digital holdings. So here's a brief rundown of the work that's relevant for this discussion. 
So as I mentioned, starting in 2010 we started doing background research, literature reviews, and an environmental scan of over 100 digital preservation resources, including resources here at LC, like the LC process descriptions for the digital lifecycle, the Federal Agencies Digitization Guidelines Initiative work, the Sustainability of Digital Formats website, and tools currently in use to perform related work. We also reviewed external policy guidelines, strategies, excuse me, of other national libraries and organizations. And all of this documentation can be found on the DPWG wiki you see here. In 2012 we performed informational interviews with LC staff and subject matter experts working with digital collections across the divisions. This was to gain insight into current practices and processes. So we asked some questions like what kinds of content are presently being added to the collections? How is this content acquired? And do you have any established workflows that you're using? This knowledge sharing was extremely helpful and provided invaluable input that informed and drove the development of a detailed guidelines workflow document. This document informed the development of future work products. So I briefly touched on this earlier, but something that people in this room might be interested in right now is that we have a DPWG glossary. So really in an effort to standardize terms used in the documents developed, we developed this glossary which can be found online here. And thanks to the work of some of our Junior Fellows, refined by the group, we examined the definitions of common terms found in our documents from 48 different sources. Some terms like bit preservation, checksums and fixity are familiar in the digital preservation community but perhaps not familiar to those working across any aspect of the digital lifecycle. Moving on to more of those resources, in 2013, this was our year of focusing on providing practical guidance to staff. 
We whittled down our guidelines and workflow outline document, which was a 28-page document, and produced the Essential Digital Preservation Activities at the Library of Congress document. This six-page document is a concise discussion of what's included in the larger document, and it summarizes the key digital preservation actions necessary to ensure base-level stewardship of LC's digital collections. As we continued to focus on providing some practical steps to help staff manage digital collections, we became aware of a new resource that the National Digital Stewardship Alliance community developed, the NDSA Levels of Digital Preservation. We really like this document, which we'll talk about later. And we started adapting it for LC practice, tools and workflows. In 2014 we continued, and we contributed to the draft Preservation Handbook and Technical Guidance, which is part of the Library's ongoing governance documents effort. This is a detailed description of digital preservation planning, policy and practices here at LC. So all of the resources I just mentioned, and many others, contributed to and informed this guidance. And in particular, the levels are an integral part of the technical guidance. This year and going forward we plan to continue sharing the work of this group in sessions like this and also training on using the guidance. Okay, just to circle back around and sort of hammer this point in, why do we have all of these resources? And really the purpose is to provide all of you with core digital preservation concepts so that as an institution we have a common understanding of actions that are integral to performing this work. These resources are important to support the LC community of practice for preserving digital collections. They also inform and aid staff in performing baseline actions. It's the equivalent of getting the boxes off the floor, but for digital materials. 
And finally these resources promote a common vocabulary so that staff can have a shared understanding with respect to laying out reasons for and communicating about the preservation of LC's digital materials. >> Great, so all of those resources are valid. I encourage you to take a look. But today we're going to focus on the levels of digital preservation. It's available at the link you see here on the slide. And as you can see, it's a one-page chart with a lot of content on it that we're going to walk through step-by-step. So the chart is structured with four levels, which are the columns, and five functional categories of digital preservation actions, which are the rows. So by describing the steps to be taken across the five categories moving from level to level, the chart shows the work needed to achieve increasing levels of preservation or, as NDIIPP colleagues like to call it, leveling up. So, how do levels progress? Prepare all the way on the left is different than all the rest of the levels on this chart. Because it deals with steps to be taken before digital materials are saved on the Library's servers. This includes things like ensuring that tangible media are kept in good environmental conditions and that there's bibliographic information for items. Level 1 details basic steps that offer bit level preservation. So what is bit-level preservation? It just means that the files remain the same as when you began preservation work on them. It doesn't guarantee that you'll be able to open and read them, because that depends on having the correct software already and being able, them being readable before you begin to preserve them. But it does mean that they'll stay the same, and they won't be any more corrupted than when they got here, which isn't nothing. So as you move across to Levels 2 and 3, you move more towards long-term usability. This involves taking steps, not only to maintain the files, but to ensure that they remain readable and usable. 
This can involve migrating files into formats that we can support long term, developing useful metadata, and maintaining the hardware, software and operating systems necessary to make those files readable. So digital preservation is like analog preservation in that there are very real resource costs to carrying out these actions. Moving across those levels is not an immediate given for all materials. And while any item in the permanent collection should have a basic level of preservation, when looking at making items additionally usable and accessible, we need to factor in staff, financial and technology resources required to do the work. So Erin and I are going to walk you through each of these five categories -- so storage, file fixity and data integrity, information security, metadata and format. And we'll go through them at a high level so that you can develop just sort of a baseline understanding of the information on the chart. >> Okay, so we're going to start off with storage. So what do we mean by storage? Well here at the Library it refers to the Library's managed storage. This includes server space, web or access servers, or tape servers that ITS manages for the divisions. So, the staff in your division have at some point requested server space for a digital content. And depending on the size of your content, how often you need to access it, if you need to access it for processing, there are different servers on which your data is stored. But it's basically what we mean when we're talking about storage and the levels document. So, reading from left to right, we'll start in the first cell here. And this is prepare. There is only one requirement in this first cell. Store physical items in appropriate conditions. This refers to any physical items that are containers of digital content. The tangible media that Moryma talked about earlier. The CDs, DVDs, hard drives. 
At the simplest level you can ensure that you've taken the proper measures to store and prepare the media before the digital content is transferred to a Library-managed server. So moving left to right in the cells, or moving up the levels, there are additional requirements. Level 1 instructs you to make an exact copy of the content, using write blockers as appropriate, and save it on a long-term storage system. Getting content off heterogeneous media avoids bit rot, which is the decay of the files over time, obsolescence of the media, and the risk of the only copy being on physical media. If you look at the requirements in levels 2 and 3, you'll notice that they expand upon what's in level 1. Having another copy of the content and then storing at least one of those copies in a separate geographic location with a different disaster threat. >> Great, so fixity in the preservation sense is being able to prove that a digital file has remained unchanged or fixed. So this is one of the most essential components of digital preservation. In practice, this is done by software that scans all of the file's bits and produces a checksum or hash, which is a string of numbers or characters that represents the number, type and order of the individual bits in the file. This unique checksum or hash is recorded, and then at a later date you can rerun that software, create a new hash and compare those two. So if the numbers have not changed, you know that the file hasn't changed. And that checking, that generating of a current checksum and comparing it to the original is what is known as fixity checking. So first in prepare, you're again working with any physical media that contains digital content. And here you check it, you know, open it up, check the files to ensure that they're not corrupted. Moving up the levels you perform other actions. Starting with creating or checking file fixity on ingest on to library servers. 
Ingest or transfer to library servers so that you can have a baseline checksum to compare to later. And then of course you check that checksum again after you've done any actions. Those actions can be things like copying to a new location. And while it isn't the same as fixity, we include malware scanning here, since malware can corrupt other files. And in cases where the file has been corrupted due to errors during transfer or bit rot over time, like Erin mentioned, some of the later levels call for the repair or replacement of those files. >> So the next category is information security. And this focuses primarily on understanding who has access to the content, who can perform actions on the content, and enforcing the rules and restrictions of access. As Library employees, contractors and interns, we all know that we're required to take the information security awareness course if we use any Library of Congress IT systems. So we know how important this topic really is. So at the prepare level, or at the prepare cell, it's pretty simple. Identify who has access to your content on physical media and on division servers. So are physical media, such as CDs or hard drives, stored in a secured environment in your division? Does your digital content reside on a shared drive where only the appropriate employees have access to it? Confirming these actions with a yes ensures that you have achieved the prepare level. So thinking progressively about information security, levels 1, 2, and 3 have additional requirements. For example, level 1 goes further to enforce who has read, write and edit access to the digital content, which at this point is residing on a Library-managed storage system. And level 2 recommends performing a scheduled review of individuals and groups who have those levels of access. Level 3 recommends keeping logs of actions and performing audits of the logs to ensure compliance. 
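The fixity checking described above (record a checksum at ingest, regenerate it later, and compare the two) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `hashlib` with SHA-256, which the talk does not specify, and the function names `checksum`, `build_manifest` and `verify_manifest` are hypothetical. It is not how CTS or any Library system is implemented.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of one file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a baseline: relative file path -> checksum (hypothetical helper)."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Fixity check: regenerate checksums and compare them to the baseline.

    Returns a dict of problems; an empty dict means every recorded file
    is present and unchanged.
    """
    current = build_manifest(root)
    problems = {}
    for name, recorded in manifest.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != recorded:
            problems[name] = "changed"
    return problems
```

In use, `build_manifest` would be run once when content is transferred, the result saved alongside the content, and `verify_manifest` rerun after any copy or move. A non-empty result flags the files that would need the repair or replacement mentioned for the later levels.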
>> Okay so the metadata that we're talking about in the levels is really a broad category including types of metadata that staff already routinely create, types that are already generated by technical systems, and types that may be needed only rarely. So the prepare level calls for bibliographic metadata for the item, whether in the form of a catalog record or finding aid. And the most critical part about this is that the record be stored in an institutional system of record, like the ILS or eCO, rather than a local system restricted to a division. So we want to make sure that data is somewhere where a number of people, you know, the appropriate people, but a number of people can get to it. Level 1 calls for inventory and event log data, which is automatically created when you inventory the item in the Content Transfer Services or CTS, which we'll talk about a little bit more later. It also calls for data that's critical in providing access to the content. In particular, system requirement data that allows you to know what kind of software is needed to open and work with the files, and access rights data to clue people in on by whom and where the data can be viewed. So as you go across levels 2 and 3, additional layers of metadata are talked about, provenance and other types as needed. And of course inventory records for each new copy. >> So on to the final category, format. So in the case of the LC levels document, we refer to the physical and the digital file format. So for the physical formats we're referring to the tangible media again, the CDs, DVDs, floppy disks, and other media or containers on which the digital content resides. For file formats, we're referring to the actual digital file container. The format of that file may be either proprietary, like an MS Office format, or an open format like plain text. 
So the levels for format start at the very basic level in prepare, encouraging the use of a limited set of physical formats and inventorying and monitoring physical formats for obsolescence issues. You may have little control over the physical media on which you receive the content, but at least knowing what you've got and what you're receiving is part of this level. And that'll prepare you for level 1 or 2 or 3. Level 1 suggests and encourages the use of a limited set of known or open file formats and creating an inventory of formats. This is the case, for example, in digitizing materials where you have a say in what formats to acquire or select for your reformatted materials. With each successive level, you'll notice additional recommendations, from knowing what file formats are in your custody to monitoring obsolescence, to performing migration activities on the content when needed. >> Great, so we just gave you some exhaustive detail, with a lot of vocabulary words. Sorry about that. We should have given you a glossary before coming in. But you want to know how can these levels be used, right? So really it's a self-assessment or planning tool to evaluate workflows and practices in your division. So why would you want to do a self-assessment? In your division or unit, you probably have some digital assets in your custody. Doing a self-assessment and evaluating where your collections are in terms of preservation is valuable, to make sure that you're confident in or at least aware of the steps that the division is currently taking to ensure that we can preserve those assets. So, you can use that self-assessment then, not only to kind of get a sense of where you are, but then for further planning. So helping to drive and inform staffing and resource discussions, or as a way to communicate with technical staff for new workflows or system enhancements that you're interested in. Or even the system access you're interested in. So what can that assessment look like? 
Here's an example. So this is a self-assessment coded for quick visualization or a stoplight approach of what you got. So the greens are good. The yellows are okay but could be better. And the reds are just not so great. So this is what a self-assessment would look like if you're copying your content on to a shared drive for the division and just storing it there. So this could be a situation where staff have received content as part of an email attachment or where you copied, you had a CD that you copied data off of. Or perhaps content that you pulled off of an FTP site that you then just copied to a shared drive. Making a backup even on a shared drive is something. And as you can see with all the red and the little bit of green and yellow, there are a lot of core digital preservation actions that just aren't being met here. So the only preservation assurance you have when you copy on to a shared drive is that the physical items if they exist are stored properly. That the files you copied, assuming that your shared drive has the appropriate access, are stored somewhere where only authorized people can get to them. And that you should probably have the ability to open them for the time being. So, that means that you can't confirm. You have no way of tracking what's happening to these items over time. You can't confirm that they haven't changed over time. You have an extra copy, but you're limited on what you really know about these materials and what you can prove has happened to them over time. So without doing something more than dragging and dropping on to a server, you can't check off and confirm the storage actions in levels 1, 2 and 3, copying to long-term storage, and having multiple copies stored. And the same goes for all those other categories. So the fixity check, maintaining a log of events or monitoring that the file formats remain accessible over time. So how can we make this more green and less red? 
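The stoplight self-assessment just described can be sketched as a small data structure: five categories as rows, four levels as columns, and a color for each cell. This is a minimal sketch, not the Library's tool. The status words ("met", "partial", "unmet"), the function names, and the example cell values are assumptions chosen for illustration; only the category names, level names and the green/yellow/red coding come from the talk.

```python
from collections import Counter

# Rows and columns of the levels chart, as named in the talk.
CATEGORIES = [
    "Storage",
    "File fixity and data integrity",
    "Information security",
    "Metadata",
    "Format",
]
LEVELS = ["Prepare", "Level 1", "Level 2", "Level 3"]

# Hypothetical status vocabulary mapped to the stoplight colors.
COLORS = {"met": "green", "partial": "yellow", "unmet": "red"}


def stoplight(assessment):
    """Turn {category: {level: status}} into {category: [color per level]}.

    Any cell that was not assessed is treated as unmet (red).
    """
    chart = {}
    for category in CATEGORIES:
        row = assessment.get(category, {})
        chart[category] = [COLORS[row.get(level, "unmet")] for level in LEVELS]
    return chart


def color_counts(chart):
    """Count how many cells of each color appear in the chart."""
    return Counter(color for row in chart.values() for color in row)


# Illustrative only: a division that just copies content to a shared drive.
shared_drive = {
    "Storage": {"Prepare": "met"},
    "Information security": {"Prepare": "met"},
    "Format": {"Prepare": "partial"},
}

if __name__ == "__main__":
    chart = stoplight(shared_drive)
    for category, colors in chart.items():
        print(f"{category:32s} {' '.join(colors)}")
    print(dict(color_counts(chart)))
```

Keeping the assessment as plain data makes it easy to rerun after a workflow change and compare the color counts, which is one way to show the move toward more green and less red.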
So many of your colleagues are already doing this by using LC repository services, specifically at the content transfer services or CTS System. And there's a link here on the page. So many of you either work with CTS or have heard about it from digital content lead staff. Developed and maintained by staff in OSI's Repository Development Center, CTS is an extraordinary tool built to work with LC servers and organizations. It's used by projects and divisions ranging from the National Digital Newspaper Project to web archiving to the ITS Scan Center and custodial divisions all over the place. So data in CTS is inventoried, backed up. And because it's in that system, you can both carry out a bunch of useful functions like copying or validating your digital files through that system. So it reduces kind of the number of server accounts that you have to manage. And it allows you to see in one place when the materials got in, what's been done to them, who did those things to them, etc. So, yeah CTS maintains a record of preservation actions taken on that digital content so you can know when actions were taken. So if you're interested in using CTS, so let's say you had been just copying stuff on the shared drive, and you're interested in using CTS, what kind of effect would that have on that self-assessment that we saw earlier? So guys you can see it's actually very positive effect. And this is using the basic received content workflow, you know, used without modification except for indicating where on the server you should actually put items. And it's available straight from the CTS homepage. So, what is this kind of entail? Rather than copying your content on to a shared drive, you would copy it on to a hard drive and bring it over to the ingest stations over in ITS. And then from there, go on to CTS on to that webpage, fill out the form and let the workflow start. When it's finished processing it'll let you know if there's any problems, along the way it'll let you know. 
But the result is that most of the core digital preservation actions that we've gone over today are being met. So already you can see an improvement in the smaller amount of red. Not only are you ensuring that the physical item is stored properly, but that content is copied multiple times to long-term storage. The automatic processes CTS has built in ensure that fixity checks are performed and verified, malware scans happen, and that a manifest, including a list of all the files and their checksums, as well as a log of actions performed on the data, is being maintained. So this received content workflow is a workflow being used by divisions throughout Library Services and elsewhere. So divisions like Manuscript, American Folklife Center, Geography and Map, all with the tangible media projects. This is what they've been using. It's probably different than what some folks are doing now, and it certainly has a learning curve. You know, you have to learn how to use the new software and sort of adjust your workflows accordingly. But it does become routine over time. And if you use it, I think you know, you're really able to say you're taking significant steps to preserving your digital collections. >> So what next? Next steps. Because we do like to provide that practical information here. So you've had this introduction to the levels. You've seen all the documentation we have. Where could you go from here for example? Well, you could conduct a self-assessment. But beforehand, you've really got to know what you've got in your custody. So you could do a survey of your digital collections in your divisions. And this could be very informal. But you kind of want to get an idea of the numbers and the types of physical media you're working with that have digital content. Or the file formats you've got or any digital content that you have residing. And that's a starting place from which you can conduct this self-assessment Moryma walked you through. 
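An informal survey of file formats like the one suggested here could start with a simple count of files by extension. The sketch below assumes Python's standard library and a hypothetical function name, `format_inventory`. A file extension is only a rough proxy for the actual format, so a real inventory would likely rely on a dedicated format identification tool.

```python
from collections import Counter
from pathlib import Path


def format_inventory(root):
    """Count files under root by lowercase extension (a rough format proxy)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python inventory.py /path/to/collection
    for extension, total in format_inventory(sys.argv[1]).most_common():
        print(f"{extension}\t{total}")
```

Even a rough count like this gives a division a starting list of the formats in its custody to carry into a self-assessment.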
So really the self-assessment is going to give you an eye toward that bit level preservation and getting the boxes off the floor for your division's digital materials. And that means you're doing something. We realize not all digital content receives the same level of treatment. But by performing an assessment, you can start to evaluate the state of that digital content. So for the prepare level for example, you can assess how the digital content on the physical media is stored, handled, described and monitored to preserve its physical format. And for level 1, facilitate the movement of the data to the Library's long-term managed storage. In this way, we can move forward as an institution to fill the gaps for the prepare and level 1 phases. And we also encourage you, if you're not already doing so, to engage with the digital project staff in your division. There are already staff across the Library working with these materials and at various stages of the digital lifecycle, many of whom we've talked to and consulted with to inform these guidelines and the development of the LC levels. What we're really interested in doing is raising awareness and developing shared best practices across the Library for the preservation of digital content. So talking with staff and engaging with them encourages everyone to follow common practices and use existing tools available like CTS. So speaking of CTS, it is in use as Moryma mentioned, and staff can actually obtain training on it now. There is already support available for current and future users. There's a bi-weekly CTS user support group meeting, which staff are encouraged to attend to learn how others are using the workflows, ask questions and get user support. And this meeting is held every other Wednesday. There is also a listserv, and staff are encouraged to join and post questions to it and reply. And this really enhances a community of users if everyone's asking questions. 
And everyone can see what those responses are. And of course, staff who are currently using CTS can be a great resource for other users, particularly within the divisions. So in the next few months, stay tuned for announcements of additional training. Right now, much of this training we've been doing in sessions like these and with the CTS user support group has been really great for staff doing this work now. But we hope to help manage more formalized and standardized training sessions, tools and resources for the preservation of the Library's digital content. So here is the staff page where everything we talked about, you can find all of those documents right here. And we'd be happy to take questions. >> This is not specific to our division, but I just have a general question for people who are in divisions like mine, Germanic and Slavic. It's very good we came to this meeting. But how do we stay connected on a regular basis and know what is happening and what training is available? >> We're in the midst of developing that now. So yes, we talked about specifically CTS and how there's a support group for that. But for the larger digital preservation community, there are some groups that meet to talk about it. So one of them is through the archives forum, it's called Talking About Digital Archives. Is that the one? >> Yeah. >> So that happens once a month during lunch. And so those people who are working on digital projects, or anyone really interested, can come and talk during lunch and hear about what's going on. And there's a listserv for that. But I don't think, I don't think that there's an actual place for all of these groups or resources where there's one place where you can go to see sort of the combined effect of what's going on. Unless somebody else knows about anything like that. But it's.
>> So if you wanted to kind of branch it out beyond digital preservation, talk about, more generally, digital collections management, there's a digital DC-3, I can never remember what it stands for. Digital Collections... >> Coordinating Committee. >> Coordinating Committee, thank you. And in that group, you know, when we were sort of looking at, and that's the group that consists of staff from Library Services, OSI, Copyright, Law, so really kind of a broad swath. In that group we were kind of looking at the various groups that exist across the Library, so there's the Digital Preservation Working Group. There's DC-3, and then there's tons of other groups out here that are all handling kind of different parts of this project. And as part of DC-3, our communications committee is interested in sort of doing some work to kind of bring together some of those groups and share information across those groups. So I think you're right, we haven't had a lot of really great ways to get people informed, especially people who aren't doing this work day in and day out. But between the Digital Preservation Working Group and DC-3, we can acknowledge that's a real issue and are looking to make changes with that over time. But if you have any, you know, if you're interested in kind of getting a sense of some of the groups that are available, feel free to send Erin and me an email, and we can kind of try to connect you up with some of those things. >> Well what about as a follow-up to her question, you already talked about the Wiki, what if you set aside a portion of that and just put those types of resources on the Wiki? Because we already know about that one. And you know, you can find a better spot later of course. >> Well yeah. I know on the DC-3 website there's, there's links to other groups, right Joan? Is that? Yeah, so yeah we can also include that on our site, maybe a link to that site. >> Is there ever a point in which the tangible items become unnecessary?
You know, you were talking about the CD-ROMs and the floppy disks. Is there ever a point at which they become so old, so irrelevant to the digital preservation that they're just tossed? Or do you keep all these tangible objects in perpetuity? >> The answer is yes. We do. So take a project in the Preservation Reformatting Division, working with items in the Machine Readable Collections so that floppy disk you saw was an example. We copied the data off of the disk, then returned the floppy disk to that package and book. If somebody called that book and requested it, that floppy disk would become the access copy that they would use to go ahead and access that material, but we would maintain that long-term storage copy on a tape server. So it's a copy that doesn't get changed. Doesn't really get touched. But yes, we do not in any of this work at this point in time, we don't dispose of any of those tangible media items. They are kept. >> How often do things like fixity checks and other kinds of maintenance activities go on in the repository that are scheduled on a regular basis? >> So, as we mentioned, when there's any kind of action being taken on an item, fixity checks are done. So if the item is being touched in any way, there's a check that's happening to ensure that it remains the same. And overall, you know, when the items are at rest, which is where they spend the majority of their time, I'm going to call in a favor and ask Dave Brunton to answer, too. >> The CTS interface has the capability now today to schedule a fixity check of files. So that's a process basically when you're looking at a file in CTS there's that set of actions over on the right-hand side. And you click on it, and you click verify. That's what the option is known as. And there's a module in CTS that allows what we call stability checking, which is a lower level of fixity checking that just checks the file sizes to make sure that nothing has changed on disk.
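The difference between a full fixity check and the lighter stability check can be sketched in a few lines. This is only an illustration of the general idea, not how CTS implements it: the talk does not say which checksum algorithm or manifest layout CTS uses, so SHA-256, the dictionary-based manifest, and the function names `build_manifest`, `stability_check` and `fixity_check` are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record size and checksum for every file under root (assumed layout)."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = {
                "size": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def stability_check(root, manifest):
    """Cheap check: list files that are missing or whose size has changed.

    File contents are never read, so a change that keeps the size is missed.
    """
    root = Path(root)
    return [
        rel for rel, entry in manifest.items()
        if not (root / rel).is_file()
        or (root / rel).stat().st_size != entry["size"]
    ]


def fixity_check(root, manifest):
    """Full check: list files that are missing or whose checksum has changed.

    Every byte of every file is read, which is why this is much slower.
    """
    root = Path(root)
    return [
        rel for rel, entry in manifest.items()
        if not (root / rel).is_file()
        or sha256_of(root / rel) != entry["sha256"]
    ]
```

In this sketch, an edit that keeps a file the same size passes `stability_check` but is caught by `fixity_check`, while any file that `stability_check` reports really has changed. That is the trade-off of the stability check.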
Stability checking happens faster than a fixity check and results in some false negatives, but no false positives. So if the file size has changed, then the file has definitely changed. So it identifies certain problems. And that can be enabled on projects to run on a periodic basis. But I don't think right now the Library of Congress has an agency-wide policy for checking fixities of files at rest across the collections. >> And I thoroughly advocate the stability check as an option, especially if you're talking about hundreds of thousands of files, you know. The difference between it taking a day for you to get an answer to your question of whether or not the files have changed versus, you know, minutes or hours. So, and that's one of those, again kind of tips and tricks that sort of come up through the CTS user group meetings. And sort of the advantage of working within a broader community is sort of to be able to exchange what kinds of tools are out there and what are the most effective uses of them and for them. >> Can you talk a little bit more about security issues and who should and should not have access to long-term storage data, metadata? And how those decisions are made. >> When we talk about long-term storage, we're talking about tape servers that are managed by ITS. And one advantage of storing things on those servers is that those servers are accessible solely through CTS. So you have to be authenticated into that system, have permission to access or work with those files before you have access to that content. So it's not like I could go ahead and look at Meg's files, unless I have access to that project. Unless I had permissions in that project I am unable to work with those. And the determination of who should have access to those long-term storage copies is really a division-by-division determination. So in some cases, it's you know, and it sort of affects kind of how you set up a project in CTS.
So in some cases you have just the staff that are processing materials. Some cases you might have the chief who also. But sometimes they don't actually need to access the materials. They just need to see that the materials are there and be able to run reports on them. So it kind of, it depends on the organization and who they evaluate needs to have access to those materials. >> My name is Kit Arrington. I'm with Prints and Photographs, and my question is about tangible media again. And this is for the idea, you get some collections, you have a box. There's a something. This came up recently. Somebody said that they thought at one point ITS had a room where they were keeping stuff. I mean everyone says, oh the Smithsonian keeps everything which we know isn't true. Are we making some effort of keeping hardware through different formats through time? Or does anyone know? >> As you know, we have a lot of old stuff here like there's eight inch floppy disks. There's all kinds of things. But because we receive archival collections, we have reasonable expectations that we're going to continue to receive obsolete media years and years after it's actually written or used, right. So, we have a couple of things going on. First of all in the Preservation Reformatting Division, we're starting to collect some equipment. So we have some things, and also thanks to our colleagues in the Repository Development Center for contributing to this. But we have some floppy disk drives. So on most of our computers we have the three and a half inch, but there are some five and a quarter inch floppy drives that have USB ports. So if you have those materials, you can borrow those drives or go down to PRD, talk to Adrija Henley over there. And go down to PRD and access your materials through there. There's a FRED machine, which is a forensic machine that has some write-blocking capability over there.
And then also in the DCWG, part of what we're working on right now is based upon that tangible media survey from last year. We have a list of the kinds of media that divisions report having. And have sort of begun to develop a list of the equipment we have and are going to put a call out to the divisions to find out what equipment you have in your divisions. I know some people have five and a quarter inch floppy machines that they use, and the question is whether or not they'd be willing to let somebody else use them. And also too working with our colleagues over at NAVCC. What kind of, you know, can they use their laser disk readers. And getting a sense of what's available here at the Library so that as people have materials to process, we can point them to a list that says here's where you can go to access that content. So yeah, most definitely stay tuned for that, that call for that data. >> My name is Bob Morgan, and I'm from ABA, too. And we send a lot of great material to the Machine Readable Collections Reading Room that is accompanied by tangible media such as CDs. I wonder if you could elaborate on their role in this and how they're serving this material or preserving it. >> So, when new materials come in to Machine Readable after they leave binding but prior to them getting to Machine Readable, they take a quick detour over to Preservation Reformatting where they take materials out, they make a scan of the CD or DVD or floppy disk so you can see what the label looks like. And then they copy the data off of those materials. That data is then inventoried in CTS, so there's long-term storage copies being made. Then material is returned to the envelopes. Books go over to Machine Readable. And a note is placed in the catalog record that a preservation copy of the digital material has been made. The access copy remains the copy that's in the book. So if somebody wanted to look at the data, they would first go to the book.
Eventually, all things must pass, the media will someday not work. Right? That's just the way that it's, maybe 10 years from now, 20 years from now, 50 years from now, and when that happens we'll have that backup copy on long-term storage and to be able to provide access to the content that way. >> Let's suppose somebody picks up that book and sees that the CD is missing and they really wanted the information off of the CD, do you have a mechanism at this point whereby the person can access the stored copy of this and have access? Because a lot of our stuff gets lost. >> So that project in Preservation Reformatting has been going on since December of last year. So it is up to date with materials received from December of last year until now. So the first part of the project was just about making sure that it could keep up with the incoming flow of data. And you're right, there's a huge variation in that, in that program in terms of where the materials come from. And there's a significant portion of the disks, the floppy disks or the CDs that come in that are from foreign countries. So yeah, you're right, there's a big variation. And that can cause some problems, right? Because these materials can be encoded. You know, sometimes they're, they look like CDs somebody bought in an office supply store. But they, you know, wrote in their hand. It's coded that way. And sometimes they're professionally printed. Sometimes, you know, they're sort of all over the place. And we often have to use a variety of tools to be able to access that data depending upon how they're encoded. So yeah, it is. >> The second part of my question was if the CD, the physical CD were lost, and the person at that particular moment, how a user would like to have access to that information. How can they get access from your backup files? >> That access would have to be mediated. First of all we'd have to make sure that we have a preservation copy of that data. 
So that it was lost after it was received but before that person got to it, right. So making sure that that copy exists. And if they, if we needed to provide access to that copy it'd have to be mediated, right? So they can't, we're not going to send any users to our long-term storage, but it would have to be something where they'd work with a Library staff member. You know, a reference librarian. >> Can you envision a time where on the bibliographic access to the stored copy and the patron will be able to access the digital data simply by, you know, looking at the copy. >> One of the reasons I'm thrilled to be giving a presentation on preservation is that access is not a huge part of it, it's not a main component. You know our goal in this presentation is to create those preservation copies so that however access is being given to them, whether it's through a reference librarian, through a direct link, however in the future it's being done, that that preservation copy exists. So, that's a bigger question. >> Thank you. >> The CTS system, what is the general length of time? If I put a project through that it comes back saying yes, your project is now preserved. And yes, the fixity is correct? And the second part is, if Captain CTS of division X puts a project in there and then that person retires, resigns or whatever, that that division is notified that you still have this out there in preservation land and don't forget that you still have it. Is that mechanism in place? >> Sure, first of all the answer to how long it takes is it depends. It depends on a lot of things. It depends on the size of what you're pulling through. So a lot of the work that I've done is with CDs. A 2 gigabyte CD, if there's no other things going through the system, will take seconds to process, right. To get that preservation copy made.
If you have a huge hard drive and it's the end of the month and a lot of other people put their huge hard drives into the queue, it can take days, you know, to go through. It just really depends on what else is happening in the system. The size of your item. Not just the size of the individual files, but the number of files. So you may have a relatively small digital item. But if it contains 500,000 XML files, it's going to take a long time to process. So I mean, it's not a super helpful answer, but it's the truth. It depends. Anything on, I think that's, yeah. And then the second part is if somebody retires, what I'll say is you're better off having your project in CTS than you are not, right. So currently if someone retires, they've put stuff on the server. Nobody knows the exact order that they put it in. Or maybe people don't know where it's all located. You have to deal with all these different accounts. It's a mess already to deal with. At least if your stuff is in CTS, hopefully you would have more than one person being a part of a project. So kind of what Angela was talking about. But then also, other people can look and they can say, okay, for this division, here are the materials. This is where they're all located. This is what's happened to all of them. It's actually much easier to follow up. But if you're interested in, you know, the CTS meetings are open across the board. So, even if you're not working with it right now but you're interested in kind of hearing about it, those meetings do exist. And I would encourage you to kind of just get familiar with the topic, even if it's not part of, I hope I'm not starting some sort of war in your division. But, you know, I think it's just, it's useful for all people to kind of be aware of these tools. Yeah. >> Thank you Moryma and Erin. When they, just before they began they said they were a little nervous because they only had enough material for 40 minutes. And what were we going to do?
And you can see we've used up almost all of the time because of the interest in the topic. So it worked perfectly. So, thank you Moryma. Thank you, Erin. Thank you to the organizers of this event. Angela Kinney told me that this series has been going on since 2001, and that there are about 100 programs to-date. So it's a well-received, long lasting program. So Angela and Judith and all of you who make these programs possible, thank you very, very much. >> This has been a presentation of the Library of Congress. Visit us at loc.gov.