Deanna Marcum: Good evening. I am Deanna Marcum, the Associate Librarian of Congress and on behalf of James Billington, the Librarian, and my colleagues at the Library of Congress, we are pleased to welcome you to the second speech in the series "The Library in the Digital Context." We have two audiences for this program tonight. One audience is here, we're delighted to have you here, the other audience is at home watching this broadcast by C-SPAN, and we're going to try to involve both audiences in the program, so we'll talk about that in just one minute. This is a joint project of the Kluge Center here at the Library of Congress and the National Library. It has been a great pleasure to work with Dr. Derrick de Kerckhove, my colleague who is the holder of the Papamarkou Chair in the Kluge Center, and together we've tried to think about the Library as it is being transformed and how it will look in the 21st century, and we're inviting scholars and great thinkers to this series to help us puzzle through these questions. Last month we had David Weinberger in our series; tonight we are honored to have a very thoughtful, provocative, and delightful speaker, Brewster Kahle, who will be introduced in just a moment. We also have two panelists who will talk about their views of how the library is changing and what needs to be done in the digital environment. We have Abby Smith, who is the Director of Programs at the Council on Library and Information Resources or CLIR, as we fondly call it, and Robert Martin, the Director of the Institute of Museum and Library Services. Both have been at the forefront of the transformation of libraries, and we know that their comments, though brief, will be incredibly important for our discussion this evening. So let me talk just a moment about the rules we need to follow tonight so that we can have the broadest possible discussion. 
Brewster will speak to us, we'll hear from our panelists, and then we will try to leave as much time as possible in this program for questions, both from the audience here and the audience at home. We're going to ask people to keep their questions brief, and we're going to limit questions to one question per person, so that there will be a lot of give and take. Viewers at home are encouraged to send their questions by email, and we will take those questions and cover as many of them as we can. So with that as our context I'm now pleased to introduce Dr. de Kerckhove, who will introduce our speaker. Thank you.
Derrick de Kerckhove: Thank you, Deanna. I also want to thank the Papamarkou Foundation and Papamarkou Chair for allowing this series to occur. I'm also very happy to continue the good start we had with David Weinberger by welcoming Brewster Kahle; just reading about what he has done amazes me in the same way that I was amazed by David. He is very much the person with the widest vision, I think, and the most focused understanding, the most precise way of dealing with that vision, that I have ever met. He's been obsessed with the archiving of the Internet and with the title of this evening's talk, which is "Universal Access to All Knowledge." He's been obsessed with this for about 20 years and has managed to do, in between, quite amazing things, including creating the first Internet publishing system, called the Wide Area Information Server, which he sold to America Online, and if that wasn't enough he also developed one of the principal technologies, Alexa, that is actually used now by www.amazon.com. For me one of the most amazing things is that he was one of the principal engineers in the development of the thinking machine. I don't know how many of you remember what the thinking machine is, but it was the first really performing super-computer, and to think that I know somebody who's been touching it is quite an amazing experience.
[laughs] So we're going to hear, and some of you may have already heard Brewster Kahle in this Library, but it's a great privilege for me to listen to you, and here's Brewster. [applause]
Brewster Kahle: Thank you. Thank you very much. I really appreciate the opportunity to be here. This is a great time to be in the Library of Congress or in this whole field of people that are dealing with these massive amounts of information; we are given this sort of opportunity, based on the technologies that are available to us, to do some amazing things. The disk drives, the computer guys, the computer networks have made it so that we can do something that only one other time in history did anybody really try to attempt, and it was in the Library of Alexandria. They attempted to get a copy of all the books of all the peoples of the world, tried to think of everything, tried to pull it all together into the Library of Alexandria. By some scholars' standards they got 75% of the way there, so it's sort of amazing, and they rotated around a technology change, which was going from clay tablets to papyrus. Clay tablets are kind of heavy and kind of clunky, but papyrus is great, you can roll it and you can actually get a long way there. We now have a technology change which allows us to talk about doing the whole thing all over again. I think we have the opportunity to do one step better, not just make it happen in one place, whether it's in Washington, D.C. or in Alexandria, Egypt, but to then make that information available to people all over the world again. So the idea that we have the opportunity for universal access to all knowledge I'd say is a fantastic goal, and what I'm going to try to argue, through the next 30-40 minutes, is that this idea of universal access to all knowledge is within our grasp. So if I'm successful then you'll actually come away saying "Well I don't know, yeah, we can actually do that."
I'd say that you may wake up differently in the morning and say, "Yeah, let's give that one a shot," or, "I'll take some piece of it." So to try to go through that point I'm going to try to answer four questions. One: should we try to do such a thing? Another is can we? Is it technologically feasible to be able to pull this sort of thing off? May we? Societally, legally - are we allowed to do this? Copyright and stuff, is this going to break everything? And the last, which I'm not going to be able to help with very much, but it's our exercise to the reader, is will we? That will take more than myself or a small group of people to actually try to have this all come true. I'm going to try and answer the first one, should we, with an observation: that for us as parents, librarians, educators, a big part of our role in life is to put the best we have to offer within reach of our children. Right? We're supposed to do as much as we can so that the next generation can build on the best of the stuff that we have to offer. Another observation is that kids these days, and I'd say a lot of us as well, turn to the Internet as our information resource of first resort and often, our only resort. We're not going back to the library. If it's not on the Internet it's as if it doesn't exist. So then you have to ask, is the best we have to offer available on the Internet? And I'd say "No, there's a lot of stuff out there, but not the best we have to offer." A lot of the best we have to offer is locked up in libraries and archives, and we have the opportunity to make it available, but it's not available yet. So should we? Yes. We owe it to the next generation to give them the library that we grew up with. We grew up with a library that had out-of-copyright and out-of-print materials in these places.
Okay, it was a little slow to go through them, now the Internet's faster, but now we need to broaden what it is out there and make it much more available, and we have the opportunity to do so. So should we? I'm going to answer, "Yes." Now I'm going to answer a much easier thing to actually try to justify, which is "Can we?" The way I'm going to try and do this is by going through the different works that humankind has produced, and I'm going to stay focused on the published works of humankind: books, music, video, software, Web pages, and just go through these one at a time and try to say, "Can we actually do this, and what are some of the societal issues, the legal issues, and cultural issues around making these things broadly available?" So let's first start with texts. So, from the Sumerian tablets up through the current printed documents. Well, let's try to answer the question "Can we actually do all of that stuff? Could we have the Library of Congress available to every kid, anywhere in the world?" Well, let's try to figure it out. So the Library of Congress, I'm told, has approximately 28 million books in it. There are 130 million things in it, but about 28 million books. A book, if you just take the words in the book, is about a megabyte, so there are 28 million megabytes of words in the Library of Congress. If you were to take those words and put them on computer disks and put them in computers, it would take up a shelf that's about this big and cost about $60,000. So for $60,000 you could store all the words in the Library of Congress on disk in such a way that you could search through it and serve it. So, you could have the Library of Congress for about the cost of a house, or in San Francisco, which is where I'm from, it's more like a garage or a small patch in the back. [laughter] But anyway. So $60,000 is sort of what it costs to actually store that much material.
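The storage estimate above can be checked with a few lines of arithmetic. What follows is a rough illustrative sketch in Python and is not part of the talk: the figures of 28 million books, about one megabyte of text per book, and $60,000 are the speaker's, while the function names and the decimal convention of 1 terabyte = 1,000,000 megabytes are assumptions made for the example.

```python
# Back-of-the-envelope check of the storage estimate from the talk.
# The book count, per-book size, and total cost are the speaker's figures;
# the names and the decimal unit convention are illustrative assumptions.

BOOKS = 28_000_000        # books in the Library of Congress (speaker's figure)
MB_PER_BOOK = 1           # plain text of one book, in megabytes (speaker's figure)
TOTAL_COST_USD = 60_000   # quoted cost of the shelf of disks (speaker's figure)
MB_PER_TB = 1_000_000     # assumption: decimal units


def total_text_terabytes(books: int, mb_per_book: float) -> float:
    """Total size of the text of all books, in terabytes."""
    return books * mb_per_book / MB_PER_TB


def cost_per_terabyte(total_cost: float, terabytes: float) -> float:
    """Cost per terabyte implied by a total cost and a total size."""
    return total_cost / terabytes


terabytes = total_text_terabytes(BOOKS, MB_PER_BOOK)
print(f"Text of all books: {terabytes:.0f} TB")
print(f"Implied cost: ${cost_per_terabyte(TOTAL_COST_USD, terabytes):,.0f} per TB")
```

Under these assumptions the text of the collection comes to 28 terabytes, so the quoted $60,000 implies a little over $2,000 per terabyte of searchable, served storage.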
But first we have to actually get it there, and if you take the images of the pages then it's more voluminous; it depends on what resolution you use and the like, but all in all it's still quite doable. It might be a few more of these bookshelves to be able to store it, but it's doable. But now we first have to get it all online, so the question is, "How do you do that?" We're involved in a couple of mass-volume digitization projects. One is called "The Million Book Project." It's a joint project between Carnegie Mellon, the Internet Archive, the Government of India, and the Government of China, and it's to digitize a million books. This scanner, this gentleman actually happens to be in the Library of Alexandria in Egypt, where they're scanning their books on these planetary scanners. So you don't destroy the book, it scans it from up top, and it costs about $10.00 a book to scan a book, to digitize it in terms of taking images of the pages, putting it on the computer, and doing optical character recognition so you can search and try to find where you want to go in the book. So the cost is about $10.00 a book, and 28 million books in the Library of Congress times $10.00 is $280 million, one-time, and you could digitize all the books in the Library of Congress by this calculation. Cool. $280 million. That's about six months of the budget of the Library of Congress, and it's only a few percent if you take the total cost of all the libraries in the United States over a year. We could, as a one-time shot, do that kind of a scale project. So it's actually possible to scan it if you're sending things to these other countries, but we've also started doing a project to do things in-library, so doing things within the United States and Canada. There's a project that's now going on with the University of Toronto to use more advanced scanner technology to make it so there's lower labor cost to be able to scan a book.
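The scanning estimate given a moment ago is a single multiplication, and it can be written out explicitly. This is a small illustrative sketch in Python, not part of the talk: the $10 per book and 28 million books are the speaker's figures, the annual budget shown is only what the speaker's "about six months" remark would imply, and the names are assumptions made for the example.

```python
# Check of the one-time scanning cost quoted in the talk.
# Per-book cost and book count are the speaker's figures; everything else
# (names, the derived annual budget) is illustrative.

BOOKS = 28_000_000             # books in the Library of Congress (speaker's figure)
SCAN_COST_PER_BOOK_USD = 10    # scanning plus OCR, per book (speaker's figure)
BUDGET_FRACTION_OF_YEAR = 0.5  # "about six months of the budget" (speaker's remark)


def scanning_cost(books: int, cost_per_book: float) -> float:
    """One-time cost of digitizing every book at a flat per-book rate."""
    return books * cost_per_book


total = scanning_cost(BOOKS, SCAN_COST_PER_BOOK_USD)

# Annual budget implied by the speaker's comparison, not an official figure.
implied_annual_budget = total / BUDGET_FRACTION_OF_YEAR

print(f"One-time scanning cost: ${total:,.0f}")
print(f"Implied annual budget:  ${implied_annual_budget:,.0f}")
```

The product is $280 million, which matches the figure in the talk; the implied annual budget of $560 million follows only from the speaker's own comparison.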
We can scan a book in about 30 minutes, a 300-page book, using this kind of a robotic page-turning system and take images of all of the pages and get them into the computer. So it's possible to go through these without destroying the books and be able to scan these large quantities. Then the question is "What can you do with it afterwards?" The current technologies now allow you to not only read the text that was in the book, but also see the images of the pages, and you can search and find phrases within a book and be able to pop to the right page and see the image of the book on that page, bolded. So in this particular case we're looking at a book on English embroidered bindings and we searched on Historia Ecclesiastica, and it found this phrase even though it was in a caption of a picture and pulled up the images and bolded it, so you can then flip through the multiple occurrences, and this can offer wonderful new access to these materials. Actually tonight I'm happy to announce that a large number of libraries from five countries are all working together to put their digital books into a common repository so that you can start to search these things in tandem. So libraries in Canada, India, China, the United States, and Egypt are working together to build these new libraries that will be accessible to any person anywhere in the world that can have access to an Internet connection, to be able to search and find books. But we find that going and just looking at books online isn't good enough. A lot of people want to still read a book as a book and I have to say, "Well, I'm one of them." So we said, "How hard would it be to actually go and print it back out again?" 
So we made a bookmobile out of a van, put a satellite dish on the top of it, just a sort of a small van with a printer, a binder, and a cutter in it, and we tried out what happens if you have a bookmobile, but a modern bookmobile that has a million books in it, and you can go and drive it up in front of a school or a library and be able to have somebody figure out what book they want using a laptop, hit "print", "bind" and walk away with a book. It turns out that it's possible, and it's cheap. So our first bookmobile cost us $15,000 including the car. Okay, it wasn't a new car, but it worked great, and then I drove it across the country with my eight-year-old son making books. Now it's sort of catching on, and there are now bookmobiles in other places. A one-hundred page book costs one dollar to print and bind, just the parts costs, the paper, the toner, the cover. That's not the labor, and it's not the capital costs, but the incremental cost to make a book is a buck a book, buck a book. Buck a book is cheaper than what Harvard says it costs them to loan out a book, which costs them two dollars. So if it only costs a dollar to give the book away, great, you don't have to worry about getting it back; also you can have a lot more books on your virtual shelves than you can on physical shelves, so you could actually have books starting to be distributed and customized for people all over. You could also print a large-print edition as easily as you can print a small-print edition. So this sort of technology is quite possible. We have also found that other people are taking up the offer. There's another bookmobile running around the East Coast. This is Eric Eldred in front of his bookmobile at Walden Pond. There are two bookmobiles now in India, out of their first 31; they're going to put one in each of the states of India. This is the launch day of the bookmobile in Alexandria, Egypt in a schoolyard.
This is one of the engineers making one of the first books with a kid, and a happy kid with his book that cost all of less than a dollar to print and bind. These are Egyptian children's books that were donated by the Egyptian government for the project. We did one last one in Uganda to try running a bookmobile out in rural Uganda, and in 30 days we were able to set one of these things up, train people, and it's still running in rural Uganda. So it seems to work quite well for delivering books. Then we'd also ask, "Can we also get the authors paid something?" That's an important part of the whole ecology, and we haven't really done that part of it yet. So I asked some authors, just a random poll, unscientific, "What if we gave you a dollar a book for every one we printed up?" Almost the universal reply is, "A dollar? That'd be great. I never get a dollar for a book. When it comes down to it, it may sell for $15, but I get less than a dollar. I'll take it." [laughter] So I turn around to the librarians, "Okay, how about if it doesn't cost a dollar, it costs two dollars, one dollar for the parts cost and the other for the authors?" "Yeah, okay, that seems to work." So I think we can go and make the whole thing work. We have a mechanism of going from a book, scan it, put it on computers, compress it, download it, print it, bind it, book. Book to book. We have a new mechanism of getting books into people's hands, and it all works. So I'd say at this point what we're looking to do is widen the path of getting through this process, and there're starting to be commercial players playing a bigger role in digitization, and we're thrilled about this, as long as we can go and get these collections also in repositories that allow advanced systems for going and doing learning and doing text analysis and translation research in these repositories for people. So books I would say all around are quite possible to do. Then the question is, "May we?" Is it legal?
It's certainly legal to do the out-of-copyright works, and as best we can tell well over half of the books in major university libraries are out-of-copyright under the United States copyright law. So that's great, we can just go off and do those. Then there're the ones that are in-print, that are in-copyright and in-print, available through Amazon or the like. Those are also fairly accessible. That's kind of in some sense taken care of, but we should continue to make progress on making sure everyone can have access to those materials as well. But I'm worried about the ones that are in between, things that are in copyright but out-of-print. We call them the orphans. These are the things that are not owned by anybody. Really, you can't find [unintelligible] -- they're not commercially viable. The publishers aren't keeping them on the shelves, and going and digitizing these, the lawyers say, involves risks in terms of is it fair use? Is it not fair use? These sorts of things. Can we go and keep the out-of-print materials available to the general public through our libraries? That's a question that's still going on in legislatures and judiciaries, of how do we deal with the orphans, in the sense that there are these large numbers of books that are caught in between, and it actually is a lot of the 20th century, so that is probably the biggest part that we've got to solve. But fortunately we have the political will in this country and actually around the world to be able to pull this off. We believe in access to information. We have the political will to pull this universal access off, so we're not constraining ourselves because of political issues or the like in general for these materials. It's just a matter of mechanism, of trying to get the economics right, which is quite solvable. So, may we? We can do most of the work, and so books, I would say, are doable. Let's move on to some of the other areas, audio.
So if we're going to try to put all audio online, let's take all the published audio that's ever been produced. Well, how big is it? As best I can tell by talking to some of these archivists, about 2 million to 3 million disks have ever been produced if you take the wax drums, the 78s, long-playing records, CDs. That's the universe of published works, and that's, again, doable. Right? You can actually kind of imagine two to three million, yeah, we could do that. Okay, it's a few more of these bookshelves to go and put up, but it's doable. How much does it cost? Well, long-playing records right now are being digitized in Europe for under 10 Euros. I think we can get that down to about five or six dollars. So again, you sort of do out the multiplication, and it's tens of millions of dollars. It's a lot of money for me, but in terms of the library system it's doable, to be able to go and make these sorts of things available. Then there's the question of how do we make these things more available to people given, sort of, the copyright constraints, and the answer is, "area by area." So there are out-of-copyright materials as well that we can make available, but there're also a lot of people that want to have things more available, and different countries have different law structures. One area that we've been finding the most available for putting out is the things that didn't work very well in the publishing system in the first place: the niche areas, cultural materials that really didn't benefit the creators by putting it through the publishing system. They'd like to have access. So a lot of these are audio recordings of lectures, but also folk music and the like. We've had a big success with rock and roll. Turns out the Grateful Dead started a tradition in the 1960s of allowing tape-trading: allowing people to tape their concerts and trade with other people as long as nobody's making money.
As long as nobody's making money it's okay to pass tapes around, and lots and lots of rock and roll bands have copied them. So somebody mentioned this, it was an intern that was working at our organization, and said, "Do you know about these tape traders? What do you think?" I said, "I don't know, let's ask the tape-traders if they'd like to be on the archive." So we went out and asked one of these mailing lists and said, "Would you guys be interested in unlimited storage, unlimited bandwidth forever, for free?" Right? "To go and store all of your concert recordings." They said, "We don't believe you. You couldn't possibly be actually offering that, you're dealing with big files and lots of them." And once they sort of got over the "Yeah, they are actually serious" they said, "That would be our dream." Always a good sign when somebody says, "That would be our dream." So we went back and asked the rock and roll bands, and we've now gotten permission from over 700 rock and roll bands. We have 18,000 concerts. These are full concerts that are recorded and made available on the Internet for free, as long as nobody's making any money, and everybody's happy; the Library's happy, the bands are happy, the fans are happy, the tapers are happy. It's actually a pretty good time. So if you're interested in a bunch of music, try going to www.archive.org and downloading and enjoying legal music where everybody's happy. But just don't sell it. So we're finding that there are niches that allow us to move through this process very smoothly. Let's try moving images. Moving images come larger, and most people think of moving images as Hollywood films. Right? Classical theatrical releases, things you go to a theater to go and watch. Well, how many are there? It's been estimated there are between 100,000 and 200,000 theatrical releases of films, and a good percentage of those are Indian.
Some of these works are still in circulation, out of the 100,000 to 200,000, but actually not that many, and a lot of them are still under quite tight copyright protection. We've had 300 Hollywood films donated that happened to not be under copyright, and we have digitized those very cost effectively and made them available on the Internet Archive. So Night of the Living Dead and all sorts of other movies are downloadable at high-resolution and also lower-resolution for those that don't have the bandwidth. So that type of collection is doable, to go and make all movies available. It's possible within our constraints in terms of technology, not only for storing, but also for bandwidth for having these things downloaded. We've had well over a million, several million downloads of movies just in the last month, so movies are quite doable to be able to have accessible. What we're also finding is that there are lots of other kinds of movies than Hollywood films. There are archival films, government films, old educational films, training films, old advertisements, all sorts of different kinds of films that were done not for posterity, but they were ephemeral films, and we've been working with one gentleman, Rick Prelinger, who had a huge collection of these, and he sold access to it basically as stock footage. We went to him and said, "Why don't we pay for the digitization of these materials and give them away for free?" He said, "That's a great idea. Oh, but I make my living by this." So he was trying to figure out how to make all of this go, and he said, "Okay, let's try it. Let's put a thousand of our films, the finest films that we have, and make them available to the public domain for download for free." It turns out that his business did better. His revenues went up 40%. He got more famous and actually he sold his collection to the Library of Congress, thank you very much. [laughter] Because more people knew of it and his types of films.
So all around this has won, and lots of kids have now downloaded these and made new movies out of them, which is a fantastic thing, and often they're posting them again on the Internet Archive. So we're starting to see the full cycle. Go to the Library, check things out, look at them, make something new, and then put it back in the Library. That's the cycle we'd like to see more of. Some of the other types of movies that we're starting to see are Lego films and other sorts of little odd things that come up around the Internet, where you take Legos and take your digital camera and move them slightly and take another picture and weave it together into a little animation, put your own words to it. They're great. But anyway, they discovered the Internet Archive's offer of free hosting, and those are now up there. There are lots of different government materials. There're lectures. We're getting the lectures from MIT from their OpenCourseWare Project, so basically we'll have all of the lectures of a major university available to anybody around the world. It's a fantastic thing. So we're starting to see the flood of these sorts of materials come online, and it's all quite affordable even by a small organization. So imagine what would happen as all our larger organizations start to come around as well. So another source of moving images is television. So television comes large. We think there're about 400 channels around the world of somewhat original content, and we're not collecting all of this, but the Television Archive is collecting 20 channels of television starting in about the year 2000. Hit the record button on 20 channels of television, 24 hours a day, at DVD quality. So that's downloading and storing for safekeeping a really large collection of materials: Russian, Chinese, Japanese, Iraqi, Al Jazeera, BBC, NBC, CBS. We try to make sure that these materials are kept, just so there's some record of them.
All of it is sitting basically in a dark archive at this point, and only one week has really ever been accessible, and it's the week from September 11 to September 18, 2001, starting just before the first planes went in. For one week, the television news from around the world, what did the world see? It's available only via streaming, as a kind of loaning of this material; you couldn't download and re-use them, but you could try to get an idea of the different vantage points. How did the world react? I think we're starting to become more aware that the media has a point of view and that you want to be able to see multiple points of view, and we in the library world can do this. One thing I'm proud of is that we did that and launched it on October 11, 2001. The idea is you should be able to come to the library to help give a context of the world that we live in, even given current events. So in our digital relationship to the world, the idea is to be there with background material for people. So going and collecting even those sizes of materials is within our grasp, and making them available again will require basic balances within our society as to how we can balance the commercial interests with the public interests. Fortunately there's enough money around in the United States; just the library system is $12 billion a year, so that's a fair amount of money, and about one-quarter to one-third of that goes to publishers' products. So that's $3 or $4 billion from the library system that fuels the publishing system, and if we were to refocus some of that money on the digital assets I'd say we're in pretty good shape to keep everybody fairly happy, as long as the library has somewhat restricted access, but for researchers and scholars and historians, general public, I think we've always lived with the balance between publishing and libraries. So again I'd say moving images is doable. So let me hit a couple more areas: software and then the Web. Software. So how do you preserve software?
It's always on these floppies. Remember those? They're awful. They were awful at the time. Have you tried playing one of those big floppies recently? They're really quite hard. As best we can tell there're about 50,000 published software titles since the dawn of the PC era, about 50,000. So it requires a couple more of these bookshelves of machines, but again the storage is relatively easy, and we're starting to know how to even play it again on emulated versions of the old Atari machines and old Commodore machines. It's actually coming out of the gaming community. Kids out there are going and saying they want to play their old games. So they're going and making a mock Commodore 64 on their IBM PC so that they can go and play these old things. So we know how to do it, and then the question is may we? The Digital Millennium Copyright Act said that you're not allowed to break copy protection for any purpose unless we give you permission. It's really quite an astounding law, so we went to the Copyright Office at sort of the three-year point and showed the original Lotus 1-2-3, the original VisiCalc, the original SimCity and said, "We want to preserve these, but it looks like we're not allowed to break the copy protection on the disks. We're not allowed to." So they granted a three-year exemption, so we're now allowed to basically copy the materials off of those old floppies and store them. We're not allowed to post them on the Internet or anything, but we're at least allowed to preserve them, and hopefully when the next three-year point comes up they'll make it permanent so that we libraries can do our preservation job. Software is quite doable. The thing we're probably best known for, though, is archiving the Web. The Internet Archive started taking snapshots of the full World Wide Web, the public Web, in 1996.
The idea is to visit every Web site, go to the homepage and click through all of the links and click, click, click, click automatically with these robots to go and find every page on every Web site. Turns out there're a lot of pages, and there're a lot of Web sites, but they have these computers that run along and do this all day long just like the search engines do. It's how they do it, they have Web crawlers that go out there and crawl the World Wide Web. We've been doing a full snapshot every six months, and it's starting to get big. We collect about 25 terabytes a month, so that's about a Library of Congress a month, is the amount of information that comes in; about 50 million Web sites is what we collect in these full snapshots. So it's getting to be quite large. We also now work with curators to go and do deep collections to make sure that we have things that are really good. We started with the 1996 Presidential election in partnership with the Smithsonian. Then in 2000 we did it with the Library of Congress. In 2004, we did the Iraqi war, the Presidential election, and different special events, and we're working with libraries now around the world to do deep collections of the Web as well as these broad snapshots. It's now being used as part of the Wayback Machine. So you can go to the Archive Web site, type in the URL, get a list of all the past versions and surf the Web as it was. You can actually click into Yahoo in 1996 and probe around the Web as it used to be. Right? It's out-of-print Web pages, and it's being used by about 200,000 people a day. It gets about 10 million hits a day. It's, according to Alexa, the 250th most popular Web site. It's kind of neat that the history is that useful to people, because if you have publishing and you don't have out-of-print, you have something kind of scary, because people can go back and change the past. What you really want is third parties to have copies that allow you to go back and reference what it looked like.
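The two ideas in this passage, a robot that follows every link and an archive that keeps dated copies so a URL can be looked up as it used to be, can be illustrated with a toy example. This is a minimal sketch in Python under stated assumptions and is not the Internet Archive's crawler or the Wayback Machine: the in-memory `TOY_WEB`, the `example.org` URLs, the dates, and all function names are hypothetical, and a real crawler would fetch pages over the network.

```python
from collections import deque

# A toy "web" (hypothetical): each URL maps to (page text, outgoing links).
TOY_WEB = {
    "http://example.org/": ("home", ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("page a", ["http://example.org/b"]),
    "http://example.org/b": ("page b", ["http://example.org/"]),
}


def crawl(web, start_url):
    """Breadth-first crawl: follow every link once, return {url: text}."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def take_snapshot(archive, web, start_url, date):
    """Store one dated copy of every page reachable from start_url."""
    for url, text in crawl(web, start_url).items():
        archive.setdefault(url, []).append((date, text))


def past_versions(archive, url):
    """List the capture dates held for a URL, oldest first."""
    return sorted(date for date, _ in archive.get(url, []))


archive = {}
take_snapshot(archive, TOY_WEB, "http://example.org/", "1996-10-01")

# The page changes on the live web; the earlier copy stays in the archive.
TOY_WEB["http://example.org/a"] = ("page a, revised", [])
take_snapshot(archive, TOY_WEB, "http://example.org/", "1997-04-01")

print(past_versions(archive, "http://example.org/a"))
print(archive["http://example.org/a"])
```

The point of the sketch is the last step: once the page is revised, the live toy web only has the new text, but the archive still holds the earlier dated copy, which is what lets a third party show what a page used to say.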
Scholarship is now being done on the Net, and the average life of a Web page is about 100 days. At that time it's either changed or it's gone. So we're building scholarship on shifting sands so unless we have libraries that keep out-of-print materials so that when you link to something you know it's going to be there, and you know it's not going to change. That's the time that we can start to really build on the Net as a piece of our intellectual infrastructure and the Wayback Machine as part of things going on at the Library of Congress, the National Archives, and the like, is now being incorporated into their mainstream collecting activities. So other fun things to look back, this is a picture of pets.com with a little sock puppet and some of the other sort of dreamy things that happened in the early Internet days, which are kind of fun for people to go back to but mostly people go back to their own Web pages, because, well often, they lost them because they just didn't keep copies, and fortunately there are libraries to do some of that for them. How do you preserve and provide access to materials that have come in digital form? Right? We've now gone and suggested that you can actually take all of the existing published materials and put it on line. Then the question is can you hold onto it? We've all had disk crashes and things go away, and so the question is how do you preserve and then provide enduring access to these sorts of materials? Have to say we don't really know, but we're trying some things. The first thing we learned was, I guess we didn't just learn it; the Library of Alexandria was really the major example for this. It's probably most famous now for, well, burning. Right? The answer is don't just have one copy. Right? Take your treasures and make multiple copies and put them in multiple places, under different kinds of management so that they'll be protected in different ways. 
So where Europe went through a Dark Age, it wasn't true of all around the world. Over the last 7,000 years of civilization there's been a civilization that's been running at any particular time. So while Europe was in its Dark Age the Arab world and China were doing just fine. So if we had mechanisms of keeping copies in different places so in the ups and downs we have a better shot at making it through the next Dark Age or at least our knowledge does. One idea is to build an international library system. An international library system would be made up of libraries that would be dedicated to universal access to all knowledge, that would be located in different places that would keep major scale repositories and mirror each other and keep themselves up to date, and we're prototyping this with a couple of libraries. One is the Library of Alexandria in Egypt. So there we have a copy of all of the collections from the Internet Archive in 100 terabytes of machine. If you walk in the front door of the Library of Alexandria, which is a beautiful place, I really suggest going and visiting, turn to the right you'll see the Internet Archive's collections there, and they're adding to it out, and we get those copies, and they get copies from us. So the idea is large-scale data swaps between ourselves. Another is in Amsterdam, where we just kicked off a European archive. This is 100 terabytes of material, all the books, music, and video from the Internet Archive. 10,000 movies, 20,000 books on and on, fits in a bookshelf that's this high and this deep. You can actually put that much and that dense, and this is what it looks like in Amsterdam. So the idea is to try to build these collections and try to preserve long-term preservation and access to them. It's getting popular. This is just a graph of our Internet use out there; we're now going through over a gigabyte per second. 
We have accesses of over 10 terabytes a day of downloaded stuff or streamed things off of our servers, just to say that it's fairly popular, and then there's how do you index this and catalog this amount of stuff? We don't really know. When you're dealing with a Web collection of 40 billion Web pages, I think Google now lists that they're 8 billion, so the total collection at the Internet Archive is 40 billion pages, how do you do this? One person tried out doing a search engine with a time-base so if you search on marine life you can see what did that mean over time, and how popular was it, and what does it correlate to, and automatic correlations? I think time-based search is going to become very important in the next generation of Web search engines. It's a little more dramatic if you search on something like Homeland Security, where it wasn't mentioned at all, and then bang it started to get mentioned all over the Internet, and what did it correlate with was other things. One thing I'm excited about is only one person, Anna Patterson, a special person, but just one person was able to go and build a new search engine without getting venture capital funding because she was using a library. A library is content resources in support of research. That's what we do, and so a person, a smart person could come in and do something new and different without having to go and raise millions of dollars to make their own library before they even start. So we'd like to see large numbers of different types of search things start to come up and put out there on the Net. One last aspect of access is the actual pipes of moving the information around. In the United States we've got a bit of an issue, which is some of our DSL lines are not getting much faster. 
Moore's Law goes and says that you should pay the same amount and get more and more each year, but at least my DSL line just never got better for the last five years, and that period of time according to Moore's Law, it should be 10 times more but it's not. So we need some new ways if we're going to get movies to people to start to provide access, and we've done a system of wireless access points that people put on the roofs so that they actually own the networks themselves. It's built on the same wireless thing that's in your laptop, but it's on top of people's homes. So for $1,000 you can put up a post on your house, and it's a lot less intrusive than a TV antenna and actually own your connection, and it can go between 1 mile and 10 miles to your neighbor, and we have about 50 of them around San Francisco, and it's now being adopted in other places as well. The idea is to make it so people can have content and movies and things, when we get to high-definition television we can start to put that over the Internet as well. So I tried to argue through this whole talk that we can actually make available -- not only put the collective works of humankind on line, but we could then make it accessible again, and we could preserve it for the long term. It's a great time to be a library, and the question is, "Will we?" I don't know the answer to that, but I think the biggest barrier we have is that it's possible, and once you come to grips that it's possible then it's a question of just let's go off and do this thing. So universal access to all human knowledge is a wonderful goal I would suggest. I think it could be one of the greatest achievements of humankind up there with the mythology of the Library of Alexandria, the Gutenberg press, or landing a man on the moon. 
The idea of having every kid in rural Uganda being able to watch MIT lectures or read books that are in the Library of Congress or be able to go and write their own books and put them into the Library of Congress, then we'll really be getting there and doing our jobs. Thank you very much. [applause] Deanna Marcum: Great speech. [applause] As always, Brewster, you're an inspiration to us and make us want to get busy right away. There's so much to do. We're now going to hear from our two panelists. We're going to ask Abby Smith to speak first for a few minutes and then Robert Martin. You can stay, here you are Abby. Abby Smith: Well universal access to all human knowledge. I mean, what's not to like? And Brewster has a way of describing a problem that seems eminently solvable because he started to solve some of those issues. So I want to talk about the role that the libraries can play in fulfilling some of the dreams that we all share. In part by carving up the space of things that need to be done, and giving to commerce and Brewster the things that they can do, and talking about the responsibilities that libraries can uniquely fill. But first before going into that I want to ask if bookmobiles are so cheap and print on demand is so easy, why do we not have it? I mean why is it in fact that we don't have bookmobiles or even bookstores that we can order out-of-print books now? There are certain barriers, I don't know what they are, but there is a big gap between demonstrating that something is possible and actually finding a way to make it universal and a commercial practice or at least economically feasible. Well that said, that's my own private dream, that I can order any book I want and own it in a cheap copy and I'll pay $15 if it costs that, but the real question is what can libraries do in the two areas you were talking about. 
First, to rescue all the material which is in obsolete analog formats or if they're not currently obsolete they will soon be obsolete, so that in fact one or two generations from now the people who by default go online for everything and expect everything to be online, will actually have access to all of the knowledge that has been created heretofore. I think it's a wonderful distinction that Brewster makes between access to knowledge and access to information. I think libraries have always seen themselves and archives in a place where they collect information, and that information is transformed into knowledge through providing a space where people interact in a certain way with what has been recorded, and that's why people tend to think of libraries and archives as temples. It's not because they collect a lot of things but because a certain very profoundly human and almost mystical thing happens in libraries. I think one of the challenges for digital libraries is to recreate that space where people do work with exposure to information and turn it into knowledge. I'm really glad that you articulated that in that way, that we're not just trying to save a lot of information. I think that if we look at the issues that Brewster posed in terms of the analog legacy materials let's say, there is the technological wherewithal, although there isn't currently the bandwidth. There aren't business models to support a lot of things. I think there is an increasing understanding of the value of both material that's out of copyright being rescued and material which is in copyright having an economic value which somebody needs to get compensated for to keep creativity going. We might adjust how much of the money that it takes to keep it in circulation is apportioned to which part of the lifecycle of information: More to authors? Less to publishers if it costs less? 
Swapping of tapes for free, of course, only happens when there are artists who actually perform, and they don't make money through their performance, they make monies other ways so they can give away performances. That's a classic audio phenomenon or I should say musical phenomenon. But I think the area where libraries actually have a special call, shall we say, is in the orphaned materials, and they can start by disentangling what are the entanglements that copyright has actually created? Some of which I think we can elide, and I think that Brewster has shown that it can be elided very safely in certain cases by just accepting a minimal amount of risk, and I frankly think libraries ought to be willing to take more risk for the public good of making this material available. In addition to that I think libraries need to actually have an active rescue campaign by engaging in the public policy debate about whether information like this is valuable and finding solutions to work with orphaned materials to make them accessible. Let me say -- there's a lot more to be said, but I have five minutes -- let me just say something about the greatest challenge of all I think for libraries, and that is to wrap our heads around the challenge of the Web, when in fact there is a steady stream, a growing stream of information that is being pushed at us, and we know in fact that there's a lot of knowledge in there, but it doesn't arrive in the ways that we're used to, which is vetted by publishers or validated by actually impressing onto some medium that costs some money to circulate. In fact, both really important knowledge and just plain old information is equally accessible except through, say, licensed databases and libraries, and I think the role that libraries uniquely play -- and librarians -- is to think through how do we sift through what is valuable for preserving over the long-term and what we simply need to give access to now. 
So I think ultimately the most important role that libraries will play in the information age, the digital information age, is to make sure that much of what's distributed by default on the Web is preserved and that we actually develop systems, collection building, so that people can find what they want, as well as assume as they always have, that if they find it on a library Web site, that is valid, authentic, and reliable information, so that it's distinct from just finding it on the Web, but that's quite a challenge Brewster. Thank you for stating it so clearly and optimistically. [unintelligible] Robert Martin: Thank you. I think I'd like to pick up where Abby left off in fact, but I'll go back to where she started, universal access to human knowledge, what's not to like? It's a noble dream, it's a wonderful ideal, and I think we all have a visceral positive reaction to that, but I think that the devil is in the details. I think we need to think through carefully exactly what the ramifications of that ideal might be. I bring to that the perspective of 30 years as an archivist and a librarian, and I think it's important to begin by talking about how and whether it's desirable, useful, ultimately responsible to try to keep everything. The first part of Brewster's talk in which he was talking about enhancing access to collections that are in place in libraries throughout the world by making digital surrogates, and this book through the process to book again, is a splendid example of what can be done and should be done. To try to answer Abby's question, "Why aren't we doing it?", it's an issue I think primarily of awareness and readiness and funding, but there are instances of similar kinds of activities that have been done and funded by states and counties, and I think it's quite possible to demonstrate on a broad scale how this works and what benefit it could be. 
For example, I know in California they have I think now 15 mobile libraries for children that they call L-labs. There may be more than that by now, that are funded in part through the State library and part through County funds and special funding streams that take collections of books and other resources out into the communities for children, and it's an early learning enterprise. They take very young children and acquaint them with their first book. It's not a lending library they actually give these books to kids to take home, and the reaction is quite splendid, and it's along the lines of what Brewster's talking about. The next logical step would be to build on the resources that he's already creating to take the million digital volumes that are there and provide the bookmobile that can provide access to them, print them, bind them, and let the kids take them home. I think that's a splendid ideal, but when you look at the range of resources that are available now in the born digital world, primarily on the World Wide Web, I think you have to seriously question whether it's desirable or useful to try to capture all of it. We are certainly all familiar with the growing number of bloggers out there, many of whom have contributed substantially to our public discourse, but many others of whom seem never to have had a thought that went unexpressed, and I really do wonder how valuable it is in the global scheme of things to try to capture that for posterity. Most people think that archivists are dedicated to preserving things, but I'm here to tell you that what archivists spend most of their time doing is throwing things away. The important work that they do is deciding what of that overwhelming mass of material, what small percentage of that are we going to keep? It's typically 2% to 4% in public archives around the world. 
Part of that decision is driven in fact by resource limitation, that we don't have room to keep everything so we have to throw it away, but part of it is simply a process of what archivists refer to as "appraisal," deciding what's useful in the long run. What has enduring lasting value or utility to anybody and what was a record that was created simply to document a transaction that at some point in the future becomes irrelevant to have that document or transaction. So the important thing is what archivists call "appraisal" and what librarians call "selection" and that is applying human judgment to the resources that are available in deciding what to keep. Brewster has demonstrated quite capably that with the new technology the resource limitation issue is not critical, and it may in fact evolve to the point where it's completely irrelevant as the mechanism that is driving that decision making process, but I do wonder ultimately whether it would be useful for us to have universal access to all information. That's not what he's driving at, and Abby has clearly drawn the distinction, but what's important, I think, for all of us in our daily lives as scholars, as citizens, as people participating in commerce, is not having access to everything, it's having access to the right thing at the right time. What makes that happen is developing robust systems for organizing, creating meta-data, and providing retrieval mechanisms that help connect an individual to the knowledge that they need at the time that they need it. I think it's critically important as we move forward in this exciting possibility of creating a universal access to human knowledge that we spend at least as much time thinking about how we're going to make that connection of the individual to what they need as we do thinking about how we're going to capture all the stuff and hold onto it. Deanna Marcum: Thank you very much to both panelists. 
I know that Brewster would like to comment on a few things you've said, so I'm going to give him a chance to respond to the panelists. Derrick, if you have anything you'd like to say we can hear that, and then we'll ask our audience to chime in with questions. There's now a microphone in the middle aisle we can go to, and we have many questions from viewers at home. So we'll begin with Brewster responding. Brewster Kahle: You're absolutely right that the aspects of selection, organization, the cataloging, and meta-data are absolutely with us in this digital age, and we've been really wrestling with those with the World Wide Web. I guess this is where the traditional cataloging approach really started to become unglued. When there were just too many Web sites, and how do you deal with them? I now hear that it costs about $150 to catalog a book properly, but when you're dealing with 50 million Web sites do you want to catalog all of them? Is there a new way of thinking about it? Also selection turns out to be a big issue, and it's pretty hard to do, and we're finding if you stop to think, "Well, should we select it or not?" often it's gone by the time you say yes. So we've turned it a little bit on its head somewhat successfully, where let's try to select after the fact. Why don't we grab a pile of it just so that we preserve it and then pick through it. Australia tried just archiving the valuable Web sites of Australia, up to 900 Web sites, and then they said, "Let's just do the whole thing." I think we're starting to arrive that yes, we've got to get ahead of it, and one of the key things in selection is what you absolutely mentioned which is that you don't stumble over stuff that you didn't want, and the technologists have been doing a pretty good job of helping us weed through this. I'm amazed by these current search engines. The idea of being able to use a couple key words to find what you want out of billions of things? 
I mean, I remember getting lost in my town library which only had 100,000 books, and we're starting to find our way through billions? The technologists have got something to offer I think in after-the-fact selection. Another question you brought up is, "Why didn't print-on-demand happen yet?" It's a really good question, and when I talk to the folks at HP, they say, "Well, we've gone through the study several times." Every time we get in these high-priced consultants, and they say, "Okay if we get Bertelsmann and Barnes and Noble and HP all to work together in this sort of big billion dollar project can we do it?" And they always come up with, "Well, no." Right? Because you can't get everybody before the fact to make it happen. I think where new ideas come from is the commons. It comes from below. It comes from the academics. It comes from the public domain. It comes from the in-betweens. Then it percolates up until it's a commercialable success. I'd say that we might see print-on-demand happen in Internet cafés in the Third World, where that may be the new library system in the Third World. I don't think I've ever been someplace so obscure that I wasn't a day's walk from an Internet café. They're everywhere. They're entrepreneurial, they're connected and that might be the new library system. Then it sort of percolates up, and then we'll see it in Barnes and Noble. So I think it might be that we have things a little bit wrong in the terms -- thinking of it top-down, and often that's not where the innovations come from. The Internet came from below, where at the same time in the early '90s they were talking about interactive television. There was that, "We know what people want. We're going to sell them this and this and this through their television." Well no, they wanted the Internet. It was kind of chaotic and kind of interesting, and it bubbled up from below and then became a commercial success. So I still have hope that we'll see books all over the place. 
Deanna Marcum: We have so many questions that people want to ask. I'm going to move now, if it's acceptable to you Derrick? Derrick de Kerckhove: That's fine. I would like to have one minute to make one comment. It's the "Will we?" question, which in fact, you just sort of more or less answered. I find fascinating the pattern of development that has happened in information distribution since the invention of the printing press but particularly since electricity came in. The last thing that you talked about, which was the wireless distribution, actually makes it much less expensive and much more doable for it to happen, but this is only one more piece of that enormous puzzle, which is occurring so that the "Will we?" question is almost rhetorical. Yes, it's going to happen because in fact it is actually happening. That's the comment I'm saying. Brewster Kahle: That's the most optimistic thing I've heard. [laughter] Derrick de Kerckhove: I'm a natural-born optimist, and you're a realist and a great one too, so I make amends about it, but I actually feel that it really is happening, so that's the comment I wanted to make. Deanna Marcum: Alright. We'll turn now to the person standing at the microphone. Male Speaker: [inaudible]. I'm from rural Minnesota. We don't have access to [inaudible] and [inaudible]. How do we make that affordable? [inaudible] access is the question. Male Speaker: That remained out there. Male Speaker: What is your response? Brewster Kahle: Well at 28.8 modem, you're out of luck when it comes to putting video over the Net, but there's some good news. If you're putting across these books, I've been passing around these books. I don't know if they've gotten all over the place. I'll put some more out in the back. These books, what this one is, The Cheerful Cricket that was scanned here at the Library of Congress. At least when you compress this using the current compression technologies you can get this down to around four or five megabytes. 
You could download this, okay it's not that fast, in about a half-an-hour to 45 minutes over even a modem line. So books at least given these incredibly smart guys doing compression, you should be able to get books even over phone lines, so that's some good news. But for the big files like movies get some upgrades, or get some wireless system and take the system into your own hands. Deanna Marcum: Thank you. This is a question from Houston, Texas: I am a lover of university research libraries with plenty of books to browse through in stacks. It seems to me that digitization of books will mean the disappearance of the well-stocked university library as we know it and will mean a real loss. There is simply a different quality to research and learning with books available the old-fashioned way. Do you see a future for libraries in the coming decades or will they disappear entirely as we know them? Bob, you should answer that question I think. [laughter] Robert Martin: Well, I'm a firm believer in multiple paths, and I believe that for the foreseeable future the role of the growing digital environment is to provide a parallel path for access to resources, and we'll see the traditional codex book and the digital book and other digital resources co-exist side by side, at least for my lifetime. In part that's because the codex technology is a very proven, reliable, good, solid technology. It does what it does extremely well. The digital technology does what it does extremely well, and they do somewhat different things. I don't think that the one replaces the other. What we've seen for example in museums is that as museums create digital surrogates of their authentic collections, it has dramatically increased the number of visitors who come to those museums to see the real and the authentic. 
I think to a certain extent we will continue to want access to the physical object as one of our satisfying means of access to resources, and that's in fact what Brewster's doing with his book-to-book process. Deanna Marcum: Back to the microphone. Male Speaker: Yes. Brewster, you were somewhat coy about the "Will we?" question so I hope to hear your views on that. What can the Library of Congress do to be able to answer the "Will we?" question positively and what can Congress do to directly realize that as well? Brewster Kahle: Oh, great. [laughs] The Library of Congress is in a very special role, not only in this country but in the world, in that a lot of people look to the Library of Congress as a standard-setter. "What's the bar? What should the rest of us do?" So some of the great work that's already done here is really doing scanning technologies to try and set standards. But I think the Library of Congress could go quite a bit further towards starting to emphasize access. It used to be that if people came into the Library of Congress and started using their books, one, it would be very expensive to deal with all of these people coming in more and more, also the books might suffer. But in this digital world we don't have those characteristics. You could actually serve books over and over and over again and not have the original get any more destroyed than the first time you just went through and digitized it. So starting to move the Library of Congress more towards a library of access is sort of one of the great opportunities. The other is that it's got a gigantic budget, and it's got people all over the world. There are 70 people in Delhi collecting books. There are 30 in Cairo collecting books. The Library of Congress's collections are unbelievable. 
There's nothing like it in the world today, is what's in the Library of Congress, and having it so that all our kids can go and share and build on these, play with them, make new things out of them, whether they're Balinese palm-leaf books that are here, some of the only Cambodian books. I was talking to the librarian in Alexandria, Egypt, and I said, "Great let's go and digitize your books." He said, "Well Brewster, we don't have the best collection of Arabic books." I said, "Who does?" He says, "The Library of Congress." So let's go through and go with some of these groups and go through sets of national materials, get an okay from say the Egyptian government to digitize all the archival materials that are here and give them back to Egypt, the archival digitized copies, and then let's have a swap between Egypt and the United States. So I think we can start to do these sorts of creative swaps because the technology makes it such that you can make copies without diminishing your original. Deanna Marcum: Thank you. The next person at the microphone please. Male Speaker: What objections or concerns have you had regarding censorship, freedom of expression, objections to material whether it may be politically, religiously, scientifically, morally, or culturally correct or not? What role do you see truth playing in all of this? Brewster Kahle: So how do we deal with truth, and what should be collected, that sort of thing. Well, we're trying to figure out what we should do first and how should we prioritize our time. We've tried to figure out if we would know what was of high-quality and worthwhile to collect or not to collect, and we determined that we probably weren't the right ones to judge. So we've gone and done what we've been technologically able to do, and that's been really the sort of approach. 
Until we start getting the users and the feedback to figure out what's valuable to people, and a lot of our collection, specifically the Web, went for five years before it was even used by anybody, before we made the Wayback Machine. So it was only until then do we even start to get the answers back. So we just started trying to collect the whole darn thing. As Douglas Adams, the author of Restaurant at the End of the Universe and Hitchhiker's Guide to the Galaxy, he said something that has really rung true to me that I thought was fantastic. He said, "The amazing thing about the Internet is it's just us. It's just us; it's not packaged by somebody else. It's just us out there." We wanted to really capture that. That a lot of the books that are in our libraries, if they went through a selection committee, they probably wouldn't make it. But you know there's probably a descendant of those authors out there. You know my grandfather wrote books, and he died 40 years ago. They've been out of print for 60 years. I have one copy, one copy. I have two sons. [laughter] Okay, they say it's illegal for me to make a copy for my other son. I think this is wrong. Speaking of what Congress might be able to do for us, but the idea that these libraries are full of basically our family histories, our personal histories is really great stuff. So I think we should err on the side of collecting the materials that were made available publicly. We try to stay away from things currently that were meant for really small publics. That were meant to be private when they were created but Web pages weren't. They were meant to be made available to everybody, and that's what it is we currently collect. 
Deanna Marcum: I'm now going to read a question from a viewer: Considering the transitory nature of information on the Internet, that is the here-today-gone-tomorrow aspect of information on Web pages and whole Web sites, how can you hope to capture the volumes of wonderful personal research, information, and personal stories, which seem to blink on and off on the Internet radar? Brewster Kahle: How to save the personal stories that are out there? Probably it's the blogs that are the amazing thing. My sister used to keep her diary with a little key. Right? To keep me from being able to go and find out her secrets, but I think the equivalent of my sister, a young girl, now would be putting up a blog going and telling everybody sort of what would be -- So it's amazing the sorts of personal histories that are going on and being put on the Web and even videotapes and things like that, video postings that are being made available, and we're making these and making them and saving them. What we're also finding is sometimes people after the fact don't want them up anymore, that it doesn't seem like it's a benefit to them. So if they go and find things on the Wayback Machine that they wrote or their old Web site, and they say, "Can you just make that go away," then we tell them how to use a robot exclusion on their Web site or if they don't know how to use that we basically put them on a special list, and it's taken out of the Wayback Machine and things aren't crawled again. So the idea is that people still have some control of what it is they've made available, but we find that most people react really with glee that their old things are up there and their personal stories are being preserved. Deanna Marcum: Thank you. Anyone else at the microphone? Female Speaker: I have a question about the different universal access as it relates to different countries. 
Would it relate to universal principles and training to make decisions what to keep, what to throw out, more and more standardized or would the individual uniqueness of different, as more countries join, would be kept? Deanna Marcum: You're asking about the international implications? Female Speaker: Right. For example, there's Egypt, Amsterdam, as more and more countries join in a universal system. Deanna Marcum: Do you want to take that Brewster? Brewster Kahle: I'll start anyway. I think every country and every culture will deal with their information in a deeply individual way, in the sense that the way people deal with their libraries and their education of their young reflects their cultures, and I think what we'll see in the world is no different. We may just be more aware of international approaches because the Internet has made it in some sense much more accessible to be able to see into how people are managing themselves. But I find it fascinating just how people have taken care of their heritage and how do they want to share it and pass it down. I think it's important in this digital age that we respect different approaches to these important issues and give people flexibility towards arriving at their own understanding of what this means. In this country we have a long tradition of public libraries and free public education, and lots of countries do, but that's not universal, and I think we'll start to see some of the benefits of these when we start to have public domain textbooks out there, and we can start sharing these more readily. We can start to find some kids in Kenya that might be going and watching professional mathematician lectures on the Net and starting to get an education that they might not have been able to have access to locally, that we'll start to see a blurring of some of these national borders, but the respect for international customs I think is key. Derrick de Kerckhove: Can I add something to this question? 
Because part of it that I understood was, "Is there a homogenizing effect of digitization?" Like between cultures. Is there something that actually is coming within the digital that actually could impose its own standards to all cultures and information distribution? I think that there is, but I think it's so deep that it is actually much, much deeper than the alphabet effect or the how to represent all human experience into a print form, I think electricity. It was exciting for example to hear you not limit yourself to text but going to movies and to audio, that is one of the effects of digitization. That multiplies the possibility. But the zero-one becomes the smallest common denominator of everything, and that is homogenizing of sorts. Just as the alphabet was a homogenizing principle as zero-one is, and it's interesting to see the story. It was 26/27 signs, depending on the language, to the Morse, which was three signs, to the digital, which is one sign. Morse is three. It's on, off, not, I mean it's [unintelligible]. It's long, short, not. That's three, but on-off is only one sign. So the digitizing, the kind of homogenizing effect you have to seek for it. You have to seek, and then you have to know the consequences of that I don't know. Deanna Marcum: I'm going to switch topics now, and this is a question from someone in Lowell, Massachusetts: Is anyone copying or sampling A.M. talk radio? I mean popular talk radio often dealing with politics, probably considered by librarians not worth preserving and pretty near trash-talk. Yet it is a form of communication which could explain to future scholars something about us as Americans. What are we doing about talk radio? Brewster Kahle: I'm not aware of anybody that's capturing it. We're not doing it. We'd like to, but we're not. I don't know of any comprehensive collections. Deanna Marcum: Stay tuned, I think. 
Abby Smith: I think there's probably someone in the audience -- Brewster Kahle: Any volunteer in the audience to take on A.M. radio? [laughs] There's one, we have two. A couple. Deanna Marcum: Tim, are you next or -- oh, Sam? Sam: I'm responding to that that the Library of Congress has obtained copyright authorization to archive Web radio, or that is broadcast radio as broadcast through the Web, has only had this a matter of weeks, and we've set up a set of procedures, and our target was exactly, was talk radio, which we saw as sort of a fundamental venue for political discourse. This is through the American Television and Radio Archive Act, which was passed in '76 but only extended to radio about two weeks ago. Deanna Marcum: The wonderful thing about these events I always learn something new from my colleagues. It's great. Let me then turn to Tim. Tim Eastman: I'm Tim Eastman. I work at the stateside data center at NASA-Goddard, and we're beginning to have some real success with applying new data mining tools and knowledge discovery tools to the large scientific data sets, and it strikes me with this new emerging large reservoir and richness of digital materials that kids in Minnesota, in Afghanistan, wherever in the world potentially could get involved with applying such knowledge discovery tools and themselves have the opportunity to be at the forefront of new knowledge discovery. Any thought about that? Robert Martin: Actually I'd like to respond to that because the Institute of Museum and Library Services has actually funded a number of projects that are doing just that, engaging young science learners through Web-based tools to work with real scientists in the field in terms of biological field studies using astronomical observations that are currently underway from astronomers and a range of other kinds of, with botanical gardens working with the field scientists to do real work in the field so there's a lot of potential for doing that. 
Abby Smith: I was just going to say that in fact the real challenge is after we've deepened our ability to query natural data is actually to do that with the data produced by human cultures, which is infinitely more complex and ambiguous, and I think that's the real scientific challenge for the next 20 years is to turn to data-mining of human cultures, the sorts of recorded information that libraries and archives collect, which is streaming to us through the Web. Brewster Kahle: A couple of the examples that have been used, a couple of researchers have done on top of our collections, which I find kind of inspiring towards applying computation to large textual collections. One is by Philip Resnik at the University of Maryland, where he's gone and found parallel texts in two languages, Hungarian and English, and found large amounts of it and used it to fine-tune his automatic translation technology. So going beyond dictionary substitution of this word means this in Hungarian, going and learning phraseology, and having large enough corpuses is important to do research like that. I found that to be fascinating. Another is some work that's done by Remy Stadadon [spelled phonetically], in the UC system and also Cornell, where they've taken full snapshots of the Web moving tens of terabytes of materials and analyzing just the link structure. They throw out all of the text of the World Wide Web and just look at the link structure to try and figure out what's going on here. Who has authority in the Web? Who's referred to? Who's cited a lot? Are they important in other senses, or are they Web phenomena? And they've found that there is a core set of materials that lots and lots of people point to, that there's a core of the Internet, and this is the sort of thing that if you just go and throw away, and analyze huge collections of material you can start to find patterns about who we are as people and how does our society work by analyzing these things in the large. 
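[Editorial illustration, not part of the spoken remarks: the link-structure idea described above, discarding page text and asking who is pointed to most, can be sketched in a few lines. This is only a minimal in-link count under stated assumptions; it is not the researchers' actual method, and the function name `most_cited` and the `.example` hosts are hypothetical.]

```python
from collections import Counter


def most_cited(links, top_n=3):
    """Rank pages by how many distinct other pages link to them.

    `links` is an iterable of (source, target) pairs; page text is ignored.
    """
    # Drop self-links and duplicate links so each citing page counts once.
    distinct = {(src, dst) for src, dst in links if src != dst}
    in_degree = Counter(dst for _, dst in distinct)
    return in_degree.most_common(top_n)


# Hypothetical toy crawl: three sites point at one "core" page.
links = [
    ("a.example", "hub.example"),
    ("b.example", "hub.example"),
    ("c.example", "hub.example"),
    ("a.example", "b.example"),
    ("hub.example", "hub.example"),  # self-link, ignored
    ("a.example", "hub.example"),    # duplicate, ignored
]

print(most_cited(links, top_n=2))
```

[Counting in-links is the simplest possible authority measure; real analyses at Web scale would use more robust ranking, but the point about a heavily cited core shows up even in this toy case.]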
Deanna Marcum: Sir? Male Speaker: You've talked about Moore's Law and how the steady increase in storage capacity and computing technology and computational power has led, though it's been steady, you had at some point a discontinuous jump where you said, "This is possible, we can do this," and enabled things like the Internet Archive and bookmobiles and other realms -- things like the iPod or the TiVo became possible just -- and they're discontinuous in what was otherwise a steady stream of increase. I wonder if you could say what you had thought about in the next three or five years that might be another discontinuity that something that we can do only when we reach the next plateau in storage or computing or something like that. Brewster Kahle: Oh gosh. Deanna Marcum: Do you want to be the futurist? Brewster Kahle: Well, Moore's Law has been just so steady. In fact, when I was in a technical college in the late '70s we said, "Okay, we'll have all literature, we'll have the encyclopedia then, all books then, all movies." We just sort of charted it out, and it in fact has come true. The thing that we didn't really predict at that time was the Internet, that that type of communication structure was going to be really what was available. But your question is assuming continuous growth of bandwidth and storage and compute power, what does it afford next? Right? Now we're starting to get access to video, the published works of video, but what happens when you can record everything? You can just have lots of cameras in different places and start having video around you a lot, and that's one thing I'm very excited about is when screens start getting better. Screens are still pretty awful. You look at a modern office, and there's a lot less workspace now than there was 20 years ago. Twenty years ago we had desks this big, and we had stuff, but now everything is focused on this 17-inch, maybe 19, and if you're really important, 21-inch screen. Right? 
That's your desktop? Come on. Right? What happens when we have these sorts of screens all over, and we can start to have ambient video? I'd actually like to be connected, ambient, with a large number of people that I know, probably with the audio off. [laughter] Thank you. But just have sort of on my wall, my wallpaper be things I'd like to sort of keep up to date with. Not instant messaging, these little things, but, sort of people -- and so if they're sort of sitting around I might go, "ding dong, you want to talk now?" and have ambient video. I think it would keep me connected with my family a lot better, my friends in a distributed environment. I find that I'm doing a lot more work with people that are all over the world, and I'm finding it fascinating. I'd just like to stay better connected, and that's something that I'm looking forward to. Deanna Marcum: And a final question from one of our viewers. "I would like to ask what timeframe and how many tax dollars it will take to achieve the digital future as you have described it?" Robert Martin: Have to get Congress involved. [laughter] Brewster Kahle: Well I'm 44, and I plan to do this by the time I die. Right? Our life is this long; you are here. Right? And some things that are really important to you, you have to really move along. I'd say this could be the man on the moon by the end of the decade kind of opportunity. I think if we took the published works of humankind we could knock it off in this decade. I don't think it will happen. I think it will take more like 15 years before, frankly, we've sort of given up on some of the books that didn't get digitized, and we're sort of making do with what we've already gotten, and we'll start to slow down. So in 15 years I'd say it's completely within our grasp. Remember this doesn't cost that much. The books, $280 million is my estimate, movies cost about $15 an hour to digitize a videotape. That's not very much. 
Movies, if you were to do both digitization it's $100 to $150 an hour to digitize movies. I think it's like the human genome project where there is sort of this long-term project that could have been this government funded thing, it was going to be five billion dollars, NIH, and then a guy said, "No. I can do this in four years for $300 million." Everybody said, "No, you couldn't -- you're never going to -- he can't do it." He said, "Watch me." And he did. Craig Venter went and ripped the human genome. He basically just took it, and did pretty normal stuff by just stacking a bunch of machines up and going through and doing it. The thing I really like about the human genome project is he went, and gave it away. He took the human genome and he said instead of patenting it or copyrighting it or trademarking it or something, he said, "No, we're going to make that a record of what humans are, and we're going to make it so anybody can download the whole thing." We may sell advanced tools for making patterns out of it but stuff like that, but the public domain, our DNA, belongs to us. What I'd like to see is our public domain that's in our libraries and archives effectively made digital and put into the public domain so that everybody can have access to it. It's within our grasp. Deanna Marcum: Well with that we end the program tonight, and I can't think of a better way to end, and I think, speaking for everyone in the audience and all of my library colleagues, we hope you live a very long life Brewster. [laughter] We need you. And to the panelists sincere thanks for giving time and thought to this important topic. Thank you all for coming. Good night. [applause]