>> From the Library of Congress in Washington, DC. [ Silence ] >> Good morning. I'm Beacher Wiggins. I am the Director for Acquisitions and Bibliographic Access here at the Library of Congress. We are delighted to have a follow-up session to the one that we hosted at ALA Annual in Anaheim a few weeks ago. It's a part of our process of making sure that LC staff are apprised and kept up-to-date with major initiatives that we're working on. This one focuses on the Bibliographic Framework Initiative. Sometimes we put the adjective "new" in front of it, with some variations, but it's all our effort to think about moving into an environment beyond the current MARC environment. This session will be filmed because we want not only to make it accessible to LC staff, but we also promised that we would make it accessible to our colleagues in the national and indeed the international community. So, stay tuned. Let your friends and colleagues know that they'll be able to follow what we hear today. That's part of our continuing effort to make sure what we're doing is as transparent as we can make it. Sally McCallum and I will set the stage today. But the major speaker will be Eric Miller from Zepheira. And Sally will introduce Eric and talk about the project activities to date following me. I want to begin as I did in Anaheim and read a quote from Roberta Shaffer, the newly appointed Associate Librarian for Library Services. I think we can still say new. It's still been less than 6 months, I think, since she was appointed in January. On April 16th, as part of her "On My Mind" in Minding Matters, the news-- the bi-weekly communication vehicle that she issues for Library Services staff and indeed beyond, she stated, "I had little doubt when I became Associate Librarian just over a hundred days ago that the New Bibliographic Framework Initiative, NBFI, as she referred to it, then just barely under way would be a critical aspect of my new role". 
The new framework will replace MARC, the brilliant bibliographic description system that the library developed and the world embraced and indeed use for nearly 50 years. How many things can-- in today's fast moving world can last for nearly 50 years? The time has come now though to replace MARC. LC will not and does not want to go it alone. And in the next several months library staff will be asked to give their input to the New Bibliographic Framework Initiative and outside users and partners will be queried as well. Now, today's session is a part of and the previous sessions that we did at ALA Annual and Midwinter meetings are part of that promise of outreach as we move forward. We will also have time following the major presentation today by Eric for questions and answers. We also have planned a session this afternoon with Eric where attendees both you who are here today and others who couldn't make this morning session can have some more time with him to talk-- we'll have more in-depth dialogue on matters related to this initiative and to Linked Data and the semantic world technologies and related standards. Now, Sally will introduce Eric and set the stage for project activities that have led to Eric's being a part of this process. Sally. [ Pause ] >> Okay, now I can see. [Laughter] Okay, I am-- I can see almost. There are bright lights shining this way. I want to talk about two things briefly, because the key is to go on to Eric. One has to do with the contract with Zepheira. And the other one has to do with community information-- involvement. The New Bibliographic Framework Initiative we stated from the beginning that we wanted it to be oriented to the technologies, the Semantic Web and Linked Open Data or the Linked Data environment. 
There are many institutions that we can think of as traditional institutions, because they're in fact very old libraries, who have been exploring that area, such as OCLC, the British Library, the German National Library, Harvard and the Library of Congress, to name a few. There are many others also that are pushing that-- pushing forward with Linked Open Data. So we decided to do this with contractor support. And what we were looking for was that combination of mature understanding of the W3C, the World Wide Web Consortium-driven technology, which is where the Semantic Web and the Linked Open Data standards and philosophy originated. And also someone with understanding, or a group with an understanding and appreciation, of the assets of the library community and its standards. And that's not an easy combination to find. We wanted this group to be able to stimulate ideas and foster discussion, and give focus to the discussion, by developing some models. So, Zepheira staff had the experience with development and implementation of Linked Open Data that was exactly what we were looking for. They have library backgrounds; in fact, I first met Eric Miller when he worked for OCLC back in the Dublin Core days in the 1990s. They also have Semantic Web knowledge, in depth. Eric Miller went from OCLC to the W3C at the time that they were first formulating the ideas and concepts of Linked Open Data. At that time they generally called it Semantic Web. Linked Open Data became the words that were used for the same thing later. And he worked there with the theoretical aspects of it. But he also worked in a practical manner at MIT with the SIMILE project, which is one of the first fairly large projects that tried to implement Linked Open Data or Semantic Web concepts just as they were evolving. So he had early practical work with it. And I won't say any more about him and his team, because he will say something about that. 
With respect to community involvement, Beacher has alluded to it. We have in fact a website and a listserv which I hope all of you will check out if you haven't already. Our initial idea was to have committees, two committees, three committees, several committees. And we asked for nominations to those committees. And we had a flood of nominations, most-- nearly all of them very well qualified, and a great deal of variety. But frankly, how were we going to select them and how were we going to really get the input we needed? Because we needed people who in fact would commit to implementing. So, doing something hands-on with the models, rather than theoretically talking about the models, seemed like the appropriate approach. So, we needed a hands-on strategy and innovators. And after Zepheira could give us an initial model or models and some tools, we wanted them to experiment with real data and share notes with the wider community, try again, augment the tools, adjust the models, and develop not just the initial experimenters but then a wider circle and wider circle of implementers-- of experimenters, I should say, who were doing some implementation work. And this is not dissimilar from the way MARC itself was developed. It was the-- It was not developed by committee. It was developed by a group of implementers who experimented with the models that were coming out of the Library of Congress at the time and reported back what they found and how things worked. There may be eventually some conventional committees but I think that's way on down the line. So now I'm going to turn things over to Eric and that is the best part of our session and-- and I'll let him introduce his work. [ Applause ] >> Good morning everyone. I think everyone is on the same time zone. So I'll just stick with morning as opposed to afternoon, evening, middle of the night. It's a pleasure to be here. Thank you very much for the introductions. 
I wanted to talk a little bit about some of the work we talked about a couple of weeks ago at ALA and give folks sort of an update on the-- an overview of where we're at in terms of the initial modeling, but also sort of process status and sort of next steps and frankly what we can do collectively now to start preparing for moving forward. And I was trying to basically figure out how best to discuss this because there's been a lot of different things, as Sally mentioned, that have been going on around the world. Certainly there's been a lot of amazing interest and very large scale efforts in terms of applying Linked Data technologies to collections. But having worked in the library community now for, gosh, more than 25 years, there are so many other collections that haven't been surfaced and haven't been connected, and really trying to basically provide a framework where the simple things are simple, but the complex things are possible, in terms of connecting some of these assets together has been really at the forefront of my mind for quite some time. And when I met Sally, I think it was in March, and we discussed some of this work and I came to talk about Linked Data and the future of sort of what the web as platform can do for social curation and sharing, that was a wonderful experience. I brought my family with me and Judith was kind enough to sort of give us a tour, and if you've ever met my family that's not necessarily an easy thing. But she took us to meet Sybille. And Sybille is the head of the Rare Children's Collection and it was a wonderful experience for my children. I don't know if it was as much for Sybille, but it was a wonderful experience for my children because they saw things they've talked about ever since: the world's smallest book, which is really small. And just this huge amazing collection of pop-up books that were this whole new form of genre in terms of communicating stories. 
This is an example of something that I had never even heard of called a "Peepshow", which was a very fascinating three-dimensional book in which it expands as an accordion. And if you look through very small points in the book your brain registers page after page after page as they open up and they sort of tell a story and it was just the most fascinating experience. Not just seeing new forms of material that I hadn't seen before, but seeing them through the eyes of my children, my children literally for weeks on end coming after this trip basically-- of all the things that they saw. My 5-year-old, basically, you know, went to school and said, "I went to Washington and I saw a Peepshow." And it was-- I was trying to explain this context because he was excited about it. But it was-- it made such an imprint and it's been one of the things that's been really part of the driving force in terms of much of the folks that I brought together on the Zepheira side to try and work in this. It's-- There are so many amazing assets the library community has responsibility for curating and managing, sharing those into a wider [inaudible], so they can share it with others. It's one of the fundamental things that brings us together as a community and that was-- you know, in this very small instance an example of this and what we're trying to do is really trying to accelerate that in and do it through infrastructure that the community in general, not necessarily the library community, but the world in general has been spending time and energy putting in place. So I want to talk briefly a little bit about-- just about briefly about us and sort of how we are approaching this. Chances are you may not have heard about our organization, but you have with the lot of the folks that we work on. 
We really help a lot of different organizations around the world leverage the web as a platform for sharing data, to connect, curate, analyze and augment data assets that are traditionally locked up inside of silos and get them down to the people that need them, whether users and patrons in terms of the library and museum community or scientists in terms of the public health space: how really to more effectively weave together social communities around data and add value throughout the process. Many of the folks that are part of Zepheira-- I've been very fortunate to bring together many of those that have been part of the web since its-- you know, since the beginning. We wrote the original browsers together, the protocols; a lot of the three and four letter standards that are extremely difficult to pronounce and poorly named, we've had a hand in. But we've really basically collectively not only been involved in the original part of the web, but have been very much involved in sort of setting the direction of applying the web as a platform for managing data. As Sally mentioned, I left OCLC to go to MIT to head up and start at that time what's now known as the Semantic Web Initiative, which has laid the standards for sharing Linked Data not just in the library community, but for communities that have a common need for reducing the social and technical cost of sharing, curating, and managing data. And it's been going back to industry now, applying those technologies in very large international organizations, commercial organizations, nonprofit organizations; finding out what works and what doesn't at a very global, practical scale has been a lot of what we've been focusing on at the moment. So when I thought about sort of bringing together a team to attack this problem, certainly some of the backgrounds in terms of standards and technologies are an important part of this. 
Critically, subject matter expertise, you know, folks that have been on the ground not only creating MARC records but then helping others basically transition from card cataloging to electronic systems, and everywhere in between in terms of what's been successful from deployment, but also hardcore enterprise data integrations and solutions, architects. Large scale businesses have very large scale data aggregation problems, businesses, libraries share very similar problems. Getting different perspectives on these was a critical part of this. And also bringing together some folks that think from the get-go on processes to help communities move from one technology to another, the change management, the infrastructure, the process that needs to take place. So all of these decisions, in addition to all of the different projects we're trying to synthesize here, we're trying to basically come at this from various different perspectives with the hope that through these perspectives we might be able to come up with a solution that not only can scale and provide the kinds of goals in architectures that we're talking about that can be deployed in as affective a manner and as an efficient manner as possible. So a little bit about the work, our goal in doing this and talking with Sally was, you know, first sort of taking inventory in assessing from an architecture and modeling standpoint, a lot of the work that's been going on. There's a lot of work that's been going on in various different national libraries. There's a lot of work that's been going on in this building and other institutions that are dealing with library museum archival information. 
Trying to get a handle on all of these has been a bit of a daunting challenge, mostly because, you know, the different variability of the data, the different variability of the needs, requires different solutions, but we're trying to get a sense of what's common among these and how we can put in place a substrate that takes the 45, 50 years of the richness which we have in MARC. And I do mean richness both in the sort of technical but also social, you know, evolutionary perspectives. You know, I've been part of the standards community for a very long time, 25 years. That's a drop in the bucket compared to MARC, right. You know, how many standards can you think of that have lasted 45 years? I can't think of any interchange formats that have lasted longer than MARC. That is amazing. And something we should understand not-- you know, how. So part of what we've been doing is looking into just sort of, you know, the resilience of this. And what are those resilient patterns, because we need to sort of reflect those and emulate those in how we move forward: the social, organizational, evolutionary aspects of this. So-- But looking at this from a standpoint of building on that evolution, I think it would be a mistake to basically just-- it would be a mistake for any kind of web technology to sort of say, "Look, we're going to shut things down at this point, we're going to clean things up, and we're going to release a new version 2, 3, 4 years from now." No one has the luxury of doing this. So how can we lay down a model for supporting these dialogs, for supporting these kinds of use cases and goals which we've identified, but in a way that allows us to evolve rather than sort of a hardcore cut into something new. And really, as Sally mentioned, something for the-- a basis for community discussion; what's going to come out of this work is not a fait accompli. It's not going to be the model. 
It's going to be a basis that we hope will serve as a focal point for relating a wide range of different initiatives, and hopefully something that can be a very small change to many of these different initiatives, so that we can see the combined effect when we start connecting the dots across them. And that's really been the focus of this in terms of review and modeling. One of the things that we've learned early on in the W3C Process is-- you know, it's all well and good to basically define models and architectures and systems, but that's not as effective as also providing open source reference code that can be implemented in multiple different languages and made available for different-- from different organizations. So having some substrate tools and infrastructure that not only basically give us an assessment from a modeling perspective of what can scale, what can be effective, but also can serve as input into the basis for more refined standards, more baseline code that others can contribute to, all of this is sort of part of the process. And ultimately to create a roadmap for moving forward, both in terms of refining the model, but also a process in which the dialogue can take place, feedback can take place and the tools and infrastructure and the alternative approaches can be vetted. Quickly on timelines, we started this in mid-May, and we've been in the process of doing a very deep dive in terms of related initiatives and we're finalizing the initial draft of the model for internal discussions. July, August, we've-- we're actually ahead of this in terms of the prototype and translation services, but tools to basically support this initial model; modeling for us and tool development work in parallel. It's easy to basically define very complex models that can't be implemented. It's much harder to define simple models that can. 
So, how we basically work together, and this is a critical part: building some Linked Data interface tools, leveraging a lot of what the community in general has been building around Linked Data interfaces and environments, applying them in a context of traditional library data, really to start to show the value and the value proposition of what these technologies and social practices can enable, and throughout July and August, have a rapid refinement of these discussions and a rapid refinement of what works and what doesn't. Again, this is still the initial draft. But we take a very rapid approach in terms of that kind of refinement. August, September, a final draft, if you will, basically something for the community as a whole to begin to discuss, and a project roadmap in terms of next steps in moving forward. How many-- well, there were a lot of people, perhaps in this room, who were also at ALA. I was amazed at ALA at the amount of discussion that was occurring around Linked Data. I was absolutely amazed. It's been several years since I've been back to the library community and it was a wonderful, wonderful experience returning and seeing how much enthusiasm and interest there was in Linked Data. Up in the right hand corner-- this is Tim Berners-Lee's original proposal, when he was working at CERN, for the web, and it outlines an information management architecture that talks about coming up with common protocols and connecting together different things using a common markup language et cetera, et cetera, et cetera. Up in the right hand corner, and I don't expect everyone to be able to read this, is a very interesting bit of marginalia. And if you zoom in on that little interesting magnifying glass, it reads "Vague but exciting." I think the interest at ALA around Linked Data was vague, but exciting. There are-- And I'm quite excited to see that excitement, but the vagueness is dangerous. 
And part of what we need to focus on, and part of what this engagement is focused on, is making that not vague at all: being very clear about the benefits for which we're leveraging these technologies, being very clear about the impact, being very clear about the capability of connecting this data or allowing this data to be connected in a larger ecosystem, a larger framework, if you will. And that kind of context is an important part of this to tease out. So, you know, this is the definition, you know, via Wikipedia, of Linked Data, but I've introduced this sort of Tag Cloud aspect of this to really focus on the critical aspects of this. And very-- The first part of this focus is really around the social. The technology aspects of this are interesting, but don't hold a candle to the social potential and the community potential that Linked Data is enabling. The ability of sharing and connecting data at a global scale and reducing the social cost of doing that has been remarkable to watch. We are using the web as an architecture, which means certain architectural things: decentralized creation, curation, this is all part of what the web enables, and using global identifiers as a way of referencing the things that we're talking about. So-- But the social aspects of this are very important and with a little bit of engineering, the social things create very fascinating results. We think of this sometimes really as the power of recombinant data, and in my house, the closest things to recombinant data are the things that I step on on a daily basis, which are all over the house, and that is Legos. Now, the Legos from an engineering standpoint are actually quite a remarkable feat. The engineering tolerances of Legos are similar to those of an internal combustion engine, and that's what makes the very satisfying click when you snap two Legos together. Now, Legos come at different price points, trust me, I know this. They have different-- they're packaged for different audiences. 
They-- But when they get to my house, they get scattered all over the floor. And when they get picked up, they get placed in two buckets. One bucket is named good and one is named evil. I don't really understand that. But the point is that they get reassembled then constantly and my children create new and interesting things from these. So they're packaged together for particular price points, audiences, consumers' needs, but they have the ability of being deconstructed, reconstructed, reorganized, and new things created in ways which we couldn't have imagined, and that's the power of a little bit of technology and communities; you know, it starts small but it basically builds up very fast. And I don't know what it means when my children take the Jacques Cousteau from Lego and stick it on the horse from the Knights of the Round Table and start prancing it around. But to them it's a sea horse. And that concept makes perfect sense to them, and it does to others. And that ability to reassemble and create new value simply through that assembly of these wonderful assets is what the little bit of technology and the whole bunch of society enable, and that's what really the web has provided. And that's what we are seeing more and more at a very high level in terms of-- a lot of different national libraries that are moving forward in this space, but a lot of different other communities that are leveraging Linked Data as a way of engaging with-- they don't call them patrons like we do-- customers, researchers, engineers, scientists, the users of this data. But the increased flexibility for describing these resources, the greater reusability, the reuse of these aggregate resources in the context of other descriptive practices, new option-- new ways in which others can connect into those resources and thus add value to them, is how they can integrate this into more flexible and user-targeted applications. 
So, we didn't think-- we might not have thought that, you know, mobile technology would have been as pervasive as it is when we started the Semantic Web Initiative. But the fact that it has used the web as a platform and has built devices and applications based on that means we can quickly leverage a lot of the existing infrastructure here in new target markets that we could have never imagined ten years ago when we started this. That kind of flexibility and adaptability in terms of these open standards is critical to us as a library community delivering solutions to patrons, but also to the web in terms of infrastructure in which we can more effectively share that data. Lighter weight models for extending to future needs and uses: again, we don't know all the different kinds of things that we might be describing; we know now what we're describing in terms of MARC. But for the kinds of new assets or new resources that we might be describing in the future, we need flexible ways in which this can be easily evolved, and improving the positioning of credible records in the context of libraries to other kinds of applications. I'm going to talk a little bit about this here in a bit. But when more and more and more data is made available the-- who you trust and the quality of information is going to be an asset; it already is. Libraries are well-positioned to be a trusted view into that data. And part of what the benefits of this particular approach are is to accelerate that trust into a wide range of different constituencies and different communities that we might not have been able to touch otherwise. So let me be a little bit less vague and perhaps a little bit more concrete about what we're really talking about here. With all due respect to a lot of my colleagues that have been working on the standards and technologies of the web for so long, I am reducing it to one slide. 
The web as a platform can be thought of as very simple protocols that allow resources to be linked to each other and retrieved. The little blue links that we click on and go from one document to another are uniquely identifiable, documents that have a link relationship that exist between them. We read, if we're fortunate enough to read the native language in which document is reflected in or are not visually impaired, we read the relationships that exist between these documents. This is my home page. Boom! But to an application, to a robot or harvester, or search engine index, it's literally just points at. So the web as much of a communication platform has become is really an architecture that supports uniquely identifiable things that link to other identifiable things. And allows us as a community to create these new things and link to them as we need. Part of what we're focusing on in terms of the Linked Data is to be very clear about the relationships, to allow relationships to be defined between these things. So no longer I'm talking about, you know, this is my friend, boom, but friendship as a relationship between two people. Or software dependencies not just linking to libraries, but having a formal dependency from one software to another piece of software. Or formal relationship between one person and where that person works. So, not only are we moving from just a contextual relationships of linked to something, to more tight relationships of how things are linking, but we're able to basically start talking about the types of things that they are. They're not an-- you know, an abstract thing that points to other things but we are now able to talk about people, places, subjects, books, monographs, music, et cetera, et cetera. We're able to basically define the type of thing that we're talking about and the contextual relationship that exist between that. 
And leverage the power of the web as way in which we can create new types and new relationships to connect things together. That's it, for all of the different specifications that many have been involved in, that-- those two slides reflect in essence the evolution of the web to-- you know, to a Linked Data architecture. And there are lots of different nuances and those nuances reflect those specifications. But being able to create new types of information and link to information in very specific ways is really about what the Linked Data architecture allows us to do, leveraging the web as a platform. And there had been a range of different communities that have been experimenting with this. This is just a fraction of the community that have been experimenting this in the library community and the library community is just a fraction of the communities that have been experimenting in this, in terms of delivering new products and services and many of them are not experimenting but making very large scale production level efforts, you know, to provide new ways in which they can sell products or services and different things like that. I'll talk a little bit about how these shapes in a bit benefit. But that last point here in schema.org is an example of that work that's going on. For those that are not familiar with this, schema.org is a new initiative between Google, Yahoo!, and Microsoft in terms of providing search engine ready solutions to Linked Data that's being published on the web. So they're providing vocabularies in which one can describe products and services that are bought and sold on the web. If you basically structure your data this way, it shows up higher in the search rankings. It's more easily discovered by your users and there are many, many, many companies that are seeing increases in terms of 17 to 19 percent increases by using some of these technologies right away. 
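The idea described above, uniquely identified things that carry a type and are joined by named relationships rather than by an untyped "points at" link, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `example.org` identifiers, the type names, and the relationship names (`friendOf`, `worksFor`) are made up for the sketch and are not taken from the talk or from any published vocabulary.

```python
from collections import namedtuple

# A statement links a subject to an object through a named relationship.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifier prefix; these are placeholders, not real resources.
EX = "http://example.org/"
TYPE = EX + "rel/type"  # hypothetical "is a" relationship

triples = [
    # Typed things: people and an organization, not just abstract pages.
    Triple(EX + "person/alice", TYPE, EX + "type/Person"),
    Triple(EX + "person/bob", TYPE, EX + "type/Person"),
    Triple(EX + "org/acme", TYPE, EX + "type/Organization"),
    # Typed relationships: friendship and employment instead of a bare link.
    Triple(EX + "person/alice", EX + "rel/friendOf", EX + "person/bob"),
    Triple(EX + "person/alice", EX + "rel/worksFor", EX + "org/acme"),
]


def objects(subject, predicate):
    """Return everything the subject is linked to through the given relationship."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


def types_of(resource):
    """Return the declared types of a resource."""
    return objects(resource, TYPE)
```

With statements in this shape, an application can ask where a person works, or what kind of thing an identifier names, instead of only following an unlabeled link.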
That shouldn't be driving us necessarily, but it's important to understand that there are a lot of other communities that are working in this, in which we are now part of a larger data ecosystem, and being aware of the impact of this and how we might connect is an important part of this. So in our community, I don't need to explain these acronyms, which is lovely. But, you know, there are just a few here, and of all of these, everyone has done it differently. So the good news is they're doing it; the bad news is they're doing it in ways that don't effectively connect together. So we're not really leveraging yet the true effectiveness that has made our community successful in terms of cooperative cataloguing and sharing and connecting new assets, but we've made a very important step closer than we were before by surfacing this data based on some common principles. Every one of these now is starting to use identifiers, for example, for describing the kinds of things that they're talking about. Every one of them is typing, being very clear about the kinds of things that they're talking about. Everyone is coming up with relationships that express how data is contextualized and connected together. So even though we're all doing it differently, we're starting to use the same substrate in which we can express our differences. And that's actually a huge step. And part of what we're hoping to achieve in this goal of this initial phase is really sort of a way in which we can start to connect all the-- that-- those different assets that are expressed in the same way together. Part of what we are doing to achieve this is really-- you know, we were asked to do this in terms of the review, but asked also to do this in the context of MARC, and that has been an amazing experience. I-- There are so many fascinating-- I have to put myself on a diet in terms of trying to understand the history of the standard and how it evolved and everything else like that. 
The run-length encoding aspects alone are quite fascinating for anyone that's dealing with IBM and EBCDIC still. I mean, but it's quite interesting to see the history and evolution and richness that's expressed in every-- even a single record. But deconstructing this, identifying from a Linked Data perspective the resources that are hidden or reflected inside of MARC, to extract these and surface them as web resources, and to tease out common patterns that we might see in MARC, describing in a quantitative way the combination of patterns used for specific types of things, and start to surface those relationships in ways in which we can connect these things together. MARC has a very rich history in terms of providing the infrastructure for talking about, in the context of the catalogued item, the people, the places, the organizations, the subjects that are associated with that. Teasing these out into identifiable Legos that can be snapped together to create different shapes for a particular item is part of the process in which we're doing it, because it allows these building blocks to then be reassembled and reused, not only in terms of future cataloguing practices, but as a basis and substrate for a lot of these different initiatives that are currently experimenting with Linked Data in this space. So, MARC has effectively pioneered a lot of the Linked Data principles that have resurfaced again now in the web. Controlled vocabularies, I can't tell you how many times I go into different organizations and I talk about the value of controlled vocabularies. They're like, "Hmm, tell us more about that." [Laughter] You know, linked authority files, personal names, we all know how difficult that is. Personal names in a global context is a very challenging task. Controlled places, you know, place names is a very difficult task, subjects. But this community has been taking those problems on head-on since the beginning of folks realizing there was a problem. 
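As a rough illustration of the "identifiable Legos" idea described above, the following Python sketch pulls typed, identified resources out of a drastically simplified MARC-like record. It is not from the talk and is not the project's code: the record is invented, indicators and subfields are omitted, and the `http://example.org/resource/` base URI, the `Resource` class, and the `extract_resources` helper are all hypothetical names chosen for illustration. Only the MARC tag meanings (100 personal name, 110 corporate name, 650 topical subject, 651 geographic subject) are standard.

```python
from dataclasses import dataclass

# Invented, simplified stand-in for a MARC record: (tag, value) pairs.
# Real MARC fields also carry indicators and subfields, omitted here.
RECORD = [
    ("100", "Doe, Jane"),          # main entry, personal name
    ("245", "An Example Title"),   # title statement (not extracted below)
    ("650", "River boats"),        # topical subject
    ("651", "Springfield"),        # geographic subject
]

# Which tags surface as which kind of resource (a small illustrative subset).
TAG_TYPES = {"100": "Person", "110": "Organization", "650": "Topic", "651": "Place"}

BASE = "http://example.org/resource/"  # hypothetical base URI


@dataclass(frozen=True)
class Resource:
    uri: str    # identifier that others can point at
    rtype: str  # explicit type: Person, Place, Topic, ...
    label: str  # the heading as it appeared in the record


def slug(label: str) -> str:
    """Lowercase the label and join its alphanumeric runs with hyphens."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in label)
    return "-".join(cleaned.split())


def extract_resources(record):
    """Return one typed, identified Resource per recognized field."""
    resources = []
    for tag, value in record:
        rtype = TAG_TYPES.get(tag)
        if rtype is None:
            continue  # fields outside the illustrative subset are skipped
        resources.append(Resource(f"{BASE}{rtype.lower()}/{slug(value)}", rtype, value))
    return resources


if __name__ == "__main__":
    for res in extract_resources(RECORD):
        print(res.rtype, res.uri)
```

The point of the sketch is only that each person, place, and topic ends up with its own identifier and an explicit type, so it can be reused and linked independently of the record it came from.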
And those kinds of name authority files, those kinds of services, now exposed on the web, provide a very powerful way for us to reuse this data. So, from the standpoint of moving from MARC to this Linked Data space, I have to say, it hasn't been-- it really is an evolution, it's not a revolution here. We're simply now providing-- instead of, you know, sort of a single object for cataloguing, basically relationships to a smaller set of aggregate catalogues that could be combined together. And these catalogues are in essence around these smaller MARC resources, these core assets, these markers of people, of places, of concepts, of organizations. And we can begin to tease these out and start surfacing them in a way that allows us to connect things together in more effective ways. So, I'm zooming over briefly some of-- some Linked Data interfaces over traditional MARC data that has been represented in this initial marker model of Linked Data. And each of the names with the little person's head there to the left indicates-- this is in fact a person, not an organization. And the stars are grafted directly into id.loc.gov and the checks are grafted directly into VIAF. And I apologize. I am not a very good UI designer. But the point here is we can now start to basically show not only the particular assets but how they start to connect to a wider range of linked authority files. Now, that turns out to be particularly useful both in terms of vetting the model, but also, frankly, more as a way of providing a control point that new communities can start grafting into. So not only can we start to tease out, because of the richness of MARC, whether we're talking about a personal name or an organization, and in the subject aspect, whether we're talking about a place or a topic or an organization or a person, we can start teasing these out as different control points so that at any particular point, other communities can start to graft in and create a web of this underlying data. 
And we can start looking at this data now in different ways. We can look at this through a traditional sort of work view, titles, but in our old card catalogues, you know, organizing by titles, by subjects, by authors, we can now pivot on anyone of these different resources. We have-- Just in one simple way offered, in essence, the same functionality that we've offered again back at the original card cataloguing now for the web by giving the patron different ways in which they can pivot around. It's not just search anymore. We can now pivot around any of these particular markers that make sense, want to find anything that this particular person wrote, or painted, or sculpted, or reviewed, who he lived with, who influenced him, all of these are now possible because we're simply surfacing the inherent data that's inside of these records in new ways in which it can be addressed. And in this particular case, by connecting into other infrastructure, that is spending time and energy and resources to making the data about these people more and more and more interesting and available. So now, instead of the sort of cooperative cataloguing of the cataloguing record, we're reducing it to a much smaller level of granular cataloguing record where we're down to the individual levels and basing off, you know, the benefits of how different people are curating people or different people are curating subjects or different people are curating places or different people are curating, you know, any of these particular control points. And we can start to tease out some very simple patterns that reflect some of these underlying models and what we've seen today. So, we can start to tease out in a very light FRBR-esque way, some common patterns where we can see relationships between the conceptual work of something and various instances that are associated with that work. 
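The "pivot" idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything shown in the talk: the statements are invented, the short identifiers such as `work:1` and `person:doe-jane` are hypothetical, and `build_pivot_index` and `pivot` are names made up for this sketch.

```python
from collections import defaultdict

# Invented statements of the form (thing, relationship, resource).
STATEMENTS = [
    ("work:1", "creator", "person:doe-jane"),
    ("work:1", "subject", "place:springfield"),
    ("work:2", "creator", "person:doe-jane"),
    ("work:2", "subject", "topic:rivers"),
    ("work:3", "subject", "person:doe-jane"),
]


def build_pivot_index(statements):
    """Index every statement by the resource it points at."""
    index = defaultdict(list)
    for subj, rel, obj in statements:
        index[obj].append((rel, subj))
    return index


def pivot(index, resource):
    """Group everything connected to a resource by relationship."""
    grouped = defaultdict(list)
    for rel, subj in index.get(resource, []):
        grouped[rel].append(subj)
    return dict(grouped)


if __name__ == "__main__":
    index = build_pivot_index(STATEMENTS)
    # Everything this person created, and everything that is about them.
    print(pivot(index, "person:doe-jane"))
```

Because the index is keyed on any resource rather than only on titles or authors, the same lookup works for a person, a place, or a topic, which is the card-catalogue behavior the speaker describes, generalized.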
Associated with that work are control points, you know, pointers to subjects which might be people, places, organizations, concepts, and how the work was created, which might be by a person or organization. When we start talking about instances, we can graft those into particular places of publication, particular, you know, people or organizations, formats, all of these become pivot points in which our assets might be able to basically be expanded on. This isn't-- this shouldn't be new. It's actually not new. It's simply using the web and a lot of the work that's gone into these standards from object-oriented programming to enter-- to entity-relationship diagrams to-- and the list goes on, but leveraging the standards that have emerged by bringing together these different communities back to ours. But by doing this as well, we can provide the hooks with which other organizations can begin to extend this and add value to it. So my library, I might say I have a-- I have this particular item, and we talked about holdings a lot in this, but now, I could easily basically reflect holdings as I can reflect access or other kinds of ways in which I, as a library, might want to extend the assets which are available to me. A few years ago, we did a project with MacArthur and OCLC on credibility on the web. And the first thing they asked was to think about sort of machine algorithms in terms of credibility. And having done this with other search engines and groups like that, you know, we sort of said, "Well, gosh, wouldn't it be more interesting if we just take people, you know, credible communities like libraries and see what they could do as an overlay to this." So we spent a lot of time modeling, if you will, reference systems and collaborations; QuestionPoint is an example of that from here. Teasing out sort of the questions people are asking and the answers which credible librarians and reference librarians are sort of offering. 
Now, we have the ability-- the thing that plagued that particular project was being able to sort of say, "This work, this thing was the answer to this question." Libraries are still doing that over and over and over. But by giving a hook to a question and answer, we now have the ability of sort of overlaying these kinds of previous questions that folks have asked over the search engines that are basically made available. And we showed this, but the thing that took us-- the thing that stopped us was not being able to just have a definitive identifier for the library asset which the librarian was talking about. And this gives us that kind of ability now of starting to connect in different ways, not just other things that the community is doing, but other things that are happening in our community, and annotating the bibliographic resources that we think about in MARC, now in new kinds of services, in this particular case, as a way of overlaying credibility on a search result. And that kind of fundamental hook is all that's really needed here to start providing new Legos being snapped together in new and interesting ways. So, where are we? I talked a little bit about the review process, the supporting code. Again, one of the things that we spent a lot of time on in terms of the W3C wasn't just the standards and use cases but reference code that tested all of the issues that went into a particular standard. When the web started, when the web first began, many of us didn't learn about the web by looking at the specifications. I helped to write many of them and you would've been mistaken to learn the web by reading those specifications. It was sort of view source, and it was using the tools, and using the code, and using the applications that helped get folks around the patterns that were going on. We applied the same thing to the Semantic Web. 
Early on, when folks were trying to get their minds around what this Linked Data concept is about, we provided some open source, some reference code that allowed people, in addition to the specifications, to test it out, to find out what worked, what didn't, how this related to their particular current development activities, their business discussions, et cetera, et cetera. We're applying the same methodology here. Providing some tools and translation services that are open, free, for allowing folks to basically experiment with this MARC to marker Linked Data pipeline, mechanisms for validating this information, mechanisms for connecting it, and really ways in which you could bring together the developer community, and the implementer community, and the user community, and the business community in a shared way through not just discussing standards or committees, but seeing what it looks like on the screen or seeing what it looks like in terms of traditional, you know, applications. One of the things that I spent a lot of time on at ALA was sort of meeting with various different groups that were part of those early acronyms, and one of the common themes that came out of that was, "What we'd like is just to sit at the table," and what was very clear is that nobody could agree on what the table was. So everybody wants to sit at the table, but no one sort of could define what that table was, and we have a very similar, you know, problem in any sort of organization that's trying to deal with standards here, and having some baseline infrastructure and some baseline frameworks is critical for accelerating that. So while there's been a lot of effort going on, there hasn't been a lot of commonality among those efforts, and what we're trying to achieve here in this engagement is really some common ground to accelerate that effort. 
It's not necessarily to replace one for something else, it's really how to accelerate the dialog and provide some substrates in which we can have, you know, an enjoyable conversation and a productive conversation. I remember running a lot of different early working groups in this space and I would make the point of [inaudible] we're going to table this discussion. And in the US they thought, "Okay, we're not going to talk about it." But in Europe they thought, "Let's talk about it", right. I mean, we're still-- because these concepts mean different things and that's what we're dealing with right now. Not only do we have different models, we still have different concepts. And having some-- you know, some common models, some common code, some common experience so that we can begin to come together and sit at the table is really what we're trying to achieve here. So if we were to basically say, "Is this engagement successful?", one way I would basically characterize this is, have we provided at the end of the day enough of a structure of what that table is that we can begin to have a more effective dialog than we've had to date. And I think we can. Having done, I did-- I wasn't sure about this about a month ago, but-- 'cause there's a lot of stuff here. But I do believe we can. And from the discussions that we've had, frankly, even since ALA, and some of the implementation discussions that we've already-- that have already started in various national libraries, I actually do believe that not only can we have common ground for accelerating this conversation, but a very effective process and procedure for supporting community experimentation and dialog and evolution of this that allows for some minor course corrections in various different groups but will benefit all by having this common table at which to have that dialogue. So, how-- what do we do now, how can you help? You know, right now, there is a lot of confusion still about what Linked Data is. 
I mean, you know, I've given you sort of two slides that tried to sort of, you know, collapse that. That the answer is somewhere between those two slides and everything else that basically is going on. But, you know, look into this, try to understand it, what's going to be most effective though in any community that's understanding is the nomenclature, the concepts, the stories that this community can share with each other to understand it. Understanding what Linked Data, what the pharmaceutical companies are applying Linked Data or the commercial companies are applying Linked Data or the public health organizations that are doing it, is helpful to understand sort of market trends, but less helpful in understanding the subtleties that we care about as a community. So, what-- not only can you basically focus on what it is now and learn everything you can, but more effectively, finding those areas where you can share it among your colleagues and discuss this and find out how it impacts, you know, end user experience to cataloguing, to-- if you're in-- you know, if you're in the issue of building systems, to licenses. You know, all of these different things require different perspectives. And so understanding more and more of how these technologies apply to your specific area is what you can do right now. I do suggest you look outside of the library community for some answers and look around at what others are doing. These particular architectures, for example, allow for greater level of granularity. What is the impact of that in terms of licensing? What is the impact of that in terms of caching and optimization? What is the impact of that in terms of user experience? Different communities are leading the way on different aspects of these. So, if there's a creative tension of understanding how it relates to you, but also I encourage you to look outside and try to basically become a bridge for those other communities that are leading the way in other aspects of this. 
We expect to have a final draft-- again, this is the draft, but a final version of the draft-- supporting tools, supporting interfaces. These are going to be still some low-level tools. They're not going to be cataloguing interfaces in which you can turn a cataloguing team on. But some low-level tools that help you understand more of what the impact of these technologies and social directions are enabling. In August, September, you know, experiment with this and start to prep your teams now for experimenting with this. Give them permission to experiment with this and participate. Make sure this is part of your criteria, you know, participate in this, you know. You are shaping your own future in this. So now, read as much as you can, ask as many questions as you can. And in October, September be prepared to actively engage in terms of trying things out, telling us what works, what doesn't, and being part of this larger discussion once this table is in place. Thank you very much. [ Applause ] I'm not sure what happens next. [Laughter] >> Now you open the floor for questions. >> Okay. [ Pause ] There's-- I'm like so blinded up here. So, I think, I see a hand waving in the back. I'm sorry. [ Inaudible Remark ] >> I think you can get to sort of crosswalk-- crosswalks to crosswalks. I mean, you've brought up Dublin Core, you know. I don't think that's-- I don't think anyone-- you know, I was one of the co-founders of that initiative. You know, that and other initiatives that I've been part of are in the back of my mind as I start to look at this. I mean, a key part of this is going to be how best really to harmonize a wide range of different efforts. Sustainability can be measured on a couple of different things. One is, you know, how does this effort relate to other efforts, and certainly, being very clear on how those relate is an important part of this. 
In a Linked Data or RDF manner, part of those relationships are defined by giving a nod or giving a-- being very explicit in terms of vocabularies. So what you mean by title, for example, as defined by Dublin Core, or what you mean by subject as defined by Dublin Core, or what you mean by expression as defined by FRBR. So in a Linked Data model, one can define a very flexible model that picks and chooses different vocabularies from different communities to create some reflecting model that gives a nod to each of these different communities that are basically expressing that. And if you look, for example, into the British National Library's efforts, that's exactly what they did. Every description that they used gives a nod to how these crosswalks are formed by connecting it into other particular vocabularies. That poses a very interesting and somewhat concerning sustainability model when you start depending on the vocabularies from other communities in terms of managing their namespaces. So the flexibility that we've enabled in terms of the standards has a certain cost in terms of the sustainability and longevity aspects of this. Because now, you're relying on external organizations to define very clear policies on their vocabulary, their management, the longevity of their organization whether, you know, some namespace will still be around in 1, 2, or in the case of MARC, 50 years. So when you look at this from "how do you design this for the next 50 years perspective", and you look at this from sort of a long-term sustainability perspective, the initial draft that we're talking about very clearly, basically suggests in this particular case of this marker model, there'll be one namespace. And those namespaces will in fact map and connect and leverage the Linked Data models for how these connect as new vocabularies become online, as existing vocabularies become sunsetted as new ways in which these terms basically relate. 
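The trade-off described above, one locally managed namespace that maps outward to other communities' vocabularies, can be sketched roughly as follows. This Python sketch is an illustration only: the `http://example.org/vocab/` namespace, the sample record URI, and the `to_external` helper are hypothetical, and the talk does not specify how such mappings would be stored. The two Dublin Core term URIs are the published `dcterms` ones.

```python
# Hypothetical local namespace that the community itself controls.
LOCAL_NS = "http://example.org/vocab/"
# Dublin Core terms namespace (published by DCMI).
DCTERMS = "http://purl.org/dc/terms/"

# Mappings live in one table, separate from the data. If an external
# vocabulary is sunsetted or a new one comes online, only this table
# changes; the stored descriptions keep using the local terms.
MAPPINGS = {
    LOCAL_NS + "title": [DCTERMS + "title"],
    LOCAL_NS + "subject": [DCTERMS + "subject"],
}

# Invented description using only local terms.
TRIPLES = [
    ("http://example.org/work/1", LOCAL_NS + "title", "An Example Title"),
    ("http://example.org/work/1", LOCAL_NS + "subject", "River boats"),
    ("http://example.org/work/1", LOCAL_NS + "shelfNote", "Local use only"),
]


def to_external(triples, mappings):
    """Re-express each mapped statement in the external vocabularies.

    Statements whose property has no mapping are left out of the result.
    """
    out = []
    for subj, prop, obj in triples:
        for external_prop in mappings.get(prop, []):
            out.append((subj, external_prop, obj))
    return out


if __name__ == "__main__":
    for triple in to_external(TRIPLES, MAPPINGS):
        print(triple)
```

Keeping the mapping table apart from the data is one way to read the sustainability point: the descriptions do not depend on another organization keeping its namespace alive, while the connections to that namespace stay explicit and maintainable.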
So there are going to be, frankly, some additional social curation aspects of this that are going to come alongside of any kind of long-term sustainable vocabulary development. And you see these in different communities right now. Dublin Core has an editorial board that basically talks about how terms might, you know, be added to this. But I think you will also want a way in which you can start basically formally defining these relations to other communities as well. Again, the Linked Data architecture doesn't require such a community or a committee or a group to be formed, but I do think you will want some kind of, you know, group that helps basically define the relationships that are important to this community to other vocabularies and other initiatives that are important to this community. So from a sustainability standpoint of this, you know, and you look at this from a long-term, you have to put in place certain modeling paradigms and modeling practices that look at this from a long-term persistence standpoint. And when you talk-- start talking about persistence, especially in the context of these kinds of models, this isn't a technology issue at all. It's a social and organizational commitment. And so the vocabularies that you map to or the vocabularies that you define have to be reflective of the social and organizational commitments standing behind the persistence of those identifiers that you mint. Now, that's not as big of an issue with this community. This community understands long term persistence and it understands-- and in the context of things like id.loc.gov, it's now giving identifiers, you know, these first class objects with commitments that those identifiers aren't going to go away. But those kinds of discussions, those policy discussions and those persistence discussions around those identifiers are an important part of that sustainability discussion. 
And one that have to be part of not just how we model this, but how we manage the identifiers that are associated with our model moving forward, and how those identifiers relate to other communities and other efforts that are going forward. So specifically, the Dublin Core or specifically to FRBR, or specifically to different kinds of things, we're leaving little bread crumbs of how these things actually relate as a pattern but they will have to be filled out, and vetted, and discussed, you know, as the community starts to basically evolve and extend it. There're now three different questions. I'm just not sure who asked first. I think you did, but I'm not sure. >> I'll take it. >> You take it, okay. But I need-- you two are next. >> Okay, sir, yes. >> All right. What you were just talking about when you said there now, you as a community, know the value of this and how to experience trying to implement it. Now the way is already-- and so just-- doesn't quite to be a really natural sort of-- >> I didn't say natural. [Laughter] >> Well, I-- >> But there's more familiarity here than, say, other communities. That's what I'm trying to say. Yes? >> All right. One of the difficulties that this community has have and it's only within, I would say, the last 15, 20 years has this become the [inaudible] is interfacing with everything or everybody or outside of the community. Okay, community, business community, finance community, social organizations, government organizations, and so on. And one of the reasons for that is because we defined our needs in such a way that they just [inaudible] us, all right? But the same is true to the elite communities, okay? Business communities, government communities, they all have their own vocabulary-- >> Absolutely. >> -- and their own way of using them, okay. 
From what-- So what I'm curious about is how the whole concept of Linked Data or is it [inaudible] Linked Data and the Semantic Web somehow going to try to merge those communities or are these communities going to [inaudible] take these tools [inaudible] and continue to live out their [inaudible]? >> I think the answer is yes. I mean-- And I don't mean that in any sort of belittling way. But the folks that got together to basically discuss leveraging the Semantic Web were sort of-- I would characterize them roughly in sort of two camps. One was, we're no longer interested in sort of proprietary solutions for managing our data, you know. I mean, if anyone has gone from one data management system to another, the cost, the energy, the cost is a very challenging thing. So if we could come up with the-- an open non-proprietary way of reflecting our data so that we could reduce those costs, that would be great. The other was, we see value in being able to sort of connect into other communities to reduce social and technical costs. And the library community is not alone in terms of building up their own, you know, vocabularies for describing assets as you mentioned, even sort of names for those things, right. You know, OPAC, it was very difficult to sort of explain to folks, OPAC is a database. It's okay. You know, I mean, it was-- you know, people come up with different names for these different things and they're not exactly 100 percent equivalent, but when you help bridge the community, it's helpful to sort of make those connections. You will see communities still simply using these technologies to accelerate, within themselves, new ways in which to share information. That's all right. You're also seeing communities that have-- I would say closer affinity starting to connect in ways that make more sense. Would I see the Semantic Web and Linked Data technologies providing a substrate between libraries and the business community? Technically, yes. Socially, I doubt it. 
What I do suggest, and what I do suspect is that they're going to find a straighter-- a stronger affinity towards nearby organizations or nearby communities that have been very difficult to integrate today, museum, the archival, the-- those kinds of organizations in which, yes, we technically have different ways in which these organizations think about these things. But there's enough common substrate and enough common overlap that we'll start to realize the value proposition of leveraging this. I think what I like about this from the standpoint of how we focus on the-- these technologies was by not over thinking in terms of how they were going to be used too much. And that was sort of intentional. When we originally started working on HTML, for example, it was conscious effort to sort of say things like, you know, we're not smart enough to know how this stuff is going to be used. And that was actually a pretty novel perspective from the various standards efforts that I was involved in at the time. We try to basically make that point here as well. I do think this will accelerate the connectivity across communities. I do think that it will accelerate the connectivity within a community. It might just create better stovepipes in many communities quite frankly. But those communities will be able to more effectively communicated and share information within those stovepipes. And we're seeing that a lot. But where I think the benefits are in this particular community is by simply providing the connectivity to near by ones. And I think that those small steps are the one that we should basically be thinking about and focusing on. Now, there was a gentleman here. And then-- yes, I'm sorry, I'm-- your-- all I see is like the sun. [Laughter] >> I had a very similar kind of thought that [inaudible] understanding and we're talking about specifically, I realized that today's presentation is very [inaudible] for a library community. 
And you're almost [inaudible] required in a sense because this is a very strong community [inaudible] understanding [inaudible]. >> That's right. >> And pushing for it. And I worked in a museum environment on [inaudible] Smithsonian. >> Um-hmm. >> And one of the things that I have realized is that the archives and the museum community have a lot of complementary materials that work very well with the library community. However, the data standards in the lines that are very different. And therefore, when it comes to implementation and experimentation, say, you reached a huge technical issue, because of the way that they [inaudible] the information and catalog and so on. I worked with those museum folks and one thing I hear very often is that they understand how strong and what's [inaudible] on that where-- that there are a lot because they were developed for bibliographic type of information as much as they realized [inaudible] standard, they simply cannot [inaudible] into it-- >> That's right. >> And therefore, while we're doing this Linked Data movement here, my specific question is how broadly are you thinking to incorporate data that's beyond the traditional bibliographic type of data to move into beyond, you know, the library archives business financial community to make sure that it is not a limited structure that makes other communities a bigger challenge [inaudible] that? >> I'll answer that in sort of two ways. One is, I'm always thinking of that. Seriously, that's my job. The other is, I'm very focused on shaping that point in the context of a very short-term engagement to help this community get one step closer to where it needs to go. 
But, you know, from my perspective, really in Dublin Core, and then at the Semantic Web Initiative and now at Zepheira working with lots of different stakeholders, the thing that bound all of those different communities together wasn't being part of a library or being part of a museum, or being a part of a pharmaceutical organization or being part of a public health group or being part of business or whatever. It was, how do we effectively stitch together this heterogeneous data in a way that we can use it outside of the traditional bounds in which it was created? So, when I'm thinking of these kinds of models and when I'm looking at these particular approaches, I am really trying to look at this from what is the simplest thing that is possible here, and no more. Because when you start basically increasing the complexity of this, you build higher and higher bridges here that make it more difficult to share. And this community has such valuable assets to share and it has built such high barriers to sharing them that if we can break that down, more kids will see Peepshows, right? I mean, they will see the kinds of amazing special collections that we have and they will be able to contextualize those special collections in new stories, which is what the museum community does. And they will be able to associate the devices that were needed to render the stories in terms of the digital preservation side of this. But what we're doing is giving the small building blocks to basically do this with. So, you know, the models that we're working on right now started off-- I would say incredibly complex. And what we've been doing in terms of a Linked Data model is really trying to sort of break these down as simply as you possibly can to the very primitive levels, because it is our hope that we can build them up very quickly to support the sort of discussion of this community. But build them up as well to support, you know, the museum community. 
Now, will there be full support? I'm not suggesting there's one model to rule them all here. But if we can start exposing some of the primitive-- these marker assets, these core components that can begin to be connected into these other communities, that's a huge step in terms of achieving at least some degree of interoperability and some degree of linkage that we weren't able to do before. We brought up the word crosswalk before. I think everyone in this room has been part of a crosswalk effort. And I'll be willing to bet a dollar that many of us have been part of a crosswalk effort that has reimplemented someone else's crosswalk, or built off someone else's crosswalk and crosswalk, and crosswalk. But this isn't recorded in any way in which we as a community can take advantage of. We do best by word of mouth. And then we take these crosswalks and we code them in some sort of business logic that's specific to our application. And it gets lost. By simply surfacing these identifiers so we can start to point at this, we can use the web as an infrastructure here. And for all of the efforts and all of the aspects in which it's, you know, for the good and bad. This gentleman here basically talked about, you know, 15 years ago realizing that we were sort of building it only for ourselves. Every community that faced the web realized that. It'd be an interesting historical aspect from the social side of what the web did in terms of impact, because it forced us as communities to start to look beyond our boundaries in ways that we weren't-- that we didn't really-- we're convenient before it. Now we have to. So to the extent that we can reduce the complexity of the model that supports this particular community's needs, but with an eye towards these other communities, and start to begin to identify patterns to surface some of the data that we have so that other communities can build off of it. Yeah, we're constantly thinking about how to basically build these bridges. 
Because I'll tell you something, patrons don't see them. My children don't think of museums and libraries and archives as separate institutions. They just want this data. And if you see any 4-year-old on an iPad right now, they're getting pretty good at finding it and pretty good at manipulating it and pretty good at distilling it in ways which are just absolutely remarkable. And I can back this up with lots of different stories and lots of different sort of research efforts. But the barriers that we see, that the institutional communities know to be all too real, aren't the same as what our children are growing up thinking they are. So, all right, this gentleman here, sorry. [ Inaudible Remark ] So that's a great question. I can tell you from a contractual standpoint up to that point. After that point, I can't. But I-- this is what I hope. One of the things that I think this community could benefit from and one of the things that we drafted is really just the-- it was an important part of-- and it was somewhat painful, but it was an important part of the web consortium, which was, you know, really just a guideline, a procedure guideline for how to, you know, support community discourse. You know, somebody registers a complaint or somebody registers-- it's a process document, so identifying a very simple process for how this evolution is going to occur, how you can participate, how input can be received and how input can be responded to. My hope is that by the end of September, we will have a first working draft of the model, we will have some early implementers-- sorry-- that are sort of testing this out and providing feedback. We will have some code so that people can see, you know, in various different Linked Data browsers or different interfaces, you know, conceptually what this might provide us as a community, and a process for how to support that dialogue moving forward. And I think with those ingredients, you know-- sorry, also one important thing.
Really, just at least from our perspective, a roadmap of how to move forward. And the key pain points that still need to be addressed on the social side of it, the technical side of it, the education side of it, et cetera. Those are the tools that our organization is contracted to build and provide. How we as a community can start using those tools to build this table and to make it happen, beyond that, I think the two people in front of you will be the best ones to answer. But I will say that I would love to be part of that because it has been a really enjoyable experience taking a lot of these different lessons learned from these different communities and coming back to the library community where it all started. I mean a lot of people don't realize that the library community had such a huge impact in terms of shaping both the web as well as the Semantic Web. And it's been wonderful to sort of, you know, reconnect and sort of, you know, take advantage of the lessons learned all the way along the process. So I expect to be part of that table, so. Yes? >> One of the challenges, as I'm sure you realize, is that in the next nine months or so, the library community is implementing RDA. And there's a lot of training going on, there's a lot of study. And the US RDA Test demonstrated, it kind of actually surfaced from the data, the need for a different format. RDA cannot be fully realized in MARC. So that much I think it leads to [inaudible]. But can you flip that around and say, "Okay, yes, we need a new format to implement RDA, but does the implementation of Linked Data and the Semantic Web also need RDA? How does RDA help move us in that direction in that regard?" >> Well, that's a good question. So part of this is a serialization, I think we need to be very clear about at least what we're proposing. And, you know, one can talk about abstract models, but what's very important is being able to serialize that from one machine to another.
If you don't have some kind of encoding format, you're really only dealing with abstract models. That serialization-- there can be multiple serializations, but there has to be at least one, okay. Does Linked Data need RDA? No. I mean there're a lot of different communities that are using Linked Data right now that have never heard of RDA. And for all of those different communities outside of the library community, that's just an example, right. I think RDA has helped make Linked Data possible in this community because it's giving us sort of the conceptual scaffolding to start having, you know, that dialogue, to break up in our minds, you know, the notions of work and expression and manifestation and item. I will say that if we go back to these two questions about how to bridge the different communities, if I am trying to explain to a community outside of this, please give a review on this, and I have to explain-- okay, now you're reviewing the work, not the manifestation, the expression or the item. We're creating a pretty high barrier for that outside community to start connecting into. So I would say that the RDA work has done a lot in terms of giving us the conceptual scaffolding to start thinking about a Linked Data model. I think the timing has been perfect. The Linked Data technologies have matured to the point of supporting, you know, a lot of different experimentation. I would say that looking at the RDA work and looking at the work going on at the National Library of Germany, the work going on at the British Library, the work that's going on in various different national libraries, the National Diet Library of Japan, but also the work that's going on, you know, with schema.org and Microsoft and Google and Yahoo!, the intersecting model is not going to be so expressive. It's going to be much simpler. And I think an important part of this effort is to be very clear about how it relates to all of these additional things.
But we have to balance that creative tension in terms of expressiveness and richness with respect to, you know, outside contribution and collaboration. We can build very rich models that are difficult to contribute to, or very simple models that are easy to contribute to. And I think the answer is really in the simplest model that allows us to do the simple things but can be extended to support the complex things we need. And if we don't think about that from that framework, I fear that we're going to create too much of a barrier for those communities outside of ours to basically leverage our valuable content. I think I'm getting the nod that that's it. [Laughter] >> It is 11:30, we are very grateful for your turnout and we want to give another hand to Eric Miller. >> Thank you. [ Pause ] >> And as I said earlier, we have set aside some time at 1:30 today in this room. If there's further discussion and dialogue you wish to have with Eric, please come back. If not, we will find ways to use Eric's time. >> I got lots of things I want to do with you guys. So I mean-- >> So thank you again and come back at 1:30 to have some [inaudible] discussions. >> This has been a presentation of the Library of Congress.