>> From the Library of Congress in Washington, D.C. [ Silence ] >> Sally Hart McCallum: Good morning and thank all of you for coming to the session on BIBFRAME. We've been having one at each ALA for the last couple of years and we really appreciate your coming. BIBFRAME is short for the Bibliographic Framework Initiative. Most speakers and most slides, however, will say BIBFRAME, so I want you to be sure and note that that's what we are talking about. Our first speaker will be Roberta Shaffer. She's the Associate Librarian for Library Services at the Library of Congress and she's going to be talking about LC's plans and how BIBFRAME fits into the picture and how it's central to the picture. >> Roberta Shaffer: Thank you Sally and good morning everyone. So, I think this audience knows very well that while we are called the Library of Congress we really serve curious, concerned, creative, commercially minded citizens everywhere, and our goal is to serve them anywhere and everywhere, and our initiatives, we believe, basically move in that direction. So, it is no accident that BIBFRAME is kind of at the center of the ribbons and really is in many ways serving as the fulcrum. My role this morning is just to very quickly put the Bibliographic Framework Initiative into the context of the other initiatives that are going on at the Library, and to ask you to keep in mind that we want to serve those citizens anywhere and everywhere and give them the ability to be informed and hopefully inspired, to spark imagination and innovation, which we think is really the role of a national library in the life and mind of a nation. The first is Knowledge Navigators and Human Capital Planning. The Knowledge Navigators part of that is a very ambitious initiative currently underway at the Library of Congress to reach out to emerging scholars from all different disciplines all over the country and the world and give them the opportunity to come to the Library and work side by side with our experts. It is really designed to be in many ways a mutual mentoring experience, so that we can learn about things that are happening outside of the walls. They're not unscalable walls, but they're walls, physically and virtually, at the Library of Congress. We want to know what's happening outside and be able to exploit that knowledge internally, and we also want to share the practices that we pursue and the standards that we advocate so that the world can be aware of those. And people can come and be in residence with us anywhere from one week to nine months. So, if you are so inclined and--here's the catch--have some change in your pocket, because right now we don't have the money to support it, although we're working very hard to try to find funding, please see anybody who's wearing a Library of Congress badge at this conference, because we really do want to know about you and people you know and engage them in that program. The Human Capital Planning piece is an internal program for Library staff to go out and learn about things that are happening at other libraries, commercial institutions, really anywhere where there's something relevant happening that the Library can benefit from. And so again, if your institution is doing something that you think is relevant to the Library of Congress and you would like to host one of us, in this case we will come with our own money. So, you won't have to pay for us.
So, please think about both of those things very seriously, because we really want to pursue them in a much more active way than we have in the past. The next is basically a ribbon that covers any number of global--there are a number of national, but truly the best way to describe them is global--collaborative initiatives; initiatives where the Library of Congress may be in a leadership role or we may be merely in a participatory role. And again, I'm giving you lots of homework for an early Sunday morning event. But please, if there are collaborations going on around the world where you think the Library of Congress needs to be, or should be, at the table, please let us know. We have joined many more in the last year, and we are interested in participating in any number of collaborative projects and programs that are happening around the world. The next slide is the one that I know I have to change the name on, because, as a panelist who's one of my heroes and whom you'll hear from shortly, Eric Miller, pointed out to me just a few days ago, by looking at that ribbon--social media, big data and non-traditional sources--I give away my age immediately. And so, my vanity has been overwhelmed here and we're going to change that to a better term. I haven't decided yet, but I think I'm going to start calling journals and monographs and paper non-traditional resources, and then I'll feel much more cool and with it. But nonetheless, I think you get what we're talking about. There is no way not to recognize today that the coral reef of resources that must be consulted to do any kind of complete research is growing. And it's very difficult--it's actually ridiculous--to put a traditional versus non-traditional label or bucket on these resources. But the important thing to keep in mind in this regard, in terms of BIBFRAME, is the underlying need to connect all of that information, and this whole new movement that you've been reading about in all the business press: the internet of things. So we have been living in the internet of the consumer or, as we at the Library say it--it makes us feel better--the internet of individuals. And now we're moving to the internet of industry, which is where sensors--with an S--or machines feed information directly to each other and outcomes occur. And we need to be able to marry up all that information in an integrated and interoperable way, so that the internet of things truly is, in terms of description of information, the marriage of the internet of individuals and the internet of industry. And that's why BIBFRAME is so exciting to us and really is the fulcrum or the center of our thinking and of the ribbon chart that you're looking at; it seeks to do exactly that. Part and parcel, then, of having the ability to describe information, the internet of things, we then hope at the Library of Congress to do a much better job--in the sense of being more aggressive in this area--with ingesting digital content and also with converting it. So, we have an awful lot of material that needs to be converted, but let me add just a small footnote there, and that is that we are not sure, in terms of our own collections--and maybe you can give me some of your insights on this--when exactly the tipping point will occur, so that we will know when the analog is really the minority of the collection and is really being subordinated to the needs of the digital.
And actually--and I don't think this is at all pessimistic; I think it's rather optimistic--we're not seeing that, in terms of our own collections (I'm not talking about yours), for probably another generation. So, when we sit around at the Library and talk about it, we look at something like 2030, but nonetheless we'll have to live in both of these worlds for an entire generation, and that makes BIBFRAME even more critical now than it might be if we were having this conversation in 30 years. The next one is our effort to really redo, redesign our website. So, we are going to be using faceted search and we are going to be having a much more fully integrated website. In the past it mattered far too much which door or window you used to enter our institution. Your results could vary greatly merely by whether you came in the kitchen door or the back door or the front door or the window over the front door, and that cannot be the case. You have to be able to approach us in a way that you are secure that you're getting the same search no matter which door you enter. And that is not a small order for an institution like the Library of Congress. And then last but not least, the one that is perhaps the most pressing now is creating the Center of Knowledge, and this is a physical endeavor. This is an endeavor to consolidate, to bring together a number of our reading rooms with related disciplines. We are seeing much more that the world of research is what we call transdisciplinary. So the second half of the 20th century probably gave rise to a very multidisciplinary world and--I saw there were some beautiful quilts for auction, so I'm going to use this analogy--that world looked like a quilt where you had many things connected by threads between them, and that was how you looked at a problem: squares with threads connecting them. Today, we think it's much better to think of problems in terms of a blanket with little micro threads that could come from any discipline, that could use any research methodology, but that may share a number of the technological apps that enable you to exploit them, and so we are calling that a transdisciplinary world. And because of that we believe we need to bring together our staff expertise, our collections and our services in a much more centralized way than having things spread over twenty-three reading rooms in three or four or five buildings, depending on how you count our campus. So, with that, I know you're all more interested in really the meat and potatoes of BIBFRAME. But I wanted to just take the beginning segment this morning to put this all in the context for you of Library of Congress initiatives and to punctuate for you how critical BIBFRAME is to the success of all the other ribbons. So, thank you so much and enjoy the conference. [ Applause ] >> Sally Hart McCallum: Okay, and our next speaker is Eric Miller. He's the President of Zepheira, the consulting company--an engineering company--that has been working with the Library of Congress on BIBFRAME, and he's going to discuss the issues relating to accommodating specific rule sets such as RDA, and also profiles. >> Eric Miller: Thank you Roberta for setting that context. I think one of the key aspects that we have to keep in mind, or at least are constantly reminded of, is the diversity and the scope and the breadth of what we're trying to accomplish here.
And the value proposition is not creating a simple one-off solution, but rather a much more flexible framework that accommodates individual requirements, but in a way that connects these things in a holistic and value-add sort of way. I'm going to take a few steps back and talk a little bit about BIBFRAME in that context. So at a certain level I'm going to be repeating myself from previous talks, but really with an emphasis on trying to connect this to a critical piece of the puzzle that we've seen in a variety of early stages in other parts of the web, but which we think is a key part in allowing the different communities which we all represent the ability to describe resources in a way that makes sense to them, but to project this into this BIBFRAME context that allows those connections and that serendipitous interoperability that Roberta was talking about, going across all of those different swim lanes. The initial BIBFRAME draft states a very high level objective in terms of what the aims and goals of the BIBFRAME initiative are, but I want to annotate these slightly. In the context of the bibliographic environment, what we are talking about--whether legacy or non-legacy, traditional or non-traditional--is really something that's robust enough to address the incredible value, the cultural heritage, that we have been curating for the past forty years, but also the ability to curate the resources and the cultural heritage that in essence will come for the next forty-plus years as well. So, we're not just talking about the physical materials by any means that we have in current OPACs or collections, but those that we anticipate the library, the museum, the archival communities being part of curating as we move forward as a big web of data, and as our patrons come at the content that we care about from a wide range of different perspectives. It mentions the environment for libraries, but that's really a shorthand for this entire memory organization space: libraries, museums, archives, publishers. Our patrons don't make these distinctions of the physical boundaries in which our institutions have traditionally found themselves. And we're looking for frameworks and gateways and ways in which--however the patron basically stumbles in their inquiry--they can be easily connected and joined with other parts of these memory organizations in a way that really allows that kind of exploration and knowledge, and thus knowledge sharing, to occur in a very seamless manner. And it really takes advantage of this network: not single centralized solutions, but really the sort of long tail of the web as a platform; not thinking about just publishing on a website, but being part of the very fabric of the web itself, and leveraging the platform as a technology and as a resilient architecture to allow people to contribute to it and add value to it, so that collectively we take advantage of that. So, the BIBFRAME model defines at a very high level a set of web control points to enable this kind of effective sharing, and it provides a means by which we can add value to this in a local context--be it reviews or annotations--but recognize or record it in a global framework, so that we can move from one local context to another. And that's a critical part for starting to engage, in essence, the power of the community, the power of we--not just in the library community but all of our patrons that might be part of this. The little squiggly thing that you see on the screen is a captcha.
It's an indicator to the machine, to an application, that the person that's typing it in is in fact a person--it's not a machine trying to basically hack the system. All of us have seen these. All of us, when we see these, grimace. Occasionally we grimace three or four times because we type them in wrong. But the person who invented this system to distinguish the difference between a human and a machine estimates that basically we grimace and type these in 20 million times a day. And that is a very horrible use of our 10 seconds. But what he also managed to figure out, and what's actually been quite effective: if you take a look at a lot of the physical material that we as a culture have created--newspapers, written manuscripts, old books--the ability to do optical character recognition on that is very difficult. We don't have the right training corpus; the paper's yellowed. What you can do is basically identify those bounding boxes around those smudged words. You can inject them into these captcha systems, and 20 million times a day, every time we're grimacing, in essence what we're doing is translating that image into words, reassembling those words collectively, and through this particular process, for example, we collectively have recreated the "New York Times." So, the entire image archive of the "New York Times" from the 1850's to the 1980's, when they basically cut over to electronic form, was in physically scanned TIFF bits. This has been injected into the system. Collectively we have reconstructed that into an incredibly important, valuable linked data context which scholars are using to get a better understanding of what we were buying and selling at the turn of the century, what was fashionable in terms of what folks wore, or what political discussions were happening, or what concerned society at the time. In eight months, while collectively grimacing, we managed to sort of reconstruct this. And part of that is not only being able to reconstruct the words, but being able to reassemble those components into reusable, shapeable new kinds of building blocks. The underlying standards that we've been developing at the web consortium, and those that are starting to underpin a variety of different industries, are based on this notion of recombinant data: this ability to describe small amounts of data that click together like small building blocks and make that same satisfying click when you snap them together to create new and interesting results. I realize, unfortunately--but perhaps fortunately for you--this will probably be my last RDF talk with Legos, because for the past 12 years I've been doing this with my children, but as I packed them off to the grandparents to come to this conference today, I was reminded, looking in the trunk, that all of the Legos that used to be there have been replaced with a tremendous amount of Nerf guns. [laughter] All of which are horribly non-interoperable, and thus I am out of an analogy. But what was quite impressive is when my children--or, in this particular case, a couple of individuals with way too much time on their hands--managed to take these small little Legos from different collections and reassemble them into amazingly interesting new shapes. So, you purchased Legos; I did many times, at different price points, different themes, different sizes, but all of the Legos that I've purchased and all of the Legos that we've had for the past 50 years snap together.
So, even though they were basically bought and sold in particular boxes, when they got into my house they were distributed evenly on the floor, but then recombined in new and interesting shapes. But at a certain level the reason that they're able to do this is because of the engineering tolerances of those particular Legos, which are on par with those of an internal combustion engine. So, even though they are simple and colorful and make interesting crunchy noises when you step on them, the tolerances of these things are highly engineered. And because of those tolerances and because of those small atomic building blocks, the standards that emulate this are exactly the same way. They're very small building blocks with very high tolerances that allow small bits of data to be joined together in new and interesting ways. One of the things that we're focusing on in the context of BIBFRAME is building up from those small building blocks the larger building blocks that we as a community basically care about: people, places, topics, organizations, books, movies, gene sequences--you might not think we care about them, but we do. Lots of interesting things that we think we have right now in our collections, but lots of interesting things as well that we will need going forward; so, moving up from small atomic building blocks to larger building blocks that we can start to assemble in new and interesting ways. And the notion of a community profile is taking those building blocks and reassembling them based on shapes that we care about. Taking those small atomic building blocks, those larger BIBFRAME building blocks, and assembling them in shapes that different communities are interested in sharing among themselves. And we don't need to say in advance what all of the different profiles that might exist are, or say which profile we care about and which we don't. Rather, we're allowing a community to create these shapes, to create these concepts, to create these components that make sense to them. Now, in my child's mind--I don't know what the Jacques Cousteau figure, you know, and the Knights of the Round Table horse have in common. But when he puts the Jacques Cousteau on top of the Knights of the Round Table horse and marches it around, to him it's a seahorse. That notion of taking two separate shapes to create a third new shape that now has a new meaning makes perfect sense to him. And so what we're trying to accomplish here is not only allowing different communities to define their own particular shapes, but then recombining these and reconnecting these in a way that basically shows increased value, in new ways in which we can deliver new value to our patrons, to our scientists, to those that come to us looking for answers. So, the BIBFRAME model is designed to provide a very lightweight, flexible framework for accommodating the needs of a wide range of different communities. RDA is a critical one of these, but there are many. And the ability to provide a common framework, a way of defining a profile that those different communities can reflect into, is a key part of what we're trying to achieve with these community profiles. So, on the left-hand side is a very object-relational model based on FRBR and the WEMI structure. On the right-hand side is a very graph-based model based on BIBFRAME. And what we're looking at is a way of projecting a community's needs, in terms of an object inheritance model, into a graph-based model.
In essence, a way of projecting squares into circles. The key isn't to say that the number of squares on the left-hand side is good or bad, or that what you need is a triangle, or maybe a squiggly would be fine; it's not to say one community's needs outweigh another's. It's a way of saying that if a community can agree on a particular cataloging practice, like this community has in the context of RDA, how do we provide a way to project that into a model that we can then transmit across different kinds of systems? And this is very much a work in progress. There should have been a big "draft" stamp on this; I apologize for not putting up big blinky "lots of work still to come" warnings. But part of what the BIBFRAME.org pipeline is working on at the moment is taking existing MARC records, and MARC records that also reflect RDA, and trying to figure out those kinds of projections into the BIBFRAME model, and then using some general tools--a common query language, in this particular case a simple RDF linked data navigator--as a common framework with which to navigate around this. This is a VRA record. Actually, it's an image of a VRA record--in BIBFRAME we get a bit pedantic that way--but this is an image of a VRA record. We can do the same thing with VRA. In this particular case the community has two different squares with a very tight relationship. It's not an object inheritance relationship; it's a different kind of relationship. In that community there are works, and then there are images that are associated with works. And we can again re-project that particular structure into a graph-based structure in terms of BIBFRAME, and in essence take advantage of a different pipeline that, in this particular case, shows the merging of a bibliographic resource that was done based on MARC and RDA through a BIBFRAME profile, and a VRA record, which is in a completely different structure, into a BIBFRAME profile--and thus connect these together in the same interface, using the same query languages, around the same control points that they share. All of this is a work in progress. All of this is very drafty. But we've been working hard, building off of a lot of very interesting work that's happened in the Dublin Core community, in the commercial industries, and in a handful of different groups that have been exploring this kind of profiling mechanism, and I'll just make the quick point that it actually took about three hours to get this far. So, even though it's not right, and even though it's not there yet, and even though we need further help from the experts in both of these communities to do this, we've got some very interesting building blocks for starting to take that knowledge from these different communities and project it in a common way that makes this kind of snapping possible. And it doesn't need to happen for every single record every single time. These profiles are identifiable. They're part of the web, and they could be used as gateways to share this.
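That snapping together of descriptions from different communities around shared control points can be sketched in a few lines of RDF code. Below is a toy illustration in Python with rdflib: one graph stands in for a MARC/RDA-derived description and another for a VRA-derived one, and they merge because they use the same URI for the work. The namespace, URIs and property names are invented placeholders, not the published BIBFRAME vocabulary.

```python
# Toy illustration of "recombinant data": two descriptions from different
# communities snap together because they share a control point (a URI).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://example.org/bibframe/")  # placeholder, not the real vocabulary

# "Square" one: a description that came out of a MARC/RDA pipeline.
marc_graph = Graph()
work = URIRef("http://example.org/works/moby-dick")
marc_graph.add((work, RDF.type, BF.Work))
marc_graph.add((work, BF.title, Literal("Moby Dick")))

# "Square" two: a VRA-style description of an image associated with that work.
vra_graph = Graph()
image = URIRef("http://example.org/images/moby-dick-plate")
vra_graph.add((image, RDF.type, BF.Instance))
vra_graph.add((image, BF.instanceOf, work))  # the shared control point

# Merging is just graph union: the pieces click together on the work URI.
merged = marc_graph + vra_graph
for triple in merged:
    print(triple)
```

The point of the sketch is only that no negotiation happens at merge time; agreement on identifiers does all the work, which is what the engineering-tolerance analogy is getting at.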
The web--for those that remember FTP, Gopher, Archie; yes, there are a few hands being raised here, thank you, I have gone to many talks where all you hear is crickets when you do this--the web wrapped and exposed those protocols so that they could instantly be part of the web. That was a major way of starting to integrate data from these different systems. CGI-Bin: horrible name, but really good idea. Lots of native databases, lots of native structures, gateways to this information that we could start talking about; these profiles represent the CGI-Bin for these different kinds of communities, through which this data can be exposed as part of the web. And using BIBFRAME as a set of reusable building blocks that these different datasets and these different communities can be projected into--that local meaning projected into these common primitives--gives us the key aspects that Roberta was talking about: connecting data. Not just data that we know now, but data that we might have to deal with in the future, and doing it in a common, standard way that reduces the technical and social costs for all of us of achieving this. There's a long history and discussion that we're building off of here. There are a lot of details, and I'm happy to talk about the implementation aspects after this presentation or at another time, but at a high level the results are incredibly promising. And we expect, as we start locking down more and more of the details, to start putting up more and more of these kinds of profiles--not to say this is the definitive way, but more as a way to start to illustrate the patterns. The BIBFRAME.org space is where we're starting to disseminate this. So, if you're interested, keep an eye out there; more to come. Thank you very much. [applause] >> Sally Hart McCallum: Now we're going to hear from two of what we call community experimenters. About a month or a couple of months ago I put out on the listserv, the BIBFRAME listserv, a request asking who's experimenting out there with this, so that we could see if we could identify some people who might speak this morning, and so I did. The first will be Jeremy Nelson. He's the Metadata Librarian at Colorado College, and he's going to describe how they're incorporating elements of BIBFRAME into their developmental work. And he'll be followed by Vinod Chachra, President of VTLS, who's also going to describe experiments that their institution is engaged in. >> Jeremy Nelson: Alright, thank you Sally. So, as she mentioned, I'm the Metadata and Systems Librarian at Colorado College. Before I continue: this presentation is a live website that's available online at tuttdemo dot coloradocollege dot edu forward slash ala 2013. If you go to the root directory of that tuttdemo it'll bring up our demo instance of our Redis datastore, so you can actually play around with it later. So, I want to talk about experimenting with BIBFRAME and Redis. What we've been working on is something called the Redis Library Services platform, and the Redis Library Services platform is made of two open source projects: the Aristotle Library Apps and the BIBFRAME datastore. So, Redis is NoSQL, which means it's not based on SQL, and it's what's called a key-value data structure server. It's very different from other types of datastores in that you're really dealing with primitive data structures like strings, lists, sets, sorted sets, and hashes. Redis is actually used by some very large internet properties you may be familiar with--Twitter, Stack Overflow, and Craigslist, for example, all use Redis to manage their large enterprises in a very fast and efficient manner.
So, the Aristotle Library Apps project is really designed as loosely coupled, single-page HTML5 apps that initially target mobile user interfaces, and then the idea is that these can be progressively enhanced depending on the capacity of the querying client. On your desktop you're going to have more things than, let's say, if you're trying to access these apps on your smartphone. Aristotle Library Apps is based on Django. It also provides an API interface to the Redis datastore. Right now we have limited support for the Solr text indexer, and we also have some interfaces to our digital repository, which is based on Fedora Commons. The BIBFRAME Redis datastore, which is the other open source project, is a collection of configuration and server-side scripts for running Redis as a bibliographic datastore. There are links here to both projects on GitHub, so all the source code is available there. So, running on this server right now is a demonstration Redis BIBFRAME datastore, and here's the distribution of keys in this nice little pie chart. Right now we're working on two variants of the Redis Library Services platform. The most basic just uses one Redis instance; it's really the quickest method for launching a working Redis Library Services platform, and it basically scales to the limit of your machine's memory, or RAM. There are a couple of links there: one is a link to the demo I mentioned, and we also have a sort of "production" instance--we'll definitely put that in quotes--at discovery dot coloradocollege dot edu forward slash apps. There we go. So, how are BIBFRAME entities represented in Redis? These entities are really just a collection of Redis key patterns, and it's pretty much by convention how we construct those key patterns. All the core entities start with a Redis hash, and then we add additional data structures to meet a particular need that we're trying to solve. That's a very different way of approaching a data modeling problem: with a relational database you may create all the schema and do all this front-end work up front; well, in Redis we don't have to do that, which makes it really flexible. So right here these are some examples--if you click on, I don't know, you can't really see that. But, anyway, these commands are running against a live Redis datastore. So, if you click on any of these--maybe I can get another one. Yeah, and see, it doesn't really look great, but let me just try and do this. There we go. So, this is pretty raw if you look at it, but this is all how Redis is storing that particular key. In this case the key is "bf colon instance colon 1," and if you query that it just comes back with all of these. So, there are some different keys there that, as you can see, use some different data structures. One thing that's nice about Redis--and about our approach with the Redis Library Services platform--is that we can really easily integrate other vocabularies and schemas within the same BIBFRAME structure, very much like the profiles Eric was talking about.
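Here is a minimal sketch of that key-pattern convention using the redis-py client. The hash fields and the work-to-instances index set are illustrative assumptions made for the sake of the example, not the project's actual schema.

```python
# Minimal sketch: a BIBFRAME Instance stored as a Redis hash under a
# conventional key pattern, with an extra set layered on as an index.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# The core entity starts as a hash keyed by convention, e.g. "bf:instance:1".
r.hset("bf:instance:1", mapping={
    "title": "Treasure Island",
    "instanceOf": "bf:work:1",
    "rda:carrierType": "online resource",  # other vocabularies mix in freely
})

# Additional data structures are added as needs arise; here, a set that
# indexes from the work back to its instances.
r.sadd("bf:work:1:instances", "bf:instance:1")

print(r.hgetall("bf:instance:1"))
print(r.smembers("bf:work:1:instances"))
```

No schema has to be declared anywhere; the key patterns themselves carry the convention, which is what makes this approach so quick to iterate on.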
This is an example of a live system that uses an equivalent of those profiles, and you'll see some examples there. So, what we've done in this demo is ingest three types of bibliographic records. The first are our MARC records, and this is actually where we spent the most time. I don't think I'm going to launch the animation, but if you go to the website you can see an animation that will show you the workflow that we use. But, anyway, on to our MODS records: in our digital repository, Fedora Commons, all of our digital objects have MODS metadata associated with them. We have an example here; we're using XPath to do that transformation. Now I'm going to bring up an example of one of these from our Helen Hunt Jackson oral history collection that's in our digital archives, and--well, you can see some examples. So, right here is a very brief example of a Redis BIBFRAME instance, and you can see we're not really capturing a lot yet. Our development process uses something called the lean startup model, which basically means a lot of iterative, agile development to continually improve this. So, we wanted to start with a very basic, sort of minimal viable record, if you will, and then we'll build and add additional information to that. So, let me go back here. Right here--this is the live demo site. These featured items, a couple of these come from the digital archives. Right here is an example of a record that we ingested from our MARC records, and then what I've done is create a little web service that goes out and extracts and harvests information from openlibrary dot org, like the cover image here. And so you can see this sort of mashup. Also, if you notice--I don't know whether you can see it or not--on the left here, among all these different fields, towards the bottom it says "RDA colon carrierType." So, this is an example of RDA within this BIBFRAME instance in Redis, as part of that Redis collection. So, oops, let me go back here. So, the final collection I wanted to work with--so here's an example; there's our large digital archive site if you want to see the source. The third source of records I wanted to work with is Project Gutenberg's e-books, of which there are over 44,000. Project Gutenberg provides RDF that uses some Dublin Core concepts and kind of a custom RDF vocabulary, and I used that to ingest into this demo. Here's an example of Robert Louis Stevenson's "Treasure Island." And see, here's the work view, and if I go in, here's the instance view. And if I click on that, it takes me straight to the text version of "Treasure Island" on Project Gutenberg. So, continuing on, I'm going to talk really briefly about the future of the Redis Library Services platform. We want to increase the engagement with our users. Again, this is a minimal viable product, and what we want to start doing is some A/B testing on all of our user interfaces. So, for the interface to the BIBFRAME catalog we want to test each of the user interface elements and see which ones our users prefer. Enhance: you saw a little bit of an example of that in the record view I showed, with the Open Library extraction. So, that's one example of how we're enhancing these records, and that's really just the tip of the iceberg. OCLC, the Library of Congress, Wikipedia and Open Library all have these rich sets of bibliographic information that we're really interested in sort of mashing up into our Redis Library Services platform.
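A little web service like the one Jeremy describes for harvesting cover images might look something like the sketch below, which calls Open Library's public covers endpoint. The ISBN is just an example, and a real service would cache results rather than fetch on every request.

```python
# Sketch of a cover-harvesting helper against Open Library's covers API.
import urllib.error
import urllib.request

def fetch_cover(isbn, size="M"):
    """Return cover image bytes for an ISBN, or None if no cover exists."""
    # "?default=false" makes Open Library return a 404 instead of a
    # placeholder image when no cover is available.
    url = f"https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg?default=false"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.URLError:
        return None

cover = fetch_cover("9780141439693")  # an illustrative ISBN
print(len(cover) if cover else "no cover found")
```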
We also want to embrace--the idea is that we embrace--existing vocabularies and ontologies, and those are really driven by actual metrics and the desires of the platform's users. So this includes, I would say, closely tracking the BIBFRAME vocabulary as it develops and matures, and hopefully we'll keep up with that. The fourth E is extend: the reason I broke the Redis Library Services platform up into two different projects is that there are other open source and commercial systems out there that may be interested in using the BIBFRAME Redis datastore piece but are not as interested in the frontend app piece. That way they can use the BIBFRAME datastore and continue to use their own systems. And finally, the last E is experiment. I think the Redis Library Services platform is a very promising technological approach to really imagining the future of bibliographic information, and it also starts providing some of the tools we need to transition from MARC 21 to this new BIBFRAME and other models. Finally, I'd like to thank my Director, Ivan Gaetz, for giving me the freedom to experiment on this--it's been pretty exciting--and also the librarians and staff at Colorado College, who have been very patient with me as I show stuff that isn't really working all the time. And finally, George Machovec, the Executive Director of the Colorado Alliance of Research Libraries; he's been very supportive of my work and of trying to apply the Redis Library Services platform at a consortium level. Thank you very much. Here's my contact information. I'd be very interested in getting your feedback, so please contact me. And I know--I don't normally wear a suit and tie, so this is actually a picture of me in my preferred uniform at Aikido summer camp last year. So, thank you. [applause] >> Vinod Chachra: My name is Vinod Chachra and I'm going to be talking about the BIBFRAME experimentation at VTLS. We felt that in order to implement BIBFRAME you have to have some amazing abilities: the ability to support linked records; the ability to support the BIBFRAME data model, which Eric defined so well; and, as the transition to BIBFRAME will not be instantaneous, the ability to have MARC records and BIBFRAME records--meaning MARC records and XML records--coexist in the same database at the same time, so we can make the transition over time. And that's a practical consideration in doing this. We had to have the ability to display hierarchic records in the database, like FRBR--not necessarily FRBR, but like FRBR--and a navigation and visualization system that allows users to see the links between records and navigate among them, and that's what I'll be covering today. The outline, then, starts with the basic BIBFRAME architecture, which was already covered by Eric, so I'll just take the time to make a remark on Eric's presentation. He had very many wonderful, new and interesting analogies in his presentation. By contrast, I read a book review once that said the book is full of new and interesting things; unfortunately, what's new is not interesting and what is interesting is not new. [laughter] That is not the case with Eric's presentation. So, since he saved me that time, I won't go through these diagrams, which look like his, but will talk about the metadata management system that needs to exist so that we can handle both MARC records and XML records at the same time.
Essentially the goals are: to ingest records and do duplicate control; to have record conversion processes; to have edit functions that allow you to edit MARC and XML records at the same time, linked together so they synchronize automatically; to have storage of records in all formats--you have MARC formats somewhere and you have XML formats in other places, and they have to stay in the same database at the same time; and to be able to export these records, because if you do create BIBFRAME records and you have to deposit them to a union catalog or somebody else's system, you'll have to convert those and send them out. So, all these features have to be supported in the metadata management system, which we have finished developing already. I won't show you examples of that, just the diagrams of how things work. On the left side we have XML records coming in, in a variety of formats. They go through a standard conversion format and create records for the production system. On the right side are all the export functions, so you can export records anywhere you want, and the idea here is you might have an ONIX record coming in and going out as a CSV record, and so forth. Of course, BIBFRAME belongs in there too. Then you have the two editors that are required, to edit your production system and to edit the incoming records; they can be used independently, and the records then get synchronized. And this is another diagram of the same thing: you have different source records coming in, stored consistently, and you have editors that edit them. So, this is the infrastructure for the metadata management system that is required for records to coexist. Now we'll move on to BIBFRAME. How should we go about implementing BIBFRAME? That is the question we had to face, because when you're going to design and develop something you have to figure out exactly how to do that. Well, we had several options. VTLS has been working for many years in linked data and we already have four implementations; only two of them are shown here. The first one was FRBR. We did FRBR about eight or nine years ago and it hasn't really caught on very much--it's been used in several libraries, but it hasn't caught on. And ISAD, which is the international standard for archival description, and that is being used by the National Library of Wales. So we had these two models to work from and build on as we built the BIBFRAME model. And we chose the ISAD model, and that really addresses the issue Eric was talking about of mapping squares into circles; the mapping of those squares into circles is much easier in ISAD than it is in FRBR, and I'll show you why. FRBR is too rigid, in that there are only four levels, items can only be attached to the lowest level, and they cannot skip levels--and sometimes you need to do that. ISAD, on the other hand, has an unlimited number of levels, you can skip levels, and it is a graph model like Eric showed on the right side of his diagram. So, we ended up choosing the ISAD model to implement this. And here's an example of a search that shows the records that come back, and just to make them easy to see, right at the end in brackets it says--Eric, you can't see this, so let me blow it up--it says "BIBFRAME work," so that people who are not accustomed to what a BIBFRAME work looks like can identify them by these brackets, which are put there to make it easier.
If you click on one of the BIBFRAME works you get to see a hierarchic structure in the display: the BIBFRAME work, the BIBFRAME instances, the annotations, and also the authorities. Those are the four key components of the BIBFRAME model, and they get displayed out here. And then if you hover over an instance it shows the data of that instance--in this case it shows the book cover and all the publisher information and so forth. Similarly, another instance will be shown, and on you go. Then you have annotations attached to these records. So, if you go back here you see the annotation records--whoops, this thing has a mind of its own. It has annotation records, and you can link to the annotation record and go to the body of the annotation, whether it's an internal annotation or an external annotation; because it has a full URL you can do either one with the same simplicity as the other. And then there's this visualization capability I was talking about. On the bottom left-hand side there's an indication that says "view the browser," and when you click on that, directly from your online catalog, you see the visualizer on the right. I'm going to take a risk and see if I can connect it to the database so you can see that it really works, and see what happens. Oh, there you are. So, we have this database; it's got MARC records and it's got BIBFRAME records in it. Okay, so I get that. I click on Lord of the Rings and I get the thing I just showed you, but I want to go to the visualizer, so I'll go here to the visual browser. I click that and there it is. So you can see the visual browser. Now, when you're browsing, it's a true browser. I can look at all of the records here that are instance records, and then if you look down here we have the authorities. So, let me click on this before I do that. There's the author there, and the publisher is here. So, let me click on this authority [inaudible] and you see all the works that are attached to that particular authority, and you visualize it, and there's a connection back to the node. So you can do it either way, whatever's convenient for you. You can visualize things as you see them, moving back and forth, and this is all in operation. If you want to see it yourself you can go to that website I showed you, or stop by at our booth and you can see it there. So, with that I'll return to the PowerPoint presentation and--wait, there. So, that's where we were. Let's move on. In order to do this we had to create links, and so you saw the link going from fantasy on to fiction there, and I've already shown you this particular navigation. So, the real question you have to answer is this: I gave you an example that has five or six links, but the web is much, much bigger than that. So, the question you have to ask is, how do you navigate linked data when you have a big cloud like this? The answer is you have to break it up into parts. We've already done a live system for the Kansas City Public Library where we've taken the linked data and broken it up into parts, and here's an example of that. This is a linked data example of Civil War information in Kansas City.
And you can see the links on the left and then the ones on the right and the way these links are set up, but what you can't read is that there are 41 different relationships. It's automatically broken up into three parts, and you can navigate these parts by going "previous" and "more," going right to left, and see all of the components in small chunks--and that's how you end up navigating the full web. We call it the linked data navigator, and this is from the Kansas City Public Library's website. They define what a relationship browser is, and you can go to that when they open it. They haven't quite opened the site yet; it'll be open to the public on the 21st of August. Right now they're showing it to the funding agencies and the hotshots that made this happen. This is a diagram of a change in perspective on the linking of the data, but there's another point that we have to make, and the point is something called reification. We've talked about triples, and triples have the relationship you see: an object with a relationship to another object--or subject, relationship, object, whichever way you want to say it. But there's another concept that has to be brought in: the concept that says, how do I know that the relationship is true? And that's the process of reification. This thing that you see in the triangle justifies that relationship by the support of an artifact. So, now in this model you can attach artifacts to the node on the left, to the node on the right, and to the link between them, and then in addition you can have another link that says the link is justified by an artifact--and this is the process of reification. For example, you might say that Charles Dow here--Dow "was killed by," not a great example--was killed by Franklin Coleman, and there's a statement by Wilson that justifies the fact that that occurred, and that's the reification process. There are other examples of that in the database. And I'm going to talk about where the future is as far as OPACs are concerned, and what I'm going to say is pretty obvious, but occasionally it pays to repeat the obvious. Eric said earlier that what we're doing with BIBFRAME has to be "woven into the fabric of the internet"--those were his exact words. So, there's the fabric of the internet, and we will weave the OPAC into the fabric of the internet. You can see where your new OPAC will exist in the internet and how we'll have to transcend that: we have to be part of the open cloud. It has to be based on linked data. We'll have no keyboards in the new OPAC--it has to be voice activated--no mouse; mobile handheld devices; visual; and you will not know where your OPAC ends and where the rest of the world begins. If you've done that, you've done this thing right. You can come to our website and look at it operating. Thank you very much. [applause]
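Vinod's reification example--the assertion that Dow was killed by Coleman, justified by Wilson's statement--can be sketched with the standard RDF reification vocabulary. Here is a minimal sketch in Python with rdflib; the URIs and the isJustifiedBy property are invented placeholders, not VTLS's actual model.

```python
# Minimal sketch of reification: a triple, a statement resource that
# stands for that triple, and an artifact that justifies it.
from rdflib import BNode, Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # placeholder namespace
g = Graph()

# The base triple: Dow was killed by Coleman.
g.add((EX.CharlesDow, EX.wasKilledBy, EX.FranklinColeman))

# The reified statement: a resource that represents the assertion itself,
# so that we can say things *about* the assertion.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.CharlesDow))
g.add((stmt, RDF.predicate, EX.wasKilledBy))
g.add((stmt, RDF.object, EX.FranklinColeman))

# The justifying artifact: the assertion is supported by Wilson's statement.
g.add((stmt, EX.isJustifiedBy, EX.WilsonStatement))

print(g.serialize(format="turtle"))
```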
>> Sally Hart McCallum: Thank you Vinod, and our last presentation will be from Jean Godby from OCLC Research. She's going to describe the analysis that she and some of her colleagues have carried out looking at BIBFRAME and schema dot org, which is the emerging search engine format used by the search engine suppliers. >> Jean Godby: We've heard in many presentations, here and in previous conferences, that there are lots and lots of good reasons why the library community needs to be paying attention to schema dot org, which is the ontology that was developed by the major search engines to enable materials contributed to them, or available out on the web, to be indexable and visible to those search engines. And so essentially what schema dot org does is tell us: if you want your work to be discoverable, then conform to these standards. And when we first noticed this at OCLC we thought, well, this is great, because we were independently trying to figure out what those elements would be by picking and choosing from the individual published standards that were available. So, schema dot org saved us a ton of work. And in addition to having this advice--and in a sense a mandate--from schema dot org to work in this way, we also knew that if we saw inefficiencies or gaps or inconsistencies we would be able to engage with them at an appropriate level in order to get those deficiencies addressed, and so that's another rationale for working with them. As this work unfolded it turned out that there was a rather complicated context, so in this slide I'll just give you a very small window into the standards-making sausage machine. It's kind of a timeline, but actually these events overlap a little bit. So, first of all, OCLC became involved in the work with schema dot org. We jumped right onto it, and actually it was with the help of Eric Miller, who came to us and told us: you need to be paying attention to this for many more reasons than I listed on that first slide. And so our initial release of linked data for WorldCat dot org happened quickly after that. And that's available on WorldCat dot org: if you go to a record, you can scroll all the way down to the bottom and you'll see linked data. It's linked data markup for some of the elements in a record in WorldCat dot org. And actually, the elements that are being linked up as linked data usually are controlled MARC access points, and they are links out to authority files. And so it's a way to connect up all of the authority file conversion for linked data that's been happening at OCLC and the Library of Congress and many other places. So, that's one thing. We also realized as we were doing this work that we wanted to create a strategy for moving forward, and I've heard in other presentations, here and elsewhere, that because this is a rapidly evolving world we need to be thinking in terms of agile development. So, our initial release had some proposed vocabulary extensions for things that we didn't feel schema dot org was handling very well; for example, the proliferation of physical formats that librarians have to worry about was represented only sketchily in schema dot org, so we proposed that. A few months after that, the BIBFRAME early experimenters group was convened, and that seemed to us to have some impact on the work that we had been doing, and we had to try to figure out how they were related. Were they responding to the same impulse as us? Were we trying to solve some of the same problems? Could we figure out a way to have a complementary relationship? It was truly not clear, and so when we started looking at, for example, some of the vocabulary extensions that we had proposed in the initial linked data release, we realized that there were some exact duplicates, and we wanted to get rid of some of that duplication. We also realized that schema dot org has its high-level concept, which is the creative work, and that seemed to have a relationship to the high-level concepts of BIBFRAME, and we needed to understand that relationship better. So, that was the role that we played in our participation in the early experimenters group, and the paper that I've just released tries to describe that.
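The markup Jean describes at the bottom of a WorldCat dot org record page is schema dot org vocabulary expressed as linked data. Here is a minimal sketch of that style of description, built as JSON-LD with plain Python; the title, author, publisher and the authority URI are invented for illustration and are not actual WorldCat data.

```python
# Sketch of a schema.org description of a book as JSON-LD, the kind of
# markup that can be embedded in a record page for search engines.
import json

record = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "Treasure Island",
    "author": {
        "@type": "Person",
        "name": "Robert Louis Stevenson",
        # A controlled access point links out to an authority file.
        "sameAs": "http://example.org/authorities/stevenson",
    },
    "datePublished": "1883",
    "publisher": {"@type": "Organization", "name": "Cassell and Company"},
}

print(json.dumps(record, indent=2))
```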
Now, in the meantime, partly as a result of our participation in the BIBFRAME group, we realized that we didn't really need to be in the business of developing vocabulary anymore, because the library community would do that. And so we sort of reorganized, and realized that what we need to do is still participate with the schema dot org people and do the experimentation of trying to describe library resources using only schema dot org and nothing else, trying to figure out how far we can push that, and only then approach schema dot org with proposals for extending the vocabulary. And that is the role that the Schema Bib Extend group plays; it's facilitated by one of our colleagues at OCLC, but it is a W3C community group which includes many, many people in the community. So, their goal is to discuss the experience of using schema dot org, then to propose vocabulary that will make life easier, then go through the formal voting process of getting consensus, and then to make those recommendations to schema dot org for full incorporation into their ontology. In addition to that, OCLC, in their agile development, is realizing that they need to be doing experiments with these vocabularies and consolidating them into a model that will show these ideas. And given that we are also working with the early experimenters, we want to take this evolving schema dot org work and relate it to what is happening in BIBFRAME, and that's the topic of the paper that I've described. So, what is the big conclusion? The big conclusion is that I think we managed to successfully get away from the initial confusion about the schema dot org markup that's on WorldCat dot org now, and to come up with a very simple diagram that describes how we relate. On the one hand you have schema dot org, positioned to appeal to whoever has stuff out on the web that needs to be described. It can be library stuff; it can be those same resources as described by publishers or by anybody else who has a commercial or a personal interest in anything that is going to be about works of intellectual endeavor of some sort. So, they have to be, in other words, broad but kind of shallow: they have broader data coverage than our community would have, but their model is very, very shallow and it will need to be enhanced. BIBFRAME, on the other hand, is the contribution of a particular constellation of communities of interest who are interested in curating these materials as well as facilitating discovery of them. And so they're interested in modeling detail that will be deep, but not necessarily as broad as what schema dot org has proposed. Now, in a simple configuration we have these triangles that sort of overlap--but what is in the middle? In the analysis that we presented, we realized that we're seeing lots of terms that are very close in meaning to each other, and in some cases they're identical. They're things like person and organization--those kinds of very high-level concepts are common to the two schemes--and then some elaborations that might be kind of obvious. In RDF parlance these would be properties of those concepts, so it would be subjects and publishers and dates and things like that.
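That overlap--a small common core of near-identical concepts shared between a broad, shallow scheme and a deep, narrow one--can be caricatured in a few lines of Python. Both term lists below are invented stand-ins, not the actual vocabularies, and the alignment table is purely illustrative.

```python
# Toy caricature of the schema.org / BIBFRAME overlap: two vocabularies
# with different names for a handful of near-identical core concepts.
schema_org_terms = {"name", "author", "publisher", "datePublished", "about"}
bibframe_terms = {"title", "creator", "publisher", "subject", "heldBy"}

# Illustrative alignment of near-identical concepts under different names.
alignments = {
    "name": "title",
    "author": "creator",
    "publisher": "publisher",
    "about": "subject",
}

common_core = [(s, b) for s, b in alignments.items()
               if s in schema_org_terms and b in bibframe_terms]
print(common_core)  # a roughly Dublin-Core-sized common core
```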
And if you look at a head-to-head comparison of the records that have been generated, you can get the superficial impression that the depth of coverage in that common core resembles the type of coverage that you would get in a Dublin Core terms record--maybe slightly more complex, but that's the level of the overlap. And in our view that's kind of correct, because we obviously do have these common concepts, but ultimately we have different purposes. Now, as we worked through our analysis we realized that there were some big issues, and in talking to people at this conference I've discovered that these issues are shared by a lot of people who are involved in similar kinds of analysis. So, first of all, what we had to do with schema dot org is try to understand how to represent FRBR there. This was a point of discussion in the Schema Bib Extend community group, and is also a point of discussion, in a great amount of detail, at OCLC in using the results of Schema Bib Extend. So, there's a lot of activity going on around this, because we feel it's fundamental to the world view that we're trying to present, and it also unlocks all of the richness of RDA as well. Once we have the FRBR concepts, then RDA enables us to wire up various relationships to those concepts, and we want to be able to get that. And yet schema dot org does not recognize FRBR, for various reasons that they've articulated. The basic reason is that they believe the outside world doesn't understand it, and so they don't have a justification to impose it on the outside world--and we have to make that case. The second thing is that some of the high-level BIBFRAME concepts are things that we have been able to model in a different way in our experiments, and so there's a lot of discussion happening around how the modeling of BIBFRAME and authorities and annotations works out in the details. And then the final thing is that we feel we've done a lot of work on implementation, and the implementation that we have, using our agile methodology, means that we can rapidly turn a modeling idea around into all of WorldCat dot org--so we're modeling and testing at scale, and we want to be able to keep doing that. But increasingly we're beginning to believe that we need to start focusing on drilling down rather than working at the high level that we've worked at so far. We're anxious to do that, and as I listened to Eric's presentation I realized that he just gave us a bunch of work to do: he talked about reconciling BIBFRAME with VRA, and, in fact, we've done some work locally at OCLC where we're looking at VRA and other aspects of digital objects and trying to represent those in schema dot org. So, in the spirit of agile development, we know what we can focus on in our next stage. I'll just conclude with a more policy kind of thing. Not only is there technical development going on, but that translates into a list of tasks that are essentially complementary. The people who are working with Schema Bib Extend--our community at OCLC as well as the Schema Bib Extend community--are tasked with evaluating schema dot org as a candidate for doing high-level description of library resources for the purpose of discovery. BIBFRAME would be in the position of leading the discussion on how to harmonize BIBFRAME with other standards that have been developed in the library community.
A second thing is that Schema Bib Extend would negotiate with schema dot org to adopt the changes that we think we have to have in order to make our life easier. At the same time, there will be collaboration among the groups associated with the development of BIBFRAME to work with libraries to take on that very difficult task of expanding the descriptive capacity to cover the details that are actually in the MARC records and that the actual communities need. In addition, we know that we'll have some concepts that are common, and so the Schema Bib Extend people and OCLC have taken on the task of trying to understand how BIBFRAME concepts map to schema dot org, and in an analogous way the BIBFRAME community is working on how to map MARC concepts to BIBFRAME. So, these are all complementary tasks that result in a model that will be rich for both of us, because we want to be able to have our stuff discoverable and we also want to have the detail that we're used to and the standards that we're all familiar with. So, in the end, I'll leave you with some references where you can learn more about this. The first one is a description of our rationale for using schema dot org; the second one is the paper that I've just released; and the third one is a link out to the Schema Bib Extend community group, should you be interested in getting involved in that. And that's a great group, so I hope that you consider joining. [applause] >> This has been a presentation of the Library of Congress.