CENDI PRINCIPALS AND ALTERNATES MEETING
Library of Congress
Washington, DC
October 29, 2008

Final Minutes

THE FUTURE OF BIBLIOGRAPHIC CONTROL AND METADATA

On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control
Search, Present and Future: Implications for Metadata

Improving Metadata – Strategies and Standards
The FGDC Geospatial Metadata Standard and Other Initiatives
Metadata for Scientific Data: Integrating DOE/DDE, STTR and ICSTI Initiatives

 Welcome

Ms. Herbst, CENDI Chair, opened the meeting at 9:10 am.  She thanked the Library of Congress for hosting the meeting. Roberta Shaffer, Principal from the LOC, welcomed the attendees and invited them to visit the Library’s exhibits of the draft of the Declaration of Independence and the recreation of the Jefferson Library. 

THE FUTURE OF BIBLIOGRAPHIC CONTROL AND METADATA

 “On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control”
Deanna Marcum, Associate Librarian for Library Services, Library of Congress (presentation, .pdf format)

As part of a broader look at its future, the Library of Congress chartered a working group on the Future of Bibliographic Control. The goals were to reduce processing costs, to help frame the Library’s future role in bibliographic control, and to bring the Library into the digital era. To address cost reduction, the Working Group looked at all the elements of bibliographic control. They decided to use this project as a way to think about the future.

The working group included representatives from the American Library Association (ALA), the Association of Research Libraries (ARL), the Special Libraries Association (SLA), the Medical Library Association (MLA), the American Association of Law Libraries (AALL), and the Program for Cooperative Cataloging (PCC). They also invited Google and Microsoft. The group included key thinkers in the library and information science area, including Cliff Lynch (Coalition for Networked Information), Lorcan Dempsey (OCLC), and Jose-Marie Griffiths (University of North Carolina).

The group looked at bibliographic control in general and in practice. The goal was to recommend collective ways of thinking about bibliographic control and what The Library should do. Early on, the group realized that it wasn’t such a black and white world. At the end of the first full day, the group decided to hold three regional meetings with varying topics and to introduce a web site to collect comments. The themes of the three meetings were users and uses of bibliographic data, which was held at Google; structures and standards, which was held at ALA in Chicago; and economics and the organization of bibliographic data, which was held at The Library.

The group worked without Library staff present and issued an interim report. The report was briefed to all staff in November 2007. The final report was received by Dr. Marcum in January 2008.

The possible audiences ranged from policy makers to the library community and the people they serve. It was important to get the tone right.

Three overarching principles were identified. There is a need to redefine bibliographic control. What is the definition of bibliographic control when there are diverse users, venues and library materials? It became important to view bibliographic control as a distributed activity. In terms of redefining the bibliographic universe, there is a need to interact with the commercial sector and with libraries to take advantage of the data that is already available. It became clear that the role of the Library will change to where it is no longer the sole source of bibliographic control. Collaborations, partnerships, and standards will be important.

There were 108 recommendations at varying levels of detail. However, the report is not an implementation plan, but a call to action. It is up to the Library to say how it would implement the recommendations. To take advantage of the momentum and to validate the assumptions, Dr. Marcum has asked all groups within The Library to consider how they could contribute to the new environment. The Final Report has received many comments; the majority of them are positive. The most encouraging thing was that the group worked together so well. The 108 recommendations were reached without dissenting views.

The next step is to continue the conversations. A number of initiatives have emerged based on the findings. The Council on Library and Information Resources (CLIR), with a grant from the Mellon Foundation, has a major project to catalog and announce special hidden collections. The Association for Library Collections & Technical Services (ALCTS), the Program for Cooperative Cataloging, and a number of other institutions have used the Report as the basis for discussing their own futures. Dr. Marcum believes that it is important for The Library to implement these changes from a position of strength rather than waiting for the changes to be dictated. Rechanneling of resources will allow money to be invested in new ways of doing things before The Library is forced to do so.

The Library is already in the process of making a major organizational change, merging acquisitions and cataloging. This involves approximately 900 people. It is a good way to get to the future quickly by asking questions about how jobs should be configured in the future and how to coordinate metadata creation across the organization.

Three groups at The Library provided feedback on the report. Dr. Marcum received the comments on May 1, 2008, and responded to all 108 recommendations. This was done in order to brief ALA at its annual meeting.

The most controversial recommendation was that work on Resource Description and Access (RDA) should be suspended. The Library has asked the national libraries to jointly consider RDA. The RDA effort is international, so it isn’t possible to stop it. However, the libraries need to test RDA and decide whether it will do what they want it to do. There are approximately 20 test organizations, and the test methodology is published.

Special Collections are of key importance because, historically, they have not received full bibliographic control. The Library’s focus has always been on monographs, so they could serve as copy catalogers for others. This caused the special collections with over 100 million combined items to be less well cataloged. The working group believes that these should be mediated collections with a need to get the items into The Library’s catalog. A special task force is thinking about how to do this without the full MARC record being required. A major portion will be added by the beginning of next year, now that mechanisms are in place for Music, Rare Books and the Asian Collections.

Rick Lugg from The Library is surveying the bibliographic landscape including libraries in the US of all sizes. How do they acquire and create metadata? How many are buying bibliographic records? How many rely on the Library? What is OCLC’s role? What are the dependencies in the current network?

Thinking about the way forward is also being informed by projects at The Library, such as the Flickr Project to better catalog Prints and Photographs from the Work Projects Administration (WPA) Era. In recent months, eight million people have looked at the collection and added tags. The Report on this project will be issued in the next few weeks. How should The Library deal with the diversity of individual tags from the public? The Library will likely find itself more in the role of editor than creator.

Regina Reynolds and Bruce Knarr will be leading the implementation team. In general, The Library agreed with the working group report, but the implementation team is being charged to make specific recommendations and establish priorities. A report will be provided at ALA’s Midwinter meeting, and there will be an article in the Information Bulletin.

The group discussed the desire for CENDI to be involved in these discussions. It was suggested that Rick could work with CENDI to make sure the information centers have been included. Dr. Marcum was very interested in this since The Library needs to find ways to treat other libraries and information centers as true partners.

Action Item: Dr. Marcum and the Secretariat will follow up on how CENDI can be more involved in the Future of Metadata and RDA discussions.  Rick Lugg might be a point of contact.

 

“Search, Present and Future: Implications for Metadata”
Dr. Carl Randall, Project Officer, Directorate of Information Science and Technology, Defense Technical Information Center (presentation, .pdf format)

Dr. Randall began the “Implications for Metadata” study approximately 18 months ago. The primary focus was on search methodologies, but there are implications for indexing and metadata cataloging. The survey consisted of five sets of questions focused on full text searching of unstructured data versus searching structured metadata. He received 48 responses from 29 organizations. Nine of the CENDI agencies participated. There were fewer responses from university libraries than from the other groups.

The majority preferred to use a combination of both techniques, doing a quick search on full text using Google or some Internet search engine, and then using that information to do more specific metadata searching. Only a few respondents preferred full text search only.

The limitations of full text that respondents cited included less relevancy, precision and control, including the inability to differentiate meanings. Relevance ranking of documents may not be a true indication because it is difficult to tell how relevancy ranking algorithms are working. Respondents thought that improvements are needed to the algorithms.

The limitations of metadata searching included the high cost of its creation and maintenance; the need for more machine-aided tools; the fact that some searchers may lack the knowledge of the rules used to create the metadata; and the learning curve required. The majority believe that metadata produces more consistency and higher quality search results. Metadata helps to narrow and refine; it is critical for some special collections, especially those that are not rich in text.

What improvements are needed to search? The ability to search multiple formats equally is needed, as are query expansion and the application of metadata to all records. Semantic searching is seen as a longer term improvement, along with the ability to address different learning styles among users.

Dr. Randall queried the participants about the future of search and he also reviewed the literature on this topic. He distinguished between how we think search engines will be and how we would like them to be. He noted that future information seekers would rather find than search. They expect simplicity. The future will likely include more interaction between the system and the user. There will be more images, graphics, visualization and interactive multimedia. The latter is already occurring in warfare and medicine.

In the future, systems will capture the behavior patterns of users to make results more personalized. Search engines will be more universal (they will be consumer driven and providers will increase access). User expectations will continue to increase. As data is shared, there will be more options for searching. The television and the computer will become one and the same device. There will be more powerful tools incorporating data mining techniques. Search will become more fine-tuned and able to look for trends and anomalies. Vertical searching and specialization will become the norm.

What will be the impact on metadata? It is interesting to note that the information science literature de-emphasizes metadata. However, embedded metadata may be behind the scenes supporting the outcome in more powerful ways. 

Dr. Randall then read briefly from a recent article entitled, “Why is an I-Pod Doomed?” The answer is that we will get music as an application and data store in the “cloud”. MySpace and Microsoft are already starting to do this. This change will have implications for the metadata for discovery.

Dr. Randall is working on a new study regarding digital preservation, including digitization. He hopes that CENDI members will once again participate when the survey is announced.


“Improving Metadata – Strategies and Standards”
Todd Carpenter, Managing Director, National Information Standards Organization (presentation, .pdf format)

NISO has restructured its leadership committees to more than triple the capacity for dealing with standards and related issues. Four new groups have been created. In addition, NISO has implemented collaboration tools that should help to ensure a two- to three-year standards development cycle. NISO has also assumed the secretariat for TC46 and for its subgroup on identifiers, which gives NISO a more active role in the international standards arena.

Mr. Carpenter listed more than ten metadata standards. How do we rationalize them? He pointed out that despite standards, there can be conformance issues as the standards are being used. For example, he cited the NLM Document Type Definition (DTD) for PubMed as a case where conformance varies when the DTD is used outside its specific purpose of loading content to PubMed. The Dublin Core was intended to be a simple standard, but now there are nearly 100 pages under the Dublin Core homepage.

This is also complicated by a Supply Chain of Metadata. It includes a number of organizations and roles, from aggregators to libraries to abstracting and indexing services and search engines. The publisher must transform and redistribute the information in different ways. Various streams in the flow use different standards. For example, ONIX metadata is used between publishers and booksellers. Libraries use different standards, but they could use some of the information from other parts of the supply chain, such as ONIX. Libraries need to speed the creation of metadata, enhance the sharing of existing metadata, repurpose and reuse metadata from existing streams in the supply chain, create crosswalks and translations among the different formats, and introduce conformance measurement and assessment in order to continue to ensure quality.

Mr. Carpenter highlighted three ongoing projects looking at these issues. The Book Industry Study Group’s (BISG) Product Data Certification Program is looking at conformance to the ONIX family of standards. It is based on prepared Product Metadata Best Practices, which is a set of voluntary guidelines. The Program, which is managed by the BISAC (Book Industry Standards and Communications) Metadata Committee, provides feedback on the timeliness and quality of supplied data content. Currently, only five publishers have passed certification.

OCLC’s project is looking to transform ONIX to MARC. ONIX data would be pulled from the publisher supply chain, converted from ONIX to MARC, and then enhanced with OCLC information. The results would be provided back to the publishers and to libraries.
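As an illustration only, the kind of crosswalk described above might be sketched as follows. The flattened ONIX-style field names and the mapping table are assumptions made for this example; they do not represent OCLC's actual conversion, and a real ONIX record is nested XML with many more elements. The MARC tags shown (020, 100, 245, 260) are standard bibliographic fields.

```python
# Hypothetical, simplified crosswalk: flattened ONIX-style field names
# mapped to (MARC tag, subfield code). Not OCLC's actual mapping.
ONIX_TO_MARC = {
    "isbn": ("020", "a"),         # International Standard Book Number
    "contributor": ("100", "a"),  # main entry, personal name
    "title": ("245", "a"),        # title statement
    "publisher": ("260", "b"),    # name of publisher
}


def onix_to_marc(onix_record):
    """Return (tag, subfield, value) tuples sorted by MARC tag.

    Fields with no entry in the crosswalk are skipped in this sketch;
    a production converter would log or report them instead.
    """
    fields = []
    for name, value in onix_record.items():
        if name not in ONIX_TO_MARC:
            continue
        tag, subfield = ONIX_TO_MARC[name]
        fields.append((tag, subfield, value))
    return sorted(fields)


# Placeholder values, not a real publication.
sample = {
    "title": "Example Title",
    "contributor": "Author, Example",
    "publisher": "Example Press",
    "price": "9.99",  # no MARC mapping in this sketch, so it is dropped
}

for tag, subfield, value in onix_to_marc(sample):
    print(f"{tag} ${subfield} {value}")
```

A table-driven mapping of this kind keeps the crosswalk itself reviewable as data, separate from the conversion logic.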

The NISO/UK Serials Group’s Knowledge Base And Related Tools (KBART) project is looking to enhance and improve the streams of OpenURL metadata. The publisher must be linked to a resolver supplier. Many people don’t have a clear understanding of the OpenURL format and which of its components are critical.
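For readers unfamiliar with the format, a minimal sketch of an OpenURL 1.0 (ANSI/NISO Z39.88-2004) link in key/encoded-value form is shown below. The resolver address and the citation values are placeholders assumed for this example; the keys (url_ver, rft_val_fmt, rft.issn, and so on) come from the standard's journal metadata format.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, issn, volume, start_page, year):
    """Build an OpenURL 1.0 key/encoded-value query for a journal article.

    resolver_base is the institution's link resolver address; the one
    used below is a placeholder, not a real service.
    """
    params = [
        ("url_ver", "Z39.88-2004"),                        # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),   # journal format
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", start_page),
        ("rft.date", year),
    ]
    return resolver_base + "?" + urlencode(params)


# Placeholder resolver and citation values.
link = build_openurl("https://resolver.example.org/openurl",
                     "0000-0000", "12", "345", "2008")
print(link)
```

The resolver depends on these referent keys to identify the article, so any key that is missing or inaccurate in the publisher's data weakens the link.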

In addition to these ongoing initiatives, there are several forthcoming activities. NISO is sponsoring four thought leader meetings funded by the Mellon Foundation to identify a path forward. The meeting on digital libraries and digital collections resulted in 12 recommendations, including that NISO should design a suite of conformance tools to help assess how well metadata conforms to the standard. The question is what is in it for the publishers? NISO will commission a study (Association of American Publishers [AAP], OCLC, and BISAC) to determine the return on investment for publishers. 

NISO will begin holding monthly open teleconferences in January. This would allow CENDI and its member agencies to have a closer relationship with NISO.

Action Item: Secretariat will follow up with Mr. Carpenter regarding the date, time and logistics for the open teleconference.

 

“The FGDC Geospatial Metadata Standard and Other Initiatives”
Norman Andersen, Metadata Standards Officer, National Geospatial Intelligence Agency (presentation, .pdf format)

Building on the Federal Geographic Data Committee’s (FGDC) metadata work of 10-15 years ago, the International Organization for Standardization (ISO) developed the 19115 standard for geospatial metadata. The US led this effort, and Mr. Andersen was the editor. The first draft of the ISO standard took Version 1 (v.1) of the FGDC standard as its basis. ISO 19115 is about 80-100 percent compatible with the FGDC standard.

In the meantime, the US was developing v.2, which made the ISO work less compatible. ISO 19115 embraced several other standards, which led to issues such as the representation of dates and times.

This resulted in an MOU (Memorandum of Understanding) with Canada to develop a North American Profile. This is in the final process at the InterNational Committee for Information Technology Standards (INCITS). The North American Profile will replace the FGDC standard. There are many similar elements and several new ones. There are some implications for format. Dublin Core has been used as a crosswalk, because many metadata standards already cover these seminal elements at some level.

The question being raised is how agencies should prepare for this change. If you have legacy data, Mr. Andersen advises that you keep working with v.2. Adding one or more ISO Topic Categories to the existing FGDC metadata records as Theme_Keywords is also advised. Agencies should monitor the FGDC web site for news, materials and training opportunities. Conformance tools are being developed, and the FGDC has a team that is funded to do training for agencies at no cost. Managers working in this area should also educate and inform their agencies that a migration will be needed in the future. A metadata conversion plan using crosswalks and conversion software should be developed. When v.3 is finalized, it should be adopted.
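The advice about adding ISO Topic Categories as Theme_Keywords could be carried out along the following lines. This is a minimal sketch, assuming records stored as FGDC CSDGM XML with the standard short element names (idinfo, keywords, theme, themekt, themekey); the function name and the thesaurus label are choices made for this example rather than anything prescribed by Mr. Andersen.

```python
import xml.etree.ElementTree as ET

# Thesaurus label for the added theme block (an assumption for this sketch).
ISO_THESAURUS = "ISO 19115 Topic Category"


def add_iso_topic_category(metadata_root, topic):
    """Add an ISO topic category as a theme keyword to an FGDC record.

    New elements are appended at the end of their parent; a production
    tool would also enforce the element order required by the standard.
    """
    idinfo = metadata_root.find("idinfo")
    if idinfo is None:
        idinfo = ET.SubElement(metadata_root, "idinfo")
    keywords = idinfo.find("keywords")
    if keywords is None:
        keywords = ET.SubElement(idinfo, "keywords")

    # Reuse an existing theme block for this thesaurus if there is one.
    for theme in keywords.findall("theme"):
        if theme.findtext("themekt") == ISO_THESAURUS:
            break
    else:
        theme = ET.SubElement(keywords, "theme")
        ET.SubElement(theme, "themekt").text = ISO_THESAURUS

    # Avoid duplicate keywords when the function is run more than once.
    existing = [key.text for key in theme.findall("themekey")]
    if topic not in existing:
        ET.SubElement(theme, "themekey").text = topic
    return metadata_root


record = ET.fromstring(
    "<metadata><idinfo><keywords><theme>"
    "<themekt>None</themekt><themekey>roads</themekey>"
    "</theme></keywords></idinfo></metadata>"
)
add_iso_topic_category(record, "transportation")
print(ET.tostring(record, encoding="unicode"))
```

Because the existing keywords are left untouched, legacy records stay valid under v.2 while gaining a value that maps directly to the ISO element.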

Mr. Andersen also discussed the work of the Metadata Focus Group of the Geospatial Intelligence (GEOINT) Standards Working Group (www.gwg.nga.mil). The GEOINT Standards Working Group was initiated in January 2005. There are 26 core, or voting, members and some associate members. Many CENDI agencies are represented on the list. The Metadata Focus Group is one of eight focus groups. (Mr. Andersen provided their descriptions and points of contact for each.)

The Metadata Focus Group has produced several documents including the National System for Geospatial-Intelligence (NSG) Geospatial Metadata Profile for Discovery and Retrieval. This is currently based on Dublin Core, but Mr. Andersen would be interested in following up on the bibliographic metadata issues that are being presented at this CENDI meeting.

Metadata standards that are developed are driven by several existing standards within the international, DoD, and intelligence communities (IC). Some of these are mandatory and others are voluntary. The NSG standards are registered with the Defense Information Standards Registry, a registry of DoD metadata standards. Registration is required in DoD contracts.

There is a wide variety of metadata efforts in the IC and related communities. The Intelligence Community Information Sharing Group has eight focus groups. The metadata group has over 300 people on its list. A meeting was held on October 14, 2008, with 75 people in attendance and 20 online. They have quite a bit of participation and consensus. Cinematographers from Hollywood are involved in the motion images focus group. There is a series of organizations that work on harmonization of the various metadata standards.

The New Foundation Standard is a profile of search and retrieval from ISO 19115. A minimum core of metadata is being reviewed for all geospatial products and these are required. There are groups that also have other core standards that serve as a minimum. Given the number of standards and metadata activities, a desk side reference or maybe a FAQ-type guide is needed. Mr. Andersen’s remaining slides provide detail of the differences between the FGDC and the ISO standard, the status of US implementation, and more detail about the other initiatives and standards in the geospatial area.

 

“Metadata for Scientific Data: Integrating DOE/DDE, STTR and ICSTI Initiatives”
Franciel Azpurua-Linares, Program Manager, Information Management and Technology Consulting, Information International Associates, Inc. (presentation, .pdf format)

Since 1945, the US has spent approximately $4.2 trillion on research and development (R&D). The return on investment is in the use and reuse of the knowledge and information that is generated. The advancement of information technologies has transformed the scientific landscape. It has resulted in an explosive volume of data, much of it digital data that is fragile and not always accessible. Increasingly, the products of science and the starting points for new research are “born digital”. The exploding volumes and rising demand for data are driven by the rapid pace of technical innovations. All sectors of society are stakeholders in digital data management and access.

 This situation raises many challenges. Scientific data sets are distributed in many data repositories. Internet search engines can help but the results are generally uneven and unreliable. URLs alone are not reliable locators for electronic objects. Data retrieved from these databases is generally very accurate and authoritative, but these databases cannot be crawled easily by search engines and are generally not well represented in search engine results. Digital data collections are increasingly important as a primary mechanism for science output. They are a powerful force for inclusion, removing barriers to participation for many. These challenges also provide opportunities.

The management by DOE/OSTI of text resources is well established but the same cannot be said of non-text data. They are not always linked to the documents and the URLs are not always stable. In the first initiative, OSTI has begun to identify data centers from DOE-sponsored research. The OSTI DOE Data Explorer (DDE) is focused on cataloging these data centers and their data collections. They are implementing the ability to search and browse. Citations link out to over 240 collections that are fully or partially funded by DOE.

The goal with the DOE Data Explorer is to federate and identify one virtual metadata scheme. These efforts will be expanded to provide a tool that supports more automated and streamlined processes for creating and maintaining the data repositories including mechanisms for annotating datasets with relevant metadata.

The second initiative is a Small Business Technology Transfer (STTR) project between OSTI and Information International Associates, Inc. (IIa). The goal is to build a generic STI ontology in Phase I that will serve as the building block for the Phase II development. A prototype system will result that uses Digital Object Identifiers (DOIs) to link data sets in the Data Explorer to technical reports in the OSTI collection. Work will be done on a process for assigning DOIs to numeric datasets. The basic STI Ontology and concepts for federated search would be integrated. The last project on an STI Ontology is providing recommendations on how to move the DDE forward. They are in a two-year Phase Two of the STTR to create a prototype system using DOIs for data referenced in grey literature. The testbed will be the OSTI collection and the DDE. The effort will also take the prototype and integrate the ontology and the concepts for federated searching.

The third project is being undertaken with the International Council for Scientific and Technical Information (ICSTI). This work will analyze current limitations in data search and citation, demonstrate the state-of-the-art and practice of data citation that is already underway, demonstrate the integration of numeric data sets through digital text information, and recommend actions for the scientific, publishing and search communities to improve access to numeric data. ICSTI’s work is being led by the German National Library of Science and Technology (TIB), which is already a non-commercial DOI registration agency. The DOI is already associated and can move with the metadata and communicate with other objects.

OSTI and ICSTI have already been working on federated searching. They are interested in a single metadata framework.

The prototype is scheduled to be demonstrated in June 2009 at the ICSTI meeting in Canada. The integration of the ontology and federated search is expected by June 2010.

These are three projects addressing the management of scientific data. There are many others in specific disciplines. The challenge is to bring the community together around common issues. The integration of these projects is a start. Additional input and cooperation are welcome.

 

CENDI Meritorious Service Award 

Ms. Herbst presented the Meritorious Service Award to John Sykes. Mr. Sykes was nominated by Paul Ryan for his two terms of service as Deputy Chair of CENDI. Mike Pendleton from EPA nominated Mr. Sykes for his efforts to promote CENDI within EPA by developing a CENDI space on the Science Connector portal. Mr. Sykes thanked the group. He indicated the benefits that he has received from being involved in CENDI for the past six years and he hopes to continue his involvement in other capacities.

Peter Young, Christina Dunn and George Roncaglia were recognized as members of the award committee.