This post was written by Abigail Potter, Senior Innovation Specialist with LC Labs.
LC Labs has been exploring how to use emerging technologies to expand the use of digital materials since our launch in 2016. We quickly saw machine learning (ML), one branch of artificial intelligence (AI), as a potential way to provide more metadata and connections between collection items and users. Experiments and research have shown the risks and benefits of using AI in libraries, archives and museums (LAMs) are both significant yet still largely hypothetical. In short:
- Library collections are incredibly diverse and are particularly challenging for current machine learning and AI tools to process predictably.
- New AI tools with impressive claims are being released rapidly. We benefit from testing these tools openly, collaborating, and learning from others.
- AI-specific quality standards and policies that support our context of providing authoritative resources to the public over the long term need to be developed and communicated to partners and vendors.
- While large-scale implementation of responsible AI in LAMs is still several years away, now is the time to increase experimentation and collaboration both within our organization and across the sector.
To account for these challenges and realities, LC Labs has been developing a planning framework to support the responsible exploration and potential adoption of AI at the Library. At a high level, the framework includes three planning phases: 1) Understand, 2) Experiment, and 3) Implement. Each phase supports the evaluation of three elements of ML: 1) Data, 2) Models, and 3) People. We’ve developed a set of worksheets, questionnaires, and workshops to engage stakeholders and staff and identify priorities for future AI enhancements and services. The mechanisms, tools, collaborations, and artifacts together form the AI Planning Framework. Our hope in sharing the framework and associated tools in this initial version is to encourage others to try it out and to solicit additional feedback. We will continue updating and refining the framework as we learn more about the elements and phases of ML planning.
Elements of our framework will be familiar to people in LAMs and in the federal sector. It incorporates the research and recommendations of Ryan Cordell, Elizabeth Lorang, Leen-Kiat Soh, Thomas Padilla, and Benjamin Charles Germain Lee. It was inspired by evaluation frameworks and guides such as the National Institute of Standards and Technology Trustworthy AI Framework, the Federal Agencies Digital Guidelines Initiative, and the National Digital Stewardship Alliance Levels of Digital Preservation, and refined through collaborations with members of the General Services Administration AI Community of Practice, NARA’s Office of Innovation, Smithsonian’s Data Lab, Virginia Tech University Libraries, and the AI4LAM network. The framework was initially articulated in conjunction with the Mellon-funded Computing Cultural Heritage in the Cloud initiative and preliminarily shared at the 2022 iPres Conference in Glasgow, Scotland.
Data, Models, People
In planning for and conducting AI and ML experiments at the Library of Congress, we’ve simplified ML processes into three main elements: Data, Models, and People. How these elements are combined helps us understand whether an application of this technology is useful, ethical and effective.
“Data” is pervasive in all aspects of ML. Data trains the model, is input to the model, and is output from the model. Data contains the patterns (or labels) that models recognize and predict, and data can validate whether the predictions are correct. At the Library of Congress, our data is often collections data, historic copyright data, or legislative data. Library of Congress data includes all digital formats and is often available nowhere else. These real-world data sets were not created to be processed by AI. Their inherent messiness, imbalance, incompleteness, and historic content confuse models, resulting in undesirable or wrong pattern recognition. Published metrics about “state of the art” model or tool performance are often based on results from processing contemporary data or well-known datasets in research settings.
“Model” is shorthand for a complex set of technologies and tools that support the training, processing and predicting of the ML algorithm. The ability of the ML program to learn patterns from data without being explicitly told what to process distinguishes ML from other computer programs. How models are trained, what they are trained on, and how data is processed and delivered to users or other systems all determine how well the model works for a specific task. Models to process speech and text, often called Natural Language Processing (NLP), have been developed by computer scientists for over 20 years. Advancements in the ways models are trained, the cloud architectures the models can process data on, and the availability of very large sets of data on the Internet for training are more recent developments. A growing number of vendors are offering AI services in which models are pre-trained, fine-tuned, and bundled into workflows that are proprietary to the service owner.
Despite ML being a technical and data-driven process, people are essential to ML and entwined with the Data and Model elements. People create and are represented in data, and their privacy and other rights are protected by regulations and laws. People design and program AI tools to meet specific goals, some scientific and some commercial. Staff expertise and functions are represented in potential ML use cases. If training data doesn’t exist for a use case, people likely have to label datasets, adding another layer of management and perspective. ML systems can positively or negatively impact people. People and the organizations they represent are responsible for the quality of their AI systems. Ultimately, people determine when and how to implement AI and whether to do it responsibly.
Phases of AI Planning
Consideration of the data, models and people involved in an AI system is built into our AI Planning Framework. The Understand, Experiment and Implement steps include collaborative activities and result in documentation that informs the development of practices and policies for responsible AI.
The first set of activities is to understand the proposed use of AI and how the use case fits within a task, system or organization. Start by gathering different perspectives and thinking specifically about functions and services performed by your group. With particular use cases in mind, collaboratively articulate guiding principles; assess risks and benefits; map needs, priorities and expertise; and learn about data readiness. As we gathered AI use cases from across the Library, we benefited from getting very specific about the people, models and data involved in a use case. We developed the following tools and guidance to help groups refine use cases and understand their feasibility.
- Stated values, principles and policies for implementing AI inform complex decisions. The U.S. White House proposed principles of using AI in government are a great starting point.
- We created a Use Case Assessment worksheet to introduce staff to the circumstances that can lower and heighten risk for a given use case. A second phase of assessment articulates risks and benefits for different groups and documents success criteria for a given use case.
- The lack of available training and evaluation data for models is a common barrier for AI use cases. LAM data is also inherently unbalanced, which will affect the quality of AI outcomes. We developed a worksheet to understand and document data readiness for use in AI systems.
- A Domain Profile Workshop is in development to guide the grouping and mapping of several use cases to help clarify priorities according to levels of expertise, risk, or functions.
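One small piece of a data readiness review, checking how unbalanced a labeled dataset is, can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the Library's worksheet: the function names, the genre labels, and the sample counts are all hypothetical.

```python
from collections import Counter


def label_balance(labels):
    """Return each label's share of the dataset, most common first."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def imbalance_ratio(labels):
    """Return the count of the most common label divided by the count of the rarest."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical genre labels for a small sample of collection items.
sample_labels = ["newspaper"] * 80 + ["map"] * 15 + ["manuscript"] * 5

shares = label_balance(sample_labels)
ratio = imbalance_ratio(sample_labels)

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
print(f"most common label is {ratio:.0f}x the rarest")
```

A high ratio does not rule out a use case, but it is worth recording, since a model may perform well on the dominant category and poorly on the rare ones.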
Resources and skills to support AI work are often not abundant in LAMs. Generally, the use cases with higher risk levels will require more resources and time to implement. Labs developed the Understand tools to help staff apply their expertise and assess where AI could be most productively applied. In use cases where benefits and risks are well understood and accounted for, data is available for training and evaluating models, and expertise and resources have been identified, the next step is determining or confirming an AI solution works.
New AI products and frameworks are being released frequently, each with their own mix of tools and claims. Experiments test specific use cases, models and data with staff and users to document performance and build quality baselines and benchmarks. It is a necessary step before implementation because published performance metrics, which can be in the 95-99% accuracy range for some Natural Language Processing (NLP) tasks, often cannot be achieved when processing LAM data. Quality baselines need to be established for most AI use cases in LAMs.
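As a minimal sketch of what establishing a baseline can look like, the Python below compares a model's output against staff-reviewed labels and reports the gap from a published figure. Everything here is assumed for illustration: the entity labels, the ten-item sample, and the 0.95 claimed accuracy are hypothetical, and a real experiment would use a much larger evaluation set and more than one metric.

```python
def accuracy(predictions, gold):
    """Return the fraction of predictions that match the staff-reviewed labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    if not gold:
        raise ValueError("evaluation set must not be empty")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical staff-reviewed labels for ten items from a collection.
gold = ["person", "place", "place", "org", "person",
        "place", "org", "person", "place", "person"]

# Hypothetical model output for the same ten items.
predicted = ["person", "place", "org", "org", "person",
             "person", "org", "person", "place", "place"]

claimed = 0.95  # hypothetical published accuracy, not a real vendor figure
measured = accuracy(predicted, gold)
gap = claimed - measured

print(f"claimed:  {claimed:.2f}")
print(f"measured: {measured:.2f}")
print(f"gap:      {gap:.2f}")
```

The measured value, not the published one, is the baseline that later experiments and vendors would be compared against.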
Baselines are created by testing and documenting a range of ML approaches that could support a use case and analyzing the output in detail. In addition to thorough performance testing, quality review processes must be established. AI output must be reviewed by staff and users to determine if it is good enough for use in LAMs. It should also be confirmed that the consequences of the automation continue to support organizational principles and goals. The process of testing specific use cases with staff and users will gather important feedback. It will also help to develop expertise around evaluating AI-generated data. We use the following tools and mechanisms for experiments:
- The Digital Innovation Indefinite Delivery Indefinite Quantity (IDIQ) contract is a multi-year contracting mechanism that we can use to fulfill individual AI experiments at the Library of Congress, and it includes requirements that may be valuable to the broader community.
- The Data Processing Plan documents data transformations and the predicted and actual AI model performance for specific tasks. It combines elements from a model card, data cover sheet and documents curatorial provenance. Vendors are required to fill it out as part of the Digital Innovation IDIQ.
- In Development: NLP vendor evaluation guide and quality review recommendations.
- Under Recommendation: Balanced datasets for benchmarking newly available AI models and tools.
Depending on the risk level, experiments can happen iteratively, cyclically, in a different order, or in different phases. The important step is to verify claimed AI performance or benefits can be realized.
There is great interest among staff, partners, decision-makers, stakeholders, and vendors in using AI to help make progress on entrenched challenges in managing and utilizing LAM data. However, as with all emerging and potentially disruptive technologies, it isn’t practical to implement AI right away. The Library has responsibilities to its users, stakeholders, communities and taxpayers. The high-level activities detailed in the Implementation phase would likely be more expansive in practice and different in other organizational contexts. In general, once an experiment is prioritized, deemed successful and feasible, and quality baselines are established, a responsible AI implementation would be supported by the following:
- An AI roadmap. Plans informed by evidence and insights from the experimentation phase to estimate and manage the costs, IT infrastructure, acquisitions vehicles, and continuous development cycles will be needed to support the implementation over time.
- Ongoing testing of AI models and outcomes. As an AI model processes more data, results change. Performance targets, quality measures and user experience goals must be established and monitored. Resources and plans are needed for auditing the models to be sure they are performing as expected over time. Feedback loops and fine-tuning will likely be required to maintain desired levels.
- Outreach to staff and users. Add staff with machine learning expertise and support existing staff with training and experiences. Launch research and engagement programs with users to gather feedback on public-facing AI tools.
- Continuous community engagement and collaboration. Coordinate support for developing shared quality standards for LAM use cases and data. Host shared LAM-specific datasets and tools for training, benchmarking, and quality review.
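The ongoing testing described in the list above can be pictured as a recurring check of audit results against a performance target. The Python below is a rough sketch under stated assumptions: the `AuditResult` record, the batch names, the accuracy values, and the 0.88 target are all hypothetical and are not drawn from any Library system.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    """One periodic quality audit of a deployed model (hypothetical record)."""
    batch: str
    accuracy: float


def batches_below_target(results, target):
    """Return the names of audited batches whose accuracy fell below the target."""
    return [r.batch for r in results if r.accuracy < target]


# Hypothetical audit history; accuracy slips as newer material is processed.
audits = [
    AuditResult("batch-1", 0.91),
    AuditResult("batch-2", 0.89),
    AuditResult("batch-3", 0.84),
]

TARGET = 0.88  # hypothetical performance target agreed on during experimentation

flagged = batches_below_target(audits, TARGET)
if flagged:
    print("Review needed for:", ", ".join(flagged))
else:
    print("All audited batches meet the target.")
```

A flagged batch would be the trigger for the feedback loop: staff review, and possibly fine-tuning or a return to the Experiment phase.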
No single organization can navigate the changes and impacts of AI alone. LAMs have opportunities to develop the building blocks for an effective, useful, and ethical AI implementation. Co-developing and communicating our requirements as a sector will help improve AI outcomes in LAMs, as will sharing information about AI policy, governance, infrastructure and costs.
Our AI Planning Framework builds on the LAM sector’s deep experience in facing technological changes. As AI solutions become more ubiquitous, these steps will help to ensure that when AI technologies have caught up with their promise, we will be ready to benefit.
Please add your comments and feedback to this post or to the AI Planning GitHub Repo and stay tuned for further invitations to update and shape this Framework.