The Smithsonian Transcription Center creates indexed, searchable text by means of crowdsourcing…or, as Meghan Ferriter, project coordinator at the TC, describes it, “harnessing the endless curiosity and goodwill of the public.” As of the end of the current fiscal year, 7,060 volunteers at the TC have transcribed 208,659 pages.
The scope, planning and execution of the TC’s work, from the in-house coordination among the Smithsonian’s units to the external coordination of volunteers, are staggering to think about. The Smithsonian Institution is composed of 19 museums, archives, galleries and libraries; nine research centers; and a zoo. Fifteen of the Smithsonian units have collections in the TC, which is run by Ching-hsien Wang, Libraries and Archives System Support Branch manager with the Smithsonian Institution Office of the Chief Information Officer.
Ferriter said, “To manage a project of this scope, one must understand and troubleshoot the system and unit workflows as well as work with unit representatives as they select content and set objectives for their projects. Neither simply building a tool nor merely inviting participation is enough to sustain and grow a digital project, whatever the scale.”
The TC benefits from the Smithsonian’s online collections. Though individual units may have their own databases, they all link to a central repository, the Smithsonian’s “Enterprise Digital Asset Network,” or EDAN, which is searchable from the Smithsonian’s Collections Search Center. The TC leverages the capabilities of EDAN and builds on the foundation of data and collections-management systems supported by the Office of the Chief Information Officer. In some cases, for example, a unit may have already digitized a collection, and the TC arranges for volunteers to add metadata.
Each unit has a different goal for its digital collections. The goal for one project might be to transcribe handwritten notes; the goal for another project might be to key in text from a scanned document. A project might call for geotagging or adding metadata from controlled vocabularies (pre-set tags, used to avoid ambiguities or sloppy mistakes). But the source for each TC project is always a collection of digital files that a volunteer can access online.
Sharing data across the Smithsonian’s back end is an impressive technological feat, but it’s only half of this story. The other half is about the relationship between the TC and the volunteers. And the pivotal component that enables the two sides to engage effectively: trust.
The TC serves the Smithsonian as an aggregator, making bulk data available for volunteers to process and directing the flow of volunteer-processed data to the main repository. But the TC does more than traffic in data: it nurtures its relationships with volunteers through fail-safe technical resources and down-to-earth, sincere human engagement.
Ferriter shows her respect for the volunteers when she refers to them as “volunpeers.” She said, “‘Volunpeers’ indicates the ways unit administrators and Smithsonian staff experience the TC along with volunteers. ‘Volunpeers’ underscores the values articulated by volunteers describing their activities and personal goals on the TC, including to learn, to help and to give back to something bigger. … Establishing a collaborative space that uses peer-review resources brings to the foreground what is being done together rather than exclusively highlighting what is being done by particular individuals.”
TC staff made a crucial discovery: what motivated people to volunteer was a sincere desire to help. Wang said, “Volunteers feel privileged and take the responsibility seriously. And they like that the Smithsonian values what they do.”
Ferriter said, “Volunteers indicated they were seeking increased behind-the-scenes access as a reward for participating, rather than receiving discounts or merchandise from Smithsonian vendors.” So TC staff developed close relationships with the volunteers and remain in constant contact with them by means of social media.
“Communicating in an authentic way is central to my strategy,” Ferriter said. “Being authentic includes being vulnerable and expressing real enthusiasm. It also entails revealing my lack of knowledge while learning alongside volunteers. My strategy incorporates an inclusive attitude with the intent of shortening the distance of institutional authority and public positioning.”
Institutional authority, or the perception of institutional authority, can be an obstacle to finding volunteers. Wang said the Smithsonian, like other staid old institutions, was perceived several years ago to have an image problem. She said that research indicated, “People think it’s nothing but old white men scientists.” Wang and Ferriter do not suggest that the solution is for the TC to appear young and hip and “with it.” Rather, the TC demonstrates its inclusiveness in a very real and sincere way: by reaching out to any and all volunteers and treating them with appreciation and respect.
Volunteers are always publicly credited for their work. They can download and review PDFs of what they’ve done once a project is finished. Ferriter said, “I advise Smithsonian staff members who want to be part of the Transcription Center, ‘You need to understand that there is a commitment that you’re making to participate in this project, which requires you to be involved with communicating with the public, to answer their questions, to tell them specific details about projects, to be prepared to provide a behind-the-scenes tour.’”
Each project includes three steps: transcription, review and approval. One of the remarkable results of the TC/volunteer relationship is that the review process has become so thorough and consistently reliable, and volunteers behave so professionally and responsibly, that the approval phase often requires little change. This trust in the reviewers, which the reviewers have earned and deserve, saves the Smithsonian a significant amount of staff time during approval.
Another remarkable result of the volunteers’ dedication is that TC staff has found the volunteers’ manual transcriptions to be statistically far superior to OCR output, which tends to be “dirty” and requires additional time and labor to correct.
Ferriter said that although the Transcription Center has been successful, as the volume of digital collections it has made keyword-searchable shows, there remain further opportunities to look at the larger picture of interrelated data. “The story may be more than merely what is contained within the TC project,” Ferriter said. “There are opportunities to connect the project to its significance in history, science and other related SI and cultural heritage collections.”
When those opportunities arise, the volunpeers will no doubt help make the connections happen.