Patrick Egan is a scholar and musician from Ireland who served as a Kluge Fellow in Digital Studies at the Kluge Center. He recently earned his PhD in digital humanities with ethnomusicology at University College Cork. Patrick’s interests over the past number of years have focused on ways to creatively use descriptive data in order to re-imagine how research is conducted with archival collections.
I arrived at the Library of Congress in January 2019 with a number of ideas, some knowledge of the work carried out by LC Labs, and a general understanding of the expertise of some of the Library staff. As an ethnomusicologist with a background in IT and recently completed PhD studies in Digital Humanities, I wanted to broaden my research experience by interacting with a number of staff who were experts in the field of Linked Data, the Library’s audio collections and others who were working on similar projects. Focusing on process over product, I planned to delve more deeply into the possibilities afforded by engaging with experts, to explore the potential afforded by collaboration. This post showcases first contact, discovery, interactions and the journey through which I developed my project, Connections in Sound.
Before this project began, it was already known to me that the Library’s Technical Specialist Matt Miller had worked on the now well known Linked Jazz project. Developed since 2011, it uses Linked Data for creating a network between Jazz musicians who referenced each other in oral history recordings (that are held by a number of American universities). I understood that the workflow of Linked Jazz involved transcribing interview material, identifying the names of historical Jazz musicians that were mentioned – such as influences, collaborators – and making a host of interesting connections between them. This data was then used to visualize this fascinating networked graph of connections between musicians. It continues to aid researchers and the public in understanding the interconnected world of Jazz.
From the start, I saw potential for getting in touch with experts like Matt to explore ways that they as specialists might provide crucial knowledge, experience and maybe suggest the potential for working with data from Irish traditional music that resided within the American Folklife Center (AFC). I also wanted to connect not just performers, but recordings. I saw capabilities such as bringing a number of versions of a song or tune title together through digital processes such as data gathering, structuring or digital visualisation.
To begin, in January I surveyed the audio collections of Irish traditional music at the AFC, seeking out various recordings of music, song and dance and paying close attention to the types of data that lay within the collections. The 37 collections I identified contained a wealth of performances spanning the twentieth century, in a range of formats, and in particular “live” concert recordings where performers described in detail the musical pieces that they played.
If you are interested in viewing these collections in detail, check out this publicly available spreadsheet of my findings: https://docs.google.com/spreadsheets/d/1VJrq_38vnmzBopT0yLYuR9ZwIxKdWW4ssJrLA6LYPnw/edit?usp=sharing
As I began to explore the audio material within these collections, I thought about ways to experiment with connecting the titles of recordings, unite the information describing them and bring together previously unlisted and unidentified tunes that were scattered across the collections into one dataset.
After surveying the AFC collections with Dr Todd Harvey and staff at the American Folklife Center in January, I was able to bring a dataset together that listed a number of songs, tunes and dances. From this initial dataset, I was in a position to begin a dialogue with Matt Miller. In early February, I got in touch and sent a sample dataset:
Patrick: Matt, I have developed some datasets that had concert recordings, fieldwork interviews, and fieldwork notes. I want to see how we can connect the performer stories about the tunes and songs with the recordings that were in other collections, both at the Library and elsewhere. As it turns out, the type of material in these recordings is rich in contextual detail; think of performers introducing musical pieces to audiences and ethnographers before they were played.
Matt responded with:
Matt: It depends on what you want the final dataset to do. Having URIs for each song might not be necessary for you at this stage of the process and you could provide your own URIs for each song as proof of concept. Being able to link out to artists, other datasets enables the ability to pull in data from those external [resources] and leverage it in new ways.
As it turned out, most material at the AFC did not already have URIs, but I could see that a large number of URIs for the material I was discovering already existed on the World Wide Web (especially on the website www.musicbrainz.org). The question was, URIs aside, how could I make links between the audio at the Folklife Center and the material that was on the Web?
I also wanted to explore other resources (in particular digital resources) that were available to me at the Library. As I was now resident at the Kluge Center, I could easily reach out to staff from the AFC to talk about possibilities to make this happen. A productive conversation with the head of the AFC Archive, Nicki Saylor, and archivist Kelly Revak revealed that the American Folklore Society’s Ethnographic Thesaurus (AFS ET) was a useful resource. The AFS ET in its digital version provides descriptions of musical instruments, musical pieces, performers and a number of other helpful ethnographic terms. The AFS ET also encourages researchers and developers to suggest improvements to its growing Thesaurus.
The example above shows that descriptions that connect audio material already existed at the Library and that they had the potential to be helpful to Connections in Sound in some ways. For example, the stories about musical pieces that were recalled by performers at concerts could be included because the AFS Thesaurus provides a structure to link transcriptions of these stories to musical pieces.
Discovering Library Resources
As meetings and email correspondence progressed, it became apparent that not only was it possible to link stories to musical pieces using the Thesaurus, but also that the Library’s Linked Data Service (id.loc.gov) already had URIs for well known performers within Irish traditional music who were beginning to be uncovered within the collections. In this case, the URIs mostly related to performers who had recorded commercially. The first example we found was the famous Irish “uilleann piper” or bagpiper, Willie Clancy (Fig. 2). Clancy’s music was recorded during an important period of the revival of Irish traditional music – the 1960s and 1970s. Having a URI for Clancy meant that I could link the recordings from the Library to other important documents about his life across the Web.
The coming months were focused on building and expanding upon one dataset, named “Items” – a listing of musical pieces from up to seventeen of the thirty-seven collections that had been identified at the AFC. In order to demonstrate the possibilities for my ideas to experts at the Library, I needed a substantial number of entries that could be united and visualised. By May 2019, with the help of interns and staff, the Items dataset had grown considerably. This dataset, seen below, by then included up to 1,500 items and a great number of detailed descriptions, or notes, about musical pieces.
Once I had conveyed to Matt Miller that my dataset had reached a substantial level of detail, it was then possible to start thinking about ways to visualise connections between items in the dataset and out across the Web. Matt advised that a Python script be written in order to generate the connections for each item in my dataset. In the world of Linked Data, the connections between items are referred to as “triples”. A triple is a descriptive connection between digital material that can be read by a computer. It states that one item (the subject) has a named relation (the predicate) to another (the object). Because any item can take part in many such relations, thousands of connections can be made between audio items when digital items are properly described. Communicating with Matt about this was exciting, as all sorts of possibilities were discussed.
Patrick: That would be great, I’d like to see how a python script would work. I think I have all that’s needed for triples to be generated.
- Subjects – individual tune instances (not sets or medleys)
- Predicates – played by (performer), played on (instrument), played with (tune), also appears on (album), performed on (date), performed at (location), also known as (other tune name)
- Object – performer, instrument, tune, album, date, location, other tune name
A very helpful part of this process was a document shared by Matt entitled “Items Mapping” (Fig. 5). Each column that was deemed useful and usable for Linked Data purposes was mapped onto a possible predicate (relationship to an object) and a corresponding “Object URI”. The resulting exchange allowed us to define these descriptions and to examine their use and potential.
Collaborating in Python
By mid-June, Matt had created this initial Python script that showed how to include dependencies for setting up triples and outputting data. This script was very useful, and included comments that allowed me to understand the work needed to continue development of the script to suit the data that was being used in the project.
import csv

from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDFS, RDF, SKOS
from rdflib.serializer import Serializer

schema = Namespace("https://schema.org/")
dcterms = Namespace("http://purl.org/dc/terms/")
itmi_ext = Namespace("https://itma.ie/litmus/instruments-ext#")
itmi_core = Namespace("https://itma.ie/litmus/instruments-core#")
itmi_tunes = Namespace("https://itma.ie/litmus/tune-types#")

# make a new graph
g = Graph()

with open('items.csv', encoding="utf8") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # what is the main URI we are going to use?
        # this is just using an example from irishtune and the id number
        # tune_uri = URIRef("http://www.irishtune.info/EXAMPLE/" + row['id'])
        # TODO map to real urls
        # or we can use the LCCN URIs you made?
        tune_uri = URIRef(row['URI3'])

        # maybe connect it back to irishtune, needs the IDs here
        # this is just faking it
        itunes = URIRef("http://www.irishtune.info/EXAMPLE/" + row['id'])
        # make it a close, exact...? match
        g.add((tune_uri, SKOS.exactMatch, itunes))

        # Media Note
        if row['Media Note'] != '':
            g.add((tune_uri, schema.position, Literal(row['Media Note'])))
        if row['OriginalFormat'] != '':
            g.add((tune_uri, dcterms.medium, URIRef(row['OriginalFormat'])))
        if row['On Website'] != '':
            g.add((tune_uri, schema.url, URIRef(row['On Website'])))
        if row['Collection Ref'] != '':
            g.add((tune_uri, schema.identifier, Literal(row['Collection Ref'])))

        # todo remove space in collection " Collection "
        if row[' Collection '] != '':
            g.add((tune_uri, schema.holdingArchive, URIRef(row[' Collection '])))
        if row['Outside Link'] != '' and "http" in row['Outside Link']:
            g.add((tune_uri, schema.url, URIRef(row['Outside Link'])))

        # make a label for it
        if row['Item'] != '':
            g.add((tune_uri, RDFS.label, Literal(row['Item'])))

        # TODO needs URIS
        # Make a fake one for now
        if row['Performer'] != '':
            performer = URIRef("http://www.EXAMPLE.com/people/" + row['Performer'].replace(' ', '_').replace('"', '').replace(' ', ''))
            g.add((tune_uri, dcterms.contributor, performer))

        # TODO make sure the value in this field matches up to the itmi_tunes vocab
        if row['Type of item'] != '':
            g.add((tune_uri, RDF.type, URIRef("https://itma.ie/litmus/tune-types#" + row['Type of item'].replace('"', '').replace(' ', ''))))
        if row[' Position '] != '':
            g.add((tune_uri, schema.clipNumber, Literal(row[' Position '])))

        # TODO needs to be URIs
        if row['Location'] != '':
            g.add((tune_uri, dcterms.spatial, Literal(row['Location'])))

        # todo... MORE fields

# write it out as JSON-LD; for an N-Triples file such as triples.nt,
# use format='nt' and a .nt destination instead
# (the indent value is missing from the published script, so the default is used)
g.serialize(destination='triples.json', format='json-ld')
In late June, I developed this process.py script and when ready to try it out, I visualised the results using an open source SPARQL visualizer from developer Mads Holten (https://github.com/MadsHolten/sparql-visualizer). I used the resultant triples.nt file to see the results. To make it easier to view I created a subset of the “items” dataset and generated a handful of triples that can be viewed here:
I modified it as another way to view one of the most frequently performed tunes from the collections:
The ability to collaborate with Matt in this way allowed me to control how the triples were generated, to add what I needed and to re-work the Python script as the work progressed. Understanding this process was crucial to being able to customise the Linked Data graph, to visualise the results, and to begin to prepare the results for connecting with data on the Web.
Reaching out across the Web
The next step in connecting with data from the Web was to explore web resources that already had IDs for audio recordings listed in the Items dataset. I contacted Matt:
I removed the “clip” label from process.py momentarily and pointed the label marker for “item” towards the “Tune Cleanup” column. I am thinking next about the URIs for items as you had commented in the process.py script – that a web scrape for IDs on irishtune.info might be helpful, or if not maybe the site owner has an API service for his site. Very excited to think about where we can go with this now!
By early July, I had progressed to be able to link the project with the website IrishTune.info. This website was extremely useful, as its dataset had been meticulously developed by its author, Alan Ng, and it provided accurate results when queried. I wrote a Python script to scrape IDs from the IrishTune.info website, and worked closely with an intern to reconcile the IDs that were found with the tune titles that were matched in the Items dataset.
Another resource that proved helpful was the LITMUS (Linked Irish Traditional Music) project at the Irish Traditional Music Archive (ITMA) in Dublin, Ireland. Archivist / researcher Lynnsey Weissenberger at the ITMA had developed an ontology that described Irish traditional music in some detail, which could also be tied in with our developments at the Library of Congress. By linking descriptions from the Items dataset with the LITMUS ontology, I could make the project interoperable with projects that might use this data in future. More on the LITMUS project can be found here: https://www.itma.ie/litmus/info .
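Reconciling scraped IDs with tune titles largely comes down to matching titles that are spelled or ordered differently. The following is a sketch of one possible approach using only the Python standard library; it is not the script used in the project. The normalise and reconcile helpers, the 0.85 cutoff, and the titles and IDs are all invented for illustration.

```python
import difflib
import re


def normalise(title):
    """Lower-case a title, drop a leading or trailing 'the', strip punctuation."""
    t = title.lower().strip()
    t = re.sub(r",\s*the$", "", t)   # "Example Reel, The" -> "example reel"
    t = re.sub(r"^the\s+", "", t)    # "The Example Reel"  -> "example reel"
    t = re.sub(r"[^\w ]", "", t)     # keeps accented letters such as á
    return " ".join(t.split())


def reconcile(item_titles, reference_ids, cutoff=0.85):
    """Map each title to (reference id, reference title), or None if unmatched."""
    lookup = {normalise(t): (ref_id, t) for t, ref_id in reference_ids.items()}
    matches = {}
    for title in item_titles:
        close = difflib.get_close_matches(
            normalise(title), list(lookup), n=1, cutoff=cutoff
        )
        matches[title] = lookup[close[0]] if close else None
    return matches


# Invented reference titles and IDs, standing in for scraped data.
reference_ids = {"The Example Reel": "0001", "Sample Jig": "0002"}
matches = reconcile(
    ["Example Reel, The", "sample jigg", "Unknown Air"], reference_ids
)
for title, match in matches.items():
    print(title, "->", match)
```

Titles that come back as None, or that match only just above the cutoff, are the ones worth checking by hand, since many traditional tunes share similar names.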
Possibilities and process
Ultimately, by collaborating with a range of experts, I was able to more deeply engage with the process of creating Linked Data, to more fully understand research and development that was being carried out in a number of institutions located in different continents, whilst applying the latest techniques to data within the Library of Congress. By liaising with the ITMA, other types of descriptions were added to the triples that were developed. However, there are other possibilities worthy of further exploration, such as ways of associating stories that were announced in recordings with musical pieces, most prevalent in collections such as the Philadelphia Céilí Group.
As a result of wider collaboration, the potential for this project went way beyond the original remit. The names of individual performers needed to be split from each other in group performances, GeoNames became easily identifiable, composer names were also available as URIs. The possibilities became endless.
From the beginning, it was obvious that developing an experimental approach to a highly technical digital project at the Library included a steep learning curve, especially so when experts reside in different areas of the Library. In addition to that, research and development in the area of Linked Data with Irish traditional music is in its nascent form.
By reaching out to experts from a wide range of disciplines and expertise, it was possible to harness knowledge and widen the scope of the project in ways that moved the research beyond the limitations of working independently. Careful and consistent communication widened the possibilities for insight and understandings to emerge that would remain unattainable when experimenting within a narrower research agenda.
 *URIs, or Uniform Resource Identifiers, are web addresses (just like www.mywebsite.com/audiofile) that allow audio material to be formally identified and located on the World Wide Web.