An Interview with Harriett Green, English and Digital Humanities Librarian and Assistant Professor of Library Administration at the University of Illinois at Urbana-Champaign, and Matt Conner, Librarian at the University of California, Davis, and author of a newly released book, The New Library: Four Case Studies (ALA)
Editor’s Note: This interview is the first in a three-part series of interviews with Harriett Green, conducted by Matt Conner, about digital humanities.
Part 1: History and Culture of Digital Humanities
Matt: So, I’m very interested in the digital humanities, and the context I’m operating from is an impression that the sciences are much more active in new library developments and communications technology because apparently they initiated some of this after WWII. This is from my reading of library history where there was so much technology with the war that the sciences caught on. There were computers and automation. It was natural for the sciences to get involved, and today they are at the forefront with ebooks, for example. And physicists use their tool arXiv to publish directly online. My impression is that the humanities are really lagging and they’re very traditional.
Harriett: It’s a mix of factors. There are a number of people doing work in digital humanities that’s right on par with what people are doing in e-science and e-social science in terms of text mining, data mining, and mining their archive.
There’s a disciplinary orientation: With the sciences and the social sciences, the way they use their data, the way they process their data, the way they share their data and share research lends itself much more easily to automation, to publishing things online, to pushing out huge data sets. Whereas the way the humanities work, from literature to the performing arts, is that you have texts, you have performance, things that aren’t so reducible to data. So, the way that the humanities do research, and the way they delve into the text or the art or the image, requires more visceral elements that you can’t reduce down to data.
And so with digital humanities, the challenge has been putting the materials that they use for research in a digital archive that is usable. You can scan texts, but how high is the quality of the scan? With images: can you get it in a high enough resolution to see all the aspects to be able to interpret the image? So, I think part of the challenge is getting the research archive into a form that they can use online and then producing tools to analyze the archive.
Again, I think for science and social science with the way they interpret, analyze and use their research, the tools that are out there lend themselves much more to that kind of work. The humanists, on the other hand, are trying to do close readings of the text, trying to find trends, or extract some new way that the author is looking at the text or the history.
What Is Digital Humanities?
Matt: So, when I was finishing up a degree in English in the late 90s, the big contribution of the new technology was to mount manuscripts so that medievalists and Early Modernists would no longer have to travel to examine them in person. They could see them online. They could get all the marginalia and the illustrations, as well as the actual text. That was a connection that was really clear to me. So are you in the business of scanning things to get them into databases now?
Harriett: Not as much: Google and commercial publishers are doing increasingly more digitization, and libraries are still digitizing their collections–we have our Digital Content Creation unit at Illinois that digitizes a lot of our holdings. But if you actually look at grants from the NEH or even IMLS, they won’t fund solely digitization. Perhaps when you were in school that was a research endeavor, whereas now it’s more like processing in the sense of cataloging. I work with faculty after all that work is done, who are doing actual research. So getting the OCR, the optical character recognition, scanning the text behind the text, and then analyzing it through text-mining or doing network analysis of the different people who are mentioned in the text or something like that.
The Products of Digital Humanities
Harriett: This has been going on for decades. In the early twentieth century you can find scholars who were doing word counting and word frequency by hand, then there was the application of tools starting in the 60s and 70s, and just in the last 20 years it’s really ramped up.
Matt: So, these tools need some online text to be applied. And you find these texts through Google?
Harriett: Yes, or you can use a plain text file from your own library archive. There are also free text archives like Documenting the American South out of the University of North Carolina Libraries. So, there are some open text archives as well. That’s part of the digital humanities too: Not only taking collections and texts and digitizing them but also putting them into a form that people can actually use. With a text, you can highlight passages, then copy and paste them into Voyant and do all sorts of analysis.
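Editor’s Note: As a rough illustration of the word counting described above, here is a minimal Python sketch of a word-frequency count over a plain text passage. It is not how Voyant itself works; the sample sentence, the small stopword list, and the function name word_frequencies are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample passage; in practice this would be plain text
# copied from a digitized archive.
SAMPLE_TEXT = (
    "The war came to the city, and the city changed. "
    "The hospitals of the city filled."
)

# A deliberately tiny, illustrative stopword list (an assumption, not a standard).
STOPWORDS = {"the", "to", "and", "of"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


freqs = word_frequencies(SAMPLE_TEXT)
# List the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The same count that early scholars produced by hand falls out of a few lines here; the research question lies in deciding which words matter and what their frequency means.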
Matt: So, you’ve got word frequency. You’ve got various kinds of visual outputs. Are there other kinds of analysis? Or are those pretty much the methods that people use?
Harriett: That’s one method. The Center for History and New Media [http://chnm.gmu.edu] has a number of tools as well: Omeka is a visual archive that allows you to build exhibitions. And TEI, the Text Encoding Initiative? With metadata, you mark up information about an object, but with TEI you actually mark up the words and say this is a noun, this is a name, this is a place, that kind of thing. Then people can data-mine texts and say, okay, what are all the places in this text, and they’ll pull them out based on what’s marked up. Abbot is a tool out of Nebraska that automates the markup of text into a standardized TEI schema.
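Editor’s Note: To make the idea of pulling places out of marked-up text concrete, here is a minimal Python sketch. It assumes a simplified TEI-style fragment (well-formed XML in the TEI namespace, but without the header a complete TEI document would carry) that uses the placeName and persName elements; the sample sentence and the helper name extract are invented for illustration and are not taken from any project mentioned in the interview.

```python
import xml.etree.ElementTree as ET

TEI_NAMESPACE = "http://www.tei-c.org/ns/1.0"

# Hypothetical, simplified fragment: an encoder has tagged one person
# and two places while reading the sentence.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Walt Whitman</persName> visited hospitals in
       <placeName>Washington</placeName> after leaving
       <placeName>Brooklyn</placeName>.</p>
  </body></text>
</TEI>"""


def extract(xml_text, tag):
    """Return the text of every element with the given TEI tag, in document order."""
    root = ET.fromstring(xml_text)
    qualified_tag = "{%s}%s" % (TEI_NAMESPACE, tag)
    # Join nested text and collapse whitespace left over from line breaks.
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(qualified_tag)]


places = extract(SAMPLE, "placeName")
people = extract(SAMPLE, "persName")
print("Places:", places)
print("People:", people)
```

The query only finds what an encoder chose to tag, which is why the markup decisions themselves carry so much of the scholarly work.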
Matt: So, people can decide what they want to mark up based on their project.
Harriett: And you can mark up manuscripts. The Text Encoding Initiative itself is a big part of the digital humanities, especially for literature and the marking up of texts. For example, Civil War Washington has a database where you can find people who are in documents from the Civil War era in Washington, D.C., and you can look at maps. Another big part of digital humanities is maps: GIS, geo-location, layering maps with different information, which they’ve done as well.
Matt: This text mark-up feature sounds metadata-like, but it’s not. It’s built into the tool.
Harriett: With metadata you just apply it, whereas textual mark-up is actually much more of a research-intensive process because you’re reading the text and deciding whether this is a quote or this is a person that I want to make sure comes out in the text. So, it’s almost a close reading of a sort when you do text encoding. And there are articles out there on this by leading scholars like Julia Flanders and Syd Bauman.