Cool Tools Session

You can now view and download the slide deck from our “Cool Tools” lightning session on November 17, 2015. Many thanks to Brianna Marshall for planning the session and compiling the slides, and to all of our presenters: Alper Sarikaya, Mattie Burkert, Bronwen Masemann, Leah Misemer, David Harrisville, Christina Koch, and Brianna Marshall.

Download the PDF here or view the slides on Speaker Deck.

Digital Archives and Preservation II

Digital Archives and Preservation II: Institutional and National Preservation Efforts

11/18 Meeting Notes

This meeting was led by Peter Gorman, who is Assistant Director for Digital Library and Preservation Strategy for the General Library System and the Libraries’ Digital Preservation Officer. He studied Linguistics and Scandinavian Studies as an undergraduate (U. Michigan) and has an MS in Library and Information Science from Drexel University. He came to Wisconsin from the University of Pennsylvania, and set up the UW Libraries’ first Web server in 1994. Since then, he has been responsible for digital library architecture and implementation and, more recently, for digital preservation planning.

Our discussion focused on preservation, and Peter opened with a consideration of how libraries have been dealing with these issues for both analog and digital holdings. He talked about some similarities between analog and digital preservation, including how some analog practices carry over into digital practices, such as reformatting.

While some things are similar, Peter also discussed how digital preservation is different. He highlighted that digital preservation is a field being invented as it happens and that this creates uncertainty about whether the methods being invented now will be able to preserve something for 100 years. He believes that this has resulted in the overdesigning of systems because the risks aren’t understood, and he discussed the various risks that are taken into consideration for preservation, such as natural disaster, format obsolescence, security risks, and institutional risks, such as workflow loss.

We then turned to the question of what UW is preserving. Peter believes that in the past, it was easier to make decisions about what was preserved, largely because the first value judgement on scholarship was made by publishers, often commercial publishers. This gave libraries an easy way to think about what should be preserved because they only needed to save what publishers decided to print. With more self-publishing and changes in the scholarly landscape, the question of what should be preserved is much less settled.

Additionally, Peter raised the issue of how journals are being preserved, especially now that journal content isn’t being bought anymore because libraries only lease access to it. How will journals be preserved when libraries don’t and can’t always get the rights to the content? Peter suggests that because of this, we can’t even guarantee that we preserve all of the printed content anymore.

After considering how preservationists are approaching forms with large data sets, such as email correspondence, we talked about how policy will continue to play a role in archiving. Danielle’s experience working in data has led her to encounter strict regulations from the IRB that don’t allow for any information to be stored that could link data sets back to the original user, and this is an interesting way that policy is shaping preservation efforts. Peter agrees that policy will absolutely continue to shape archives, even though it is being shown that it’s almost impossible to really anonymize data, and libraries are still figuring out whose responsibility it is to do anonymization.

We turned back to the question of how other universities play a role in UW’s preservation efforts. Peter sees that a lot of work is being done together, largely driven by a space crunch. UW is a founding member of HathiTrust and a member of the Digital Preservation Network, and these efforts help reduce the number of duplicates that need to be preserved, allowing libraries to focus on what is unique or rare in their holdings. This work is also about analog preservation, with institutions working out cooperative storage, such as trying to get one print copy of everything on JSTOR shared between many universities. UW is setting up its own institutional repository, but Peter believes that it may end up joining a larger one in the future.

We discussed what is being saved in UW’s repository and what the routes for getting into it are. Peter says the content is varied, including scanned images of texts, images, audio, theses, offprints, and more. However, they tend not to take on more interactive works, such as Flash animations, because the more interactive, the harder it is to keep the item alive. Because of this, they tend to focus on saving the raw assets, and a consultation with the library is the beginning for figuring out the best means for preserving access to research in the repository.

This discussion created some concern about the types of things not being saved, such as new media scholarship. Peter elaborated on the difficulties of saving new media scholarship, especially the challenges of preserving something that doesn’t have a migration path to a community standard form. Ultimately, there are two overarching strategies for this type of preservation: migration and emulation. Migration moves the content into a successor format, making sure the intellectual content survives. Emulation aims to recreate the functionality of the original even though the original hardware or software isn’t preserved. Peter suggested that this gets harder every year as more forms grow obsolete. Emulation has been done, such as for the Commodore 64 and Atari systems, but libraries don’t have the resources for this work and are instead focused on migration. Bert also noted that the Library of Congress isn’t doing emulation, either. Although they’re documenting what it would take, they aren’t taking the steps to actually do it. However, he notes that plenty of others are, such as the Internet Archive, and that David Rosenthal has recently argued that emulation isn’t something we’re going to get away from.

Molly pointed out that this problem is terrifying for those working in digital humanities because our own history might be evaporating. Early interactive digital work is in a potentially dire state of preservation, especially since many things are being lost only a few years after being created. Peter suggests that this is why thinking about format is so important for researchers. We have to go with the best bet to have the greatest chance of survival. Researchers should be thinking about the community standards for saving their work, and this is something libraries can help with. Peter also emphasized the importance of developing community standards, such as what has been done with TEI. This is a good example of a community working together to standardize, and digital humanities needs a discipline-wide recognition that these standards should be built with long-term sustainability in mind.

We then discussed how these changes in modes of scholarship are affecting disciplines. Peter expressed his concern that as old models of publishing begin to break down and fewer monographs are published, some departments are deciding to produce fewer scholars. He is concerned that some are so strongly tied to a particular model of scholarly communication that they may be willing to let the future of a field deteriorate as well. Although there will always be a need for long-form scholarship, does it have to be tied to a print form? What do we do with this trend? How do we preserve the larger enterprise of scholarship?

Mark suggested that e-books may be a solution for this and also expressed concern that the old model of publishing has always outsourced the evaluation of faculty to publishers. One solution, then, is to trust our faculty. Another is to accept that the print scholarly monograph may not be there in a decade. We returned to the question of what to do without the print monograph when some people say they won’t hire a candidate without one. We also considered how university economics are changing, and that changes in print publications are a symptom of this. Not only do we need a new economically viable model for publishing, but we also need one for the university itself. This led back to a discussion of how to afford preservation efforts. Although digital publishing can be affordable, digital preservation is not cheap, partially because it is so uncertain.

We ended by discussing how UW is dividing its resources between analog and digital preservation efforts. Peter said that the analog budget has been going down over time. There has been less focus on book preservation and more on the library’s unique holdings, largely because of the inter-institutional efforts discussed earlier. However, we don’t have this same sort of collaborative registry for digital objects, partially because citation isn’t as sophisticated, so it can be challenging to compare resources.

We also began a collaborative Google Doc for collecting resources on preservation: https://docs.google.com/document/d/1Vvcn8hiGpYouwF_rigZC7SAC5DzWmUBHRQYFD-SJCkQ/edit?usp=sharing

In the next session on December 2, we will be joined by Brianna Marshall, who will discuss best practices for our individual research efforts.

Digital Archives & Preservation: Theoretical Perspectives

The following is from Mattie Burkert, who wrote up her notes from our session on Digital Archives & Preservation: Theoretical Perspectives. We’ll follow up with two more sessions on archives & preservation:

11/18    Digital Archives and Preservation II: Institutional and National Preservation Efforts, led by Peter Gorman, Head of UW’s Digital Collections

12/4     Digital Archives and Preservation III: Data Management Best Practices, led by Brianna Marshall, Digital Curation Coordinator for UW-Madison Libraries

NOTES FROM 11/4 MEETING

We began by trying to disambiguate terms from the readings and figure out which ones were most useful, which ones meant different things in different disciplines, etc.: archive (classical, digitized, born-digital, dynarchives / archives in motion); formal vs forensic materiality; storage vs memory; digital object; medium vs format; media archaeology; digital forensics; curation; preservation; sustainability.

We talked about how, broadly speaking, librarians may have more of a focus on access while archivists may have more of a focus on survival, but went on to challenge that binary (without the possibility of access, does an object really survive?).

We talked about some of the specific digital objects some members of the group are working with (webcomics, Twitter data, film, radio) and the challenges those present for access from the research side. Bert from the Library of Congress and Amy from the Digital Public Library of America offered their perspective from the archival side, in particular how the nature of archiving is changing as new media are being collected, and how it’s becoming a more fluid, responsive, participatory activity. For example, if a user asks Twitter to take down a Tweet from a year ago, the LoC Twitter archive also has to be updated to reflect that.

This discussion of archiving social data led us to a discussion about the shifting power dynamics and the economic structures of social data archiving. We agreed that archives have always been shaped by political and economic forces, but that perhaps the archiving of social media data accelerates or exaggerates that process, or makes it more visible. This then led to a broader discussion of how digitized and born-digital archives are similar to and different from more traditional archives, and to what extent existing structures and practices are portable vs. to what extent the fundamental nature of archives is changing.

We discussed some of the ethical issues of archiving new kinds of data, like disk images of writers’ computers, or social media content. These raise some privacy concerns. Also, the definitions of how things enter the public record are based on an older model of political discourse: if you sent an angry email to a politician, it was archived, but today, when it enters a digital archive, it may eventually be picked up by search engines and attached to your name in a much more public way. Caitlin, a researcher from African history, took the opposite position: she advocated for the highest possible level of access to as much info as possible, pointing out that much of what is invisible in the archives she studies is invisible because it has been erased.

We talked about how there are private groups accessing public records and making them much easier to find than was the case in the past, which raised all sorts of questions about the changing role of digital archives in relation to other groups, corporations, and forces working to preserve and make accessible various materials. The possibility was raised that maybe an entity like the LoC shouldn’t be introducing redundancy by creating its own Twitter archive, but rather, working with Twitter on a plan to act as repository if and when the company disappears. Jesse pointed out that there is a kind of loss of meaning or beauty when you archive something like Flickr — it creates a temporary set of relations among images, and when you preserve those images and/or their relations, you actually lose something about the experience of ephemerality that is part of the point.

This got us into a discussion of scale and sustainability. Some of the archivists in the room pointed out the impossibility of saving everything, and Molly pointed out that sustainability from the data perspective can be very unsustainable, environmentally. We talked about the architecture and logistics of data centers, their energy costs, the expense of redundancy and backup, etc. We also talked about how our behaviors are changing and we’re saving more just because we can — so even as capacity for storage is growing, so is our need for it.

A related thread during this part of the conversation was about format vs medium; we discussed how the experience of an object changes with the format (e.g. watching a movie on 35mm film vs digital video file), and asked whether it’s realistic for archives to preserve not only multiple formats, but also the technology to access those formats. This would be challenging, if not impossible, from a resources perspective. Laima, an independent artist/researcher, pointed to the relationship between archival desire and nostalgia and asked to what extent the function of saving, e.g., old video game consoles, phonographs, and the like is best left to the public/collectors. Someone recommended Lucas Hilderbrand’s Inherent Vice as a good source for thinking about the relationship between meaning and format.

At the end, we previewed the next two meetings and asked the group to identify open questions or loose threads they would like to see picked up:

QUESTIONS FROM 11/4 MEETING FOR FUTURE DISCUSSION

  1. how do UW digital collections deal with questions of authenticity?
  2. how are digital cataloging and finding aids helping to enable connections between analog collections?
  3. what are some specific digital archiving practices you use?
  4. for people creating small digital collections (not part of larger archives), do you have advice about how much emphasis to put on getting rights and permissions for materials? do you have advice about collecting metadata for those kinds of collections? what about the politics of sharing potentially sensitive materials (e.g. Nigerian popular fiction that is not widely available)?
  5. how is the university thinking about research data vs archive data? are there any overlaps?
  6. when objects exist in multiple formats, what digital formats is UW prioritizing for preservation?