Digital Preservation and Archives II: Institutional and National Preservation Efforts
11/18 Meeting Notes
This meeting was led by Peter Gorman, who is Assistant Director for Digital Library and Preservation Strategy for the General Library System and the Libraries’ Digital Preservation Officer. He studied Linguistics and Scandinavian Studies as an undergraduate (U. Michigan) and has an MS in Library and Information Science from Drexel University. He came to Wisconsin from the University of Pennsylvania, and set up the UW Libraries’ first Web server in 1994. Since then, he has been responsible for digital library architecture and implementation and, more recently, for digital preservation planning.
Our discussion focused on preservation, and Peter opened with a consideration of how libraries have been dealing with these issues for both analog and digital holdings. He talked about some similarities between analog and digital preservation, including how some analog practices carry over into digital practices, such as reformatting.
While some things are similar, Peter also discussed how digital preservation is different. He highlighted that digital preservation is a field being invented as it happens and that this creates uncertainty about whether the methods being invented now will be able to preserve something for 100 years. He believes that this has resulted in the overdesigning of systems because the risks aren’t understood, and he discussed the various risks that are taken into consideration for preservation, such as natural disaster, format obsolescence, security risks, and institutional risks, such as workflow loss.
We then turned to the question of what UW is preserving. Peter believes that in the past, it was easier to make decisions about what was preserved, largely because the first value judgement on scholarship was made by publishers, and often commercial publishers. This gave libraries an easy way to think about what should be preserved because they only needed to save what publishers decided to print. With more self-publishing and changes in the scholarly landscape, the question of what should be preserved is less settled.
Additionally, Peter raised the issue of how journals are being preserved, especially now that journal content isn’t being bought anymore because libraries only lease access to it. How will journals be preserved when libraries don’t and can’t always get the rights to the content? Peter suggests that because of this, we can’t even guarantee that we preserve all of the printed content anymore.
After considering how preservationists are approaching forms with large data sets, such as email correspondence, we talked about how policy will continue to play a role in archiving. Danielle’s experience working in data has led her to encounter strict regulations from the IRB that don’t allow for any information to be stored that could link data sets back to the original user, and this is an interesting way that policy is shaping preservation efforts. Peter agrees that policy will absolutely continue to shape archives, even though it is being shown that it’s almost impossible to really anonymize data, and libraries are still figuring out whose responsibility it is to do anonymization.
We turned back to the question of how other universities play a role in UW’s preservation efforts. Peter sees that a lot of work is being done together, largely driven by a space crunch. UW is a founding member of HathiTrust and a member of the Digital Preservation Network, and these efforts help reduce the number of duplicates that need to be preserved, allowing libraries to focus on what is unique or rare in their holdings. This work is also about analog preservation, with institutions working out cooperative storage, such as trying to get one print copy of everything on JSTOR shared between many universities. UW is setting up its own institutional repository, but Peter believes that it may end up joining a larger one in the future.
We discussed what is being saved in UW’s repository and what the routes for getting into it are. Peter says the content is varied, including scanned images of texts, images, audio, theses, offprints, and more. However, they tend not to take on more interactive works, such as Flash animations, because the more interactive, the harder it is to keep the item alive. Because of this, they tend to focus on saving the raw assets, and a consultation with the library is the beginning for figuring out the best means for preserving access to research in the repository.
This discussion created some concern about the types of things not being saved, such as new media scholarship. Peter elaborated on the difficulties of saving new media scholarship, especially the challenges of preserving something that doesn’t have a migration path to a community standard form. Ultimately, there are two overarching strategies for this type of preservation: migration and emulation. Migration moves the content into a successor format, making sure the intellectual content survives. Emulation aims to recreate the functionality of the original even though the original hardware or software isn’t preserved. Peter suggested that this gets harder every year as more forms grow obsolete. Emulation has been done, such as for the Commodore 64 and Atari systems, but libraries don’t have the resources for this work, and instead, libraries are focused on migration. Bert also noted that the Library of Congress isn’t doing emulation, either. Although they’re documenting what it would take, they aren’t taking the steps to actually do it. However, he notes that plenty of people are, such as the Internet Archive, and that David Rosenthal has recently argued that emulation isn’t something we’re going to get away from.
Molly pointed out that this problem is terrifying for those working in digital humanities because our own history might be evaporating. The early history of interactive digital work is in a potentially dire state of preservation, especially since many things are being lost only a few years after being created. Peter suggests that this is why thinking about format is so important for researchers. We have to go with the best bet to have the greatest chance of survival. Researchers should be thinking about the community standards for saving their work, and this is something libraries can help with. Peter also emphasized the importance of developing community standards, such as what has been done with TEI. This is a good example of a community working together to standardize, and digital humanities needs a discipline-wide recognition that these standards should be built in anticipation of long-term sustainability.
We then discussed how these changes in modes of scholarship are affecting disciplines. Peter expressed his concern that as old models of publishing begin to break down and fewer monographs are published, some departments are deciding to produce fewer scholars. He is concerned that some are so strongly tied to a particular model of scholarly communication that they may be willing to let the future of a field deteriorate as well. Although there will always be a need for long-form scholarship, does it have to be tied to a print form? What do we do with this trend? How do we preserve the larger enterprise of scholarship?
Mark suggested that e-books may be a solution for this and also expressed concern that the old model of publishing has always been an outsourcing of evaluation of faculty to publishers. One solution, then, is to trust our faculty. Another is to accept that the print scholarly monograph may not be there in a decade. We returned to the question of what to do without the print monograph when some people say they won’t hire a candidate without one. We also considered how university economics are changing, and that changes in print publications are a symptom of this. Not only do we need a new economically viable model for publishing, but we also need one for the university itself. This led back to a discussion of how to afford preservation efforts. Although digital publishing can be affordable, digital preservation is not cheap, partially because it is so uncertain.
We ended by discussing how UW is dividing its resources between analog and digital preservation efforts. Peter said that the analog budget has been going down over time. There has been less focus on book preservation and more on the library’s unique holdings, largely because of the inter-institutional efforts discussed earlier. However, we don’t have this same sort of collaborative registry for digital objects, partially because citation isn’t as sophisticated, so it can be challenging to compare resources.
We also began a collaborative Google Doc for collecting resources on preservation: https://docs.google.com/document/d/1Vvcn8hiGpYouwF_rigZC7SAC5DzWmUBHRQYFD-SJCkQ/edit?usp=sharing
In the next session on December 2, we will be joined by Brianna Marshall, who will discuss best practices for our individual research efforts.