A big thank you to everyone who attended the HathiTrust event, both in person and virtually! If you were unable to attend, we have a video of most of the presentation that will be posted shortly. Our turnout was excellent, with well over 20 attendees between those onsite and those connected online. John Wilkin and Sebastien Korner were informative speakers, shedding light on a multitude of topics, ranging from the origins of HathiTrust and the many facets of digital preservation to the skills employers are seeking in recent graduates.
According to John, HathiTrust began about 20 years ago, shortly after JSTOR spun off. At the time, microfilm reformatting was the standard preservation method. However, digital was entering the scene and those involved in HathiTrust’s formation recognized its suitability for preservation, particularly in terms of enhancing accessibility. John acknowledged the initial reluctance to use digital as a preservation strategy because it was perceived as transitory, as well as issues surrounding reproduction and sustainability. However, he asserted that if properly approached from “a three-legged stool of digital preservation,” digital preservation is viable.
John defined the three legs of the digital preservation stool as:
- Fidelity in capture: Items must be faithfully digitally captured.
- Openness of formats: Items should not be locked into a proprietary format, and formats should themselves be open with plentiful usage tools.
- Viability: Items should be stored on equipment with longevity and stability.
If a digital repository’s legs are strong on these three counts, it has the initial requirements for becoming a trusted repository. Beyond that, creating a trustworthy digital repository involves an additional set of requirements, including fixity checks, preservation metadata (e.g., PREMIS, additional administrative metadata), organizational financial viability, and robust storage.
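To make the idea of a fixity check concrete, here is a minimal Python sketch. It is not HathiTrust's implementation; the function names and the choice of SHA-256 are assumptions made for illustration. The idea is simply to record a checksum when an item is ingested and later confirm that the stored file still produces the same checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    # Hypothetical helper: compares case-insensitively, since hex digests
    # are sometimes stored in upper case.
    return sha256_of(path) == recorded_digest.lower()
```

In practice a repository would run a check like this on a schedule and log the outcome as preservation metadata; any mismatch signals that the stored copy has changed and needs repair from another copy.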
Sebastien spoke at length regarding HathiTrust’s storage solution. During the initial search, HathiTrust was looking for a reasonably well-established company committed to digital preservation, with an innovative roadmap and a storage product that could scale to meet the repository’s increasing needs. HathiTrust found these characteristics in Isilon, which produces HathiTrust’s clustered NAS (network-attached storage) solution. The NAS is referred to as clustered because it is composed of nodes, each of which has both a CPU and storage, so adding nodes grows processing power (CPU) and storage capacity at the same time.
Another interesting feature of HathiTrust’s repository is that it comprises two identical instances, one in Ann Arbor and the other in Indiana University’s data center (Indiana University is one of HathiTrust’s partners). Sebastien noted that the main concern when running two instances is ensuring replication and consistency between the two. With all content active on both instances, users are served data from both. The only difference between the two is that content is ingested only in the Ann Arbor instance and is then replicated to the Indiana University instance. Having two identical instances of the HathiTrust repository is beneficial in numerous regards, particularly as a physical safety measure given the distance between the two institutions running the instances.
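As a rough illustration of what checking consistency between two instances can involve, the Python sketch below compares two manifests that map item identifiers to checksums. The manifest structure and the function name are hypothetical and are not drawn from the presentation; this is only one simple way such a comparison could be expressed.

```python
from typing import Dict, List


def compare_manifests(
    primary: Dict[str, str], replica: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report how a replica's manifest differs from the primary's.

    Each manifest maps an item identifier to its checksum (assumed format).
    """
    # Items ingested at the primary site that have not reached the replica.
    missing = sorted(set(primary) - set(replica))
    # Items present only at the replica, which should not normally happen.
    extra = sorted(set(replica) - set(primary))
    # Items present at both sites whose checksums disagree.
    mismatched = sorted(
        key for key in primary.keys() & replica.keys()
        if primary[key] != replica[key]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

An empty report in all three categories would indicate that the two manifests agree; anything else points to items that need to be re-replicated or investigated.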
After the overview of digital preservation and the origins of HathiTrust, focus shifted to attendee questions, with the bulk relating to digital preservation and student skill sets. One attendee asked if digital is the last word on storage, or if another method will soon usurp its dominance. While both John and Sebastien indicated that presently spinning disk storage (e.g., SATA disks) is still fundamentally necessary for a trusted repository like HathiTrust, other newer formats like solid state storage are being developed and utilized in some repositories. However, as the HathiTrust representatives stated, the main issue with storage is cost.
Storage cost was again brought to the forefront when an attendee asked what the most pressing issues in digital preservation are today. John discussed the concerns surrounding big science data and storage cost. He said that although people often get hung up on formats and methodologies, the real problem is storage cost. Indeed, John posited that it is the inability to afford storage that drives the inability to narrow down a standard storage solution. He stated that one of the reasons HathiTrust is so costly (and so successful) is that it stores data well, utilizing mid- and high-level storage products. As further present-day digital preservation concerns, John and Sebastien cited the amount of data generated by each person and the question of whether that data is worth the money it costs to store.
From the presentation it was clear that working at HathiTrust or any digital preservation repository requires much on-the-job learning. Sebastien said that good methods to follow are utilizing common sense and following the organization’s mission. He stated that the concept of data is similar, regardless of what the data is or its format, and that by continuously evolving and learning new strategies and skills, one can be successful. These sentiments were echoed and expanded upon when an attendee asked about the skill sets HathiTrust looks for when hiring, particularly skill sets that new graduates should develop. Although John and Sebastien stressed that the necessary skills vary from position to position, they did cite several areas of focus:
- Creative problem solving and analytical skills
- Knowledge of digital standards and preservation strategies
- Knowledge of copyright and its application
Our sincerest thanks to John Wilkin and Sebastien Korner for taking the time to speak with us and sharing their HathiTrust and digital preservation knowledge. Also, thank you to Kim Schroeder for setting up this event.