A Brave New Digital World: Archiving Digital Files, Part 2

This is the second blog post in a three-part series about processing digital files. See the following link for Part 1 and Part 3.

The Duke University Medical Center Archives (DUMCA) recently began processing the files in our digital files backlog, which goes back to 2009. While the backlog was created in 2009, the files date from the mid-1990s to the present. These files are a mixture of born-digital (records created in a digital format) and digitized (records originally created on paper and converted into a digital format). The DUMCA is primarily working on ingesting born-digital records. 

In order for digital files to be preserved and made accessible, the DUMCA ingests these files into AXAEM’s Electronic Records Processing module*, our content management system (CMS). For more information on what a CMS is, see Part 1 in this series about processing digital files. Once the digital files are ingested and described, researchers can view these materials in the DUMCA’s reading room.

As the DUMCA is currently the only institution using the Electronic Records Processing module on AXAEM (it is very new), I had a large task ahead of me when it came to developing a workflow for appraising, ingesting, and describing the digital files in our repository’s backlog. I started this endeavor early in 2018 and quickly set about creating a workflow to ingest files from the digital files backlog. Traditionally, when materials are transferred into the Archives’ custody, they are given an accession number, which connects the materials (and all information gathered at the time of accession) to a collection. The accession number allows archives staff to track when the documents were transferred to the Archives, who transferred them, and where the documents originated. This helps the Archives maintain control over the materials by providing documentation about where they were prior to being transferred to the Archives.

Unlike analog collections, which arrive at the Archives in boxes that can be placed onto a shelf in our climate-controlled stacks, digital files are transferred on different types of carriers. Types of carriers include floppy disks, CDs, USB drives, hard drives, DVDs, memory cards, and zip disks. Upon arrival, that digital content must be transferred onto a server where the files are continually maintained and monitored, as electronic resources are much more unstable than their analog counterparts.

Prior to ingesting digital files into AXAEM, I add technical and descriptive metadata to the files using the application Bagger. Bagger is an open-source program created by the Library of Congress that groups or “bags” files together with text files of descriptive metadata. Packaging data together like this means archivists can transfer the digital files to the user more easily. Examples of the types of information included in descriptive metadata are as follows: who transferred the files, who created the files, accession number, receiving institution, description of the files, type of carrier and number of files on that carrier, and appraisal criteria. By adding technical and descriptive metadata, I am able to add a layer of contextual information that helps archivists and researchers understand the materials (descriptive metadata) and that documents technical specifications such as file format, size, and date of creation (technical metadata).

Additionally, Bagger creates checksums to make sure that the bag’s contents (i.e., files) have not been changed or corrupted. Using Bagger allows us to group the files together and put them in a preservation format where they can wait to be ingested into AXAEM.
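For readers curious about what a bag looks like on disk, here is a minimal Python sketch of a bag laid out according to the BagIt packaging format, which Bagger implements. This is not Bagger's own code, and it is not part of the DUMCA workflow. The function names `sha256_of` and `make_bag` are hypothetical, the choice of SHA-256 is an assumption (the checksum algorithm is not specified above), and any metadata values passed in would be placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, payload: dict[str, bytes], info: dict[str, str]) -> None:
    """Write a minimal BagIt-style bag: payload files, tag files, and a manifest.

    Hypothetical helper for illustration only; SHA-256 is an assumed algorithm.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Payload files (the records themselves) live under data/.
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)

    # bagit.txt declares the BagIt version and the tag file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # bag-info.txt holds descriptive metadata as "Label: value" lines.
    info_lines = [f"{label}: {value}" for label, value in info.items()]
    (bag_dir / "bag-info.txt").write_text(
        "\n".join(info_lines) + "\n", encoding="utf-8"
    )

    # The manifest records one checksum per payload file.
    manifest_lines = [
        f"{sha256_of(path)}  data/{path.name}"
        for path in sorted(data_dir.iterdir())
    ]
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
```

Calling `make_bag(Path("example_bag"), {"letter.txt": b"..."}, {"Source-Organization": "Example Department"})` would produce a folder containing `data/letter.txt`, `bagit.txt`, `bag-info.txt`, and `manifest-sha256.txt`. The manifest is what later makes it possible to confirm that the payload has not changed.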

Archiving digital content provides unique challenges to archivists. Digital files are susceptible to issues like bit rot, which occurs when bits flip or become unreadable. Bits are the basic units of any digital memory and are expressed as either a 1 or a 0. To make sure digital content is authentic and unchanged, archivists use checksums. Checksums, also referred to as hash values or hashes, are alphanumeric values generated by both Bagger and AXAEM. These values are generated when the bags are created and are then checked or validated at a later date to make sure that the files remain authentic. The checksums generated during validation should match the originals. If a bit has flipped or the files have been corrupted somehow, then the checksums will no longer match. If this happens, the files may no longer open or display correctly.
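The way a checksum catches a single flipped bit can be shown with a short Python sketch. It uses SHA-256 from the standard library, which is an assumption, since the algorithm that Bagger and AXAEM are configured to use is not stated here. The function names and the sample file contents are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating bit rot."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_unchanged(data: bytes, stored_checksum: str) -> bool:
    """Compare a freshly computed checksum against the one recorded earlier."""
    return checksum(data) == stored_checksum


original = b"Oral history transcript, draft 1"  # hypothetical file contents
stored = checksum(original)  # recorded when the bag is created

print(is_unchanged(original, stored))               # True
print(is_unchanged(flip_bit(original, 0), stored))  # False
```

Even though only one bit out of several hundred differs, the second checksum is completely different from the stored one, which is why a mismatch is treated as a reliable signal that something in the file has changed.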

Processing digital files is still a relatively new endeavor for the archival field, and institutions are working out the best practices and workflows for their organizations. Creating a workflow for ingesting digital files has taught me about the fragility of these records and the need for archives to create flexible digital files processing workflows. Processing records from the backlog has also uncovered materials and stories that the DUMCA did not know existed within our collections. See Part 3 of this series to read about how I uncovered photographs and interviewed a Duke alumnus about his experience during the Vietnam War.

* The terms digital files and electronic records are often used interchangeably. The Duke University Medical Center Archives uses the term digital files.

This blog post was contributed by Archives Intern Kahlee Leingang.