Research

The Digital Libraries’ recent research related to web archiving.

Leveraging Existing Bibliographic Metadata to Improve Automatic Document Identification in Web Archives, 2022

The UNT Libraries, in partnership with the University of Illinois Chicago’s Computer Science Department, received IMLS support for an applied research grant with the long-term objective of improving access to digital resources housed in web archives. This applied research project will build on findings from a previously funded IMLS research grant (LG-71-17-0202-17) that was a first effort in training machine learning models to help identify high-value documents and publications within web archives. This project seeks to incorporate existing bibliographic metadata related to state government document collections to better train machine learning models and reduce human effort, as the process is still time-consuming and requires highly trained content curators.

Programmatic Extraction of ‘Documents’ from Web Archives, 2017

The UNT Libraries and the University of Illinois Chicago’s Department of Computer Science received IMLS support under the National Digital Platform category for a two-year research project to evaluate the use of machine learning algorithms to identify and extract publications contained in existing Web archives. Identifying these documents will empower libraries, archives, and museums to meet their curatorial missions.

Current Quality Assurance Practices in Web Archiving, 2014

UNT’s team surveyed people and institutions involved in web archiving to understand the current climate and future needs for quality assurance. For more information, see our paper and presentation.

Classification of the End-of-Term Archive: Extending Collection Development to Web Archives (eotcd), 2010-2012

In this project, funded by the Institute of Museum and Library Services (IMLS), UNT partnered with the Internet Archive to investigate innovative solutions allowing libraries to better characterize, identify, and select archived Web materials in accordance with their collection development policies. The project used the SuDocs system to classify the materials in the 2008–2009 End-of-Term (EOT) Archive, collected by UNT and its partners, which represents the entirety of the federal government’s public Web presence immediately before and after the 2009 change in presidential administrations. The project also identified metrics to translate measurable units for selected materials in Web archives to units more familiar to libraries and more recognizable by university administrators. For more information, see the eotcd final report and archived web site.

The Web-at-Risk, 2004-2007

Funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP) at the Library of Congress, this project was a collaborative effort of the California Digital Library, UNT, and New York University to develop tools that enable curators to build collections of web-published materials. The project produced the Web Archiving Service as well as significant research in needs assessment. Many of the reports produced by the project are available in the UNT Digital Library.
