Resources for web archiving.
Web Archiving Organizations and Institutions
- International Internet Preservation Consortium (IIPC)
  group of libraries, archives, museums, and cultural heritage institutions working together to enable the collection and preservation of Internet content and to foster the development and use of common tools, techniques, and standards for the creation of international archives
- Internet Archive
  digital library of Internet sites and other cultural artifacts in digital form
- National Digital Information Infrastructure & Preservation Program (NDIIPP)
  Library of Congress program to develop a national strategy to collect, preserve, and make available significant digital content, especially information that is created in digital form only
Web Archiving Software, Tools, and File Formats
- CDX File Format
  specifications for the compound index (CDX) file format
- Heritrix
  the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler
  - Heritrix Scope
  - Heritrix Settings
- SURT
  definition and examples for the Sort-friendly URI Reordering Transform (SURT) used in web crawling applications
- WARC File Format Information and Documentation, ISO 28500:2009
  specifications for the Web ARChive (WARC) file format
- Awesome List of web archiving resources (IIPC-maintained)
- Browsertrix Cloud
- Browsertrix Crawler
- pywb
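
As a rough illustration of the SURT entry above, the following Python sketch shows the basic idea of the transform: the host labels are reversed and comma-separated so that URIs from the same domain sort next to each other. The function name `to_surt` is hypothetical, and the sketch is deliberately simplified (it ignores ports, user info, and the canonicalization rules that crawlers such as Heritrix apply), so treat it as an approximation rather than the reference behavior.

```python
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Simplified SURT sketch (hypothetical helper, not a library API).

    Reverses the host labels and wraps them in parentheses; the scheme,
    path, and query are kept as-is. Ports and user info are ignored.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # "www.archive.org" -> "org,archive,www," (note the trailing comma)
    reversed_host = ",".join(reversed(host.split("."))) + ","
    surt = f"{parts.scheme}://({reversed_host}){parts.path}"
    if parts.query:
        surt += "?" + parts.query
    return surt


if __name__ == "__main__":
    # Sorting by SURT form groups URIs from the same domain together.
    urls = [
        "http://www.archive.org/index.html",
        "http://example.com/",
        "http://web.archive.org/about",
    ]
    for surt in sorted(to_surt(u) for u in urls):
        print(surt)
```

Because the most significant part of the host comes first, a plain lexical sort of these strings places `www.archive.org` and `web.archive.org` side by side, which is what makes the form convenient for crawl scoping and indexing.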