Jay Taylor's notes

WARC - Just Solve the File Format Problem

Original source (fileformats.archiveteam.org)
Tags: web-archiving archival fileformats.archiveteam.org
Clipped on: 2017-11-08


File Format
Extension(s): .warc, .warc.gz
MIME Type(s): application/warc, application/warc-fields
PRONOM: fmt/289

WARC is the successor to the ARC (Internet Archive) format. It is standardized as ISO 28500:2009, "Information and documentation -- WARC file format", and was developed under the auspices of the International Internet Preservation Consortium. WARC was designed as an extension to ARC, in part to provide better capabilities for managing Web archives over the long term by allowing more metadata about the circumstances of archiving to be captured.

WARC files are often compressed using gzip, resulting in a .warc.gz extension.
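
Because the same archive may arrive either plain or gzip-compressed, a reader usually has to handle both. The following is a minimal Python sketch, not a full WARC parser: it assumes the file begins with a WARC version line such as `WARC/1.0`, and it detects gzip by the two-byte gzip magic number rather than by trusting the extension. The helper names `open_warc` and `read_version_line` are hypothetical, chosen only for this illustration.

```python
import gzip

# The first two bytes of any gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_warc(path):
    """Open a WARC file for binary reading, decompressing gzip if present.

    Hypothetical helper: sniffs the magic bytes instead of relying on the
    .warc / .warc.gz extension, which may be missing or wrong.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_version_line(path):
    """Return the first line of the file, e.g. 'WARC/1.0'.

    Assumption: the file starts with a WARC record, whose first line is
    the version line terminated by CRLF.
    """
    with open_warc(path) as stream:
        first_line = stream.readline()
    return first_line.rstrip(b"\r\n").decode("ascii", errors="replace")
```

Calling `read_version_line("example.warc.gz")` on a hypothetical file should give the same result as calling it on the uncompressed `example.warc`, since the decompression is transparent to the caller.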

There is also a specification for a Web Archive Metadata File. Another metadata format used with WARC files is CDX.




Sample files


Other links and references

  • This page was last modified on 17 May 2016, at 13:42.
  • Content is available under Creative Commons 0.