# Siegfried

[Siegfried](http://www.itforarchivists.com/siegfried) is a signature-based file format identification tool, implementing:

- the National Archives UK's [PRONOM](http://www.nationalarchives.gov.uk/pronom) file format signatures
- freedesktop.org's [MIME-info](https://freedesktop.org/wiki/Software/shared-mime-info/) file format signatures
- the Library of Congress's [FDD](http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml) file format signatures (*beta*).
- Wikidata (*beta*).

### Version

1.9.1

[![Build Status](https://travis-ci.org/richardlehane/siegfried.png?branch=master)](https://travis-ci.org/richardlehane/siegfried) [![Build status](https://ci.appveyor.com/api/projects/status/1eqdmi2nvive0vgn?svg=true)](https://ci.appveyor.com/project/richardlehane/siegfried) [![GoDoc](https://godoc.org/github.com/richardlehane/siegfried?status.svg)](https://godoc.org/github.com/richardlehane/siegfried) [![Go Report Card](https://goreportcard.com/badge/github.com/richardlehane/siegfried)](https://goreportcard.com/report/github.com/richardlehane/siegfried)

## Usage

### Command line

    sf file.ext
    sf DIR

#### Options

    sf -csv file.ext | DIR                     // Output CSV rather than YAML
    sf -json file.ext | DIR                    // Output JSON rather than YAML
    sf -droid file.ext | DIR                   // Output DROID CSV rather than YAML
    sf -nr DIR                                 // Don't scan subdirectories
    sf -z file.zip | DIR                       // Decompress and scan zip, tar, gzip, warc, arc
    sf -zs gzip,tar file.tar.gz | DIR          // Selectively decompress and scan
    sf -hash md5 file.ext | DIR                // Calculate md5, sha1, sha256, sha512, or crc hash
    sf -sig custom.sig file.ext                // Use a custom signature file
    sf -                                       // Scan stream piped to stdin
    sf -name file.ext -                        // Provide filename when scanning stream
    sf -f myfiles.txt                          // Scan list of files and directories
    sf -v | -version                           // Display version information
    sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
    sf -serve hostname:port                    // Server mode
    sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
    sf -multi 256 DIR                          // Scan multiple (e.g. 256) files in parallel
    sf -log [comma-sep opts] file.ext | DIR    // Log errors etc. to stderr (default) or stdout
    sf -log e,w file.ext | DIR                 // Log errors and warnings to stderr
    sf -log u,o file.ext | DIR                 // Log unknowns to stdout
    sf -log d,s file.ext | DIR                 // Log debugging and slow messages to stderr
    sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results
    sf -log fmt/1,c DIR > results.yaml         // Log instances of fmt/1 and chart results
    sf -replay -log u -csv results.yaml        // Replay results file, convert to csv, log unknowns
    sf -setconf -multi 32 -hash sha1           // Save flag defaults in a config file
    sf -setconf -serve :5138 -conf srv.conf    // Save/load named config file with '-conf filename'

#### Example

[![asciicast](https://asciinema.org/a/ernm49loq5ofuj48ywlvg7xq6.png)](https://asciinema.org/a/ernm49loq5ofuj48ywlvg7xq6)

#### Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the [roy tool](https://github.com/richardlehane/siegfried/wiki/Building-a-signature-file-with-ROY) to build your own signature file.

## Install
### With go installed:

    go get github.com/richardlehane/siegfried/cmd/sf

    sf -update

### Or, without go installed:
#### Win:

Download a pre-built binary from the [releases page](https://github.com/richardlehane/siegfried/releases). Unzip to a location in your system path. Then run:

    sf -update

#### Mac Homebrew (or [Linuxbrew](http://brew.sh/linuxbrew/)):

    brew install mistydemeo/digipres/siegfried

Or, for the most recent updates, you can install from this fork:

    brew install richardlehane/digipres/siegfried

#### Ubuntu/Debian (64 bit):

    wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
    echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
    sudo apt-get update && sudo apt-get install siegfried

#### FreeBSD:

    pkg install siegfried

#### Arch Linux:

    git clone https://aur.archlinux.org/siegfried.git
    cd siegfried
    makepkg -si

## Changes
### v1.9.1 (2020-10-11)
### Changed
- update PRONOM to v97
- zs flag now activates -z flag

### Fixed
- details text in PRONOM identifier
- `roy` panic when building signatures with empty sequences. Reported by [Greg Lepore](https://github.com/richardlehane/siegfried/issues/149)

### v1.9.0 (2020-09-22)
### Added
- a new Wikidata identifier, harvesting information from the Wikidata Query Service. Implemented by [Ross Spencer](https://github.com/richardlehane/siegfried/commit/dfb579b4ae46ae6daa814fc3fc74271d768f2f9c).
- select which archive types (zip, tar, gzip, warc, or arc) are unpacked using the -zs flag (sf -zs tar,zip). Implemented by [Ross Spencer](https://github.com/richardlehane/siegfried/commit/88dd43b55e5f83304705f6bcd439d502ef08cd38).

### Changed
- update LOC signatures to 2020-09-21
- update tika-mimetypes signatures to v1.24
- update freedesktop.org signatures to v2.0

### Fixed
- incorrect basis for some signatures with multiple patterns. Reported and fixed by [Ross Spencer](https://github.com/richardlehane/siegfried/issues/142).

### v1.8.0 (2020-01-22)
### Added
- utc flag returns file modified dates in UTC e.g. `sf -utc FILE | DIR`. Requested by [Dragan Espenschied](https://github.com/richardlehane/siegfried/issues/136)
- new cost and repetition flags to control segmentation when building signatures

### Changed
- update PRONOM to v96
- update LOC signatures to 2019-12-18
- update tika-mimetypes signatures to v1.23
- update freedesktop.org signatures to v1.15

### Fixed
- XML namespaces detected by prefix on root tag, as well as default namespace (for mime-info spec)
- panic when scanning certain MS-CFB files. Reported separately by Mike Shallcross and Euan Cochrane
- file with many FF xx sequences grinds to a halt. Reported by [Andy Foster](https://github.com/richardlehane/siegfried/issues/128)

See the [CHANGELOG](CHANGELOG.md) for the full history.

## Rights

Copyright 2020 Richard Lehane, Ross Spencer

Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)

## Announcements

Join the [Google Group](https://groups.google.com/d/forum/sf-roy) for updates, signature releases, and help.

## Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the [wiki](https://github.com/richardlehane/siegfried/wiki) to get you started, and please get in touch.

## Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!

Thanks Misty for the brew and ubuntu packaging

Thanks Steffen for the FreeBSD and Arch Linux packaging
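
## Example: tallying `sf -json` results

The `-json` option listed under Usage produces machine-readable results, so a natural follow-up is to count how many files matched each format identifier. The sketch below is one possible way to do that in Go using only the standard library. It is not part of siegfried: the type names, the `CountIDs` helper, and the `sample` document are all hypothetical, and the JSON field names (`files`, `matches`, `ns`, `id`) are assumptions about the output layout that should be checked against the output of your own `sf` version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Match, File and Results mirror only the fields this sketch needs.
// The JSON field names are assumptions; verify them against real sf output.
type Match struct {
	NS string `json:"ns"`
	ID string `json:"id"`
}

type File struct {
	Filename string  `json:"filename"`
	Matches  []Match `json:"matches"`
}

type Results struct {
	Files []File `json:"files"`
}

// CountIDs (hypothetical helper) tallies matches per identifier,
// keyed as "namespace:id".
func CountIDs(data []byte) (map[string]int, error) {
	var res Results
	if err := json.Unmarshal(data, &res); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, f := range res.Files {
		for _, m := range f.Matches {
			counts[m.NS+":"+m.ID]++
		}
	}
	return counts, nil
}

// sample is a hand-written stand-in document, not captured sf output.
const sample = `{"files":[
 {"filename":"a.wav","matches":[{"ns":"pronom","id":"fmt/1"}]},
 {"filename":"b.wav","matches":[{"ns":"pronom","id":"fmt/1"}]},
 {"filename":"c.bin","matches":[{"ns":"pronom","id":"UNKNOWN"}]}
]}`

func main() {
	counts, err := CountIDs([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Sort keys so the report order is stable between runs.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %d\n", k, counts[k])
	}
}
```

To try this against real data, one option is to redirect `sf -json DIR` to a file and pass that file's bytes to `CountIDs` in place of `sample`. Decoding into small structs rather than a generic map keeps the sketch tolerant of any extra fields in the output, since `encoding/json` ignores keys that have no matching struct field.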