# SHOW FORMATS OPTION

This file documents the `--show=formats` option (and, briefly, the
deprecated `--show=types`).

john loads hashes of one format from the given hash files and gives
hints about some other formats seen in the files. `--show=formats`
tries every line against every format (with a few exceptions) and
shows, independently for each line, the formats that can load it. It
does not mean that john can load all lines at the same time (in one
session).


The exceptions:

- some dynamic formats are disabled in the configuration file; they
  may be enabled temporarily by specifying the format by exact name
  with the `--format=` option,

- ad-hoc dynamic formats (except `dynamic=md5($p)`) are not checked
  even when hashes have the respective format tags; an ad-hoc format
  may be forced by specifying it with the `--format=` option (e.g.
  `--format='dynamic=sha1(sha1($p))'`).


Basic uses include:

- manual and automatic investigation of a file with hashes, hash type
  identification,

- easier understanding of cases when john loads only a part of a file,
  - but `--show=invalid` may be more convenient to use,

- help in automated matching of canonical hashes in .pot files with
  hashes in original form.


## Output and manual investigation

john with `--show=formats` parses the specified hash file and prints
information about each line in JSON format. JSON is simple and
almost readable.

Example: 2 descrypt hashes:
```
$ cat ab.pw
AAa6CzJlsalyo
BBODHXVAdtcmc
```

(`--format=descrypt` is specified to reduce output.)
```
$ ./JohnTheRipper/run/john --show=formats ab.pw --format=descrypt
[{"lineNo":1,"ciphertext":"AAa6CzJlsalyo","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["AAa6CzJlsalyo"]}]},
{"lineNo":2,"ciphertext":"BBODHXVAdtcmc","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["BBODHXVAdtcmc"]}]}]
```

Example: 1 descrypt hash and pretty printing:
```
$ cat a.pw
AAa6CzJlsalyo
```

With pretty printing by json_pp (it is a part of perl package (on Debian)):
```
$ john --show=formats a.pw | json_pp
[
   {
      "ciphertext" : "AAa6CzJlsalyo",
      "lineNo" : 1,
      "rowFormats" : [
         {
            "canonHash" : [
               "AAa6CzJlsalyo"
            ],
            "label" : "descrypt",
            "prepareEqCiphertext" : true
         },
         {
            "label" : "crypt",
            "prepareEqCiphertext" : true,
            "canonHash" : [
               "AAa6CzJlsalyo"
            ]
         }
      ]
   }
]
```

Or with one-liner in python:
```
$ john --show=formats a.pw | python -c 'import json, sys, pprint; pprint.pprint(json.load(sys.stdin))'
[{u'ciphertext': u'AAa6CzJlsalyo',
  u'lineNo': 1,
  u'rowFormats': [{u'canonHash': [u'AAa6CzJlsalyo'],
                   u'label': u'descrypt',
                   u'prepareEqCiphertext': True},
                  {u'canonHash': [u'AAa6CzJlsalyo'],
                   u'label': u'crypt',
                   u'prepareEqCiphertext': True}]}]
```

There is a list of dictionaries with information for each line.

A dictionary for a line may contain the following keys/fields:

- `lineNo` is the number of the line in the file, starting from 1,
  - numbering is continuous across multiple hash files (this may
    change in future versions),

- `login` is the login,
  - it may be absent if the login is empty,
  - it may be absent if the login is not specified and the line is skipped,
  - it may contain the dummy value `?` that john uses when the login is not specified,

- `ciphertext` is the ciphertext as extracted from the hash file,
  - it may be absent if the ciphertext is empty (or was cut by john to be empty),

- `rowFormats` is a list of descriptions of john's formats that can
  load the line for cracking (see below),
  - it may be an empty list if the line was skipped or no format can parse it,

- `skipped` shows that the line is too short to be loaded by any
  format, so it is not passed to the formats' checks at all,
  - the value does not represent the reason; the reason is always the
    same: the hash is too short and no format can load it,
  - the value of this field is the origin of the decision to skip,
    i.e. a label of the branch in code that skipped the line (so you
    may check the code in `loader.c`),

- `uid`, `gid`, `gecos`, `home`, `shell` hold additional
  information about the user (provided in some formats of hash files),
  - they may be absent if they are empty,
  - some fields may be used for different purposes in some formats of
    hash files; john should handle it well (e.g. `uid` contains the LM
    ciphertext in PWDUMP files),
  - `gecos`, `home`, `shell` may also be absent if they have the dummy value `/`.


`rowFormats` field contains a list of dictionaries with results of
successful parsing of the line by formats.

Each dictionary in the `rowFormats` list may have the following keys/fields:

- `label` is the name of the format, usable with the `--format=` option,

- `dynamic` is a boolean value,
  - it is true if the format uses the engine for dynamic formats,
  - it is absent if it is false,

- `prepareEqCiphertext` is a boolean value,
  - it is true if the `prepare()` method of the format returned the
    same ciphertext after processing (it may be interesting to
    developers of formats),
  - it is absent if it is false,

- `canonHash` is a list of strings containing the ciphertext in canonical form,
  - cracked hashes are saved to the .pot file in canonical form unless
    it is too long (see the `truncated` field),
  - it may contain multiple values for some formats (e.g. a full LM
    hash gives two independent halves),
  - the canonical hash is an almost unambiguous form of the hash that
    allows john to load it with the respective format without the
    `--format=` option,
    - there may be a few exceptional formats whose canonical form
      cannot be distinguished from other formats,
    - formats that are different implementations of the same hash type
      usually have the same canonical form (e.g. raw-md5 and
      raw-md5-opencl formats),

- `truncHash` is a list like `canonHash` but contains a shorter hash
  that would be used instead of the canonical hash in the .pot file
  (see the `truncated` field),

- `truncated` is a boolean field that shows whether `canonHash` or
  `truncHash` is used for the .pot file,
  - it is absent when `canonHash` would be used in the .pot file,
  - it is true when `truncHash` would be used in the .pot file,
  - it may be true while a particular hash is short enough to be saved
    in canonical form; there is no `truncHash` field in this case.


Example: a hash is transformed into canonical form and saved to .pot file.
```
$ cat 123456.pw
e10adc3949ba59abbe56e057f20f883e
```

```
$ john --format=raw-md5 123456.pw --show=formats
[{..."rowFormats":[{"label":"Raw-MD5",...,"canonHash":["$dynamic_0$e10adc3949ba59abbe56e057f20f883e"]}]}]
```

```
$ john --format=raw-md5 123456.pw --pot=123456.pot
[...]
123456 (?)
[...]
$ cat 123456.pot
$dynamic_0$e10adc3949ba59abbe56e057f20f883e:123456
```


Example: PWDUMP format and LM halves.
```
$ cat pwdump.pw
alogin:aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb:ccccccccccccccccdddddddddddddddd
```

(Output is reformatted and edited.)
```
$ john --show=formats pwdump.pw
[{"lineNo":1,"login":"alogin",
"ciphertext":"aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb",
"uid":"ccccccccccccccccdddddddddddddddd",
"rowFormats":[
  {"label":"LM","canonHash":["$LM$cccccccccccccccc","$LM$dddddddddddddddd"]},
  ...
  {"label":"NT","prepareEqCiphertext":true,"canonHash":["$NT$aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb"]},
  ...
  {"label":"Snefru-128","prepareEqCiphertext":true,"canonHash":["$snefru$aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb"]},
...]}]
```

When the PWDUMP format of a file is identified, the third field (aka
`uid`) is used for the full LM hash. With such a line, it is not
possible to load `aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb` as LM; the hash
should be extracted onto a separate line manually.

Despite detection of the PWDUMP format, the `ciphertext` field may be
loaded by other formats with the `--format=` option (except LM). 32
hex digits may be loaded by a lot of formats (e.g. `Snefru-128` in the
example).


## Interaction with other options

### `--users=`, `--groups=`, `--shells=`

`--users=`, `--groups=`, `--shells=` options affect the work of
`--show=formats`, but skipped lines will be reported anyway.

### `--format=`

Formats to be tried may be limited with the `--format=` option.

- 1 exact format may be specified (e.g. `--format=raw-md5`),

  - a disabled dynamic format may be enabled this way for temporary
    use,

  - ad-hoc dynamic formats may be specified this way (e.g.
    `--format='dynamic=sha512($s.sha512($p.$s).$p)'`, see doc/DYNAMIC),

- multiple formats may be specified by mask in the `--format=` option
  (e.g. `--format=*crypt`, `--format=mssql*`),

  - the set of formats in john may differ between builds, so
    `--list=formats` with `--format=` may be used to check that
    formats are available and the problem is not specific to
    `--show=formats` (e.g. `--format=*-opencl` would fail when john is
    built without OpenCL support),

- other formats will not be checked and reported; this may be useful
  because `--show=formats` may be slow or produce too much output.


Example: choose a subset of formats (john is built without OpenCL).
```
$ john --format=mssql* --list=formats
mssql, mssql05, mssql12
$ john --format=*crypt --list=formats
descrypt, bsdicrypt, md5crypt, bcrypt, scrypt, adxcrypt, AxCrypt, BestCrypt,
sha1crypt, sha256crypt, sha512crypt, django-scrypt, Raw-SHA1-AxCrypt, crypt
```


Example: `--show=formats` fails due to lack of OpenCL formats, so we
check `--list=formats`.
```
$ john --format=*-opencl --list=formats
Unknown ciphertext format name requested
$ john --format=*-opencl --show=formats t.pw
Unknown ciphertext format name requested
```


## Automatic parsing of output

The whole output may be read as JSON easily (see the example above
with a.pw and python).

It is possible to avoid reading the full output into memory for
sequential processing, because the output is guaranteed to have one
dictionary for one input line on a single separate output line.

`rowFormats` field's value is always a list. An empty list means that
the line cannot be loaded by any format.
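
The two guarantees above (one dictionary per output line, `rowFormats`
always a list) can be combined into a small summary script. This is
only a sketch in Python 3: the helper names `iter_records` and
`summarize` are made up for illustration, and the sample input is
hand-written in the documented layout rather than captured from john.

```python
import json
from collections import Counter


def iter_records(lines):
    """Yield one dictionary per output line of `john --show=formats`."""
    for line in lines:
        # Drop the outer JSON list brackets and the separating commas.
        line = line.strip("[],\r\n")
        if line:
            yield json.loads(line)


def summarize(lines):
    """Count lines per format label and collect unloadable line numbers."""
    per_format = Counter()
    unloadable = []
    for rec in iter_records(lines):
        formats = rec["rowFormats"]  # always a list, possibly empty
        if not formats:
            unloadable.append(rec.get("lineNo"))
        for fmt in formats:
            per_format[fmt["label"]] += 1
    return per_format, unloadable


# Hand-written sample in the documented layout (not real john output).
SAMPLE = [
    '[{"lineNo":1,"ciphertext":"AAa6CzJlsalyo","rowFormats":'
    '[{"label":"descrypt","canonHash":["AAa6CzJlsalyo"]}]},\n',
    '{"lineNo":2,"ciphertext":"abc","rowFormats":[]}]\n',
]

if __name__ == "__main__":
    counts, bad = summarize(SAMPLE)
    print(dict(counts), bad)
```

To process real output, pass `sys.stdin` instead of `SAMPLE` and pipe
`john --show=formats` into the script; only one line is held in memory
at a time.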

Example: print the `ciphertext` field and the list of format names that
can load it, processing JSON line by line with python.
```
$ cat ab.pw
AAa6CzJlsalyo
BBODHXVAdtcmc
```

```
$ john --show=formats ab.pw | python -c '
> import json, sys
> for l in sys.stdin:
>     l = l.strip("[],\r\n")
>     d = json.loads(l)
>     fs = [ f["label"] for f in d["rowFormats"] ]
>     print(d["ciphertext"], fs)
> '
(u'AAa6CzJlsalyo', [u'descrypt', u'crypt'])
(u'BBODHXVAdtcmc', [u'descrypt', u'crypt'])
```
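
The matching of .pot entries with hashes in original form, mentioned
among the basic uses, might look roughly like the following Python 3
sketch. The helper names (`iter_records`, `pot_keys`, `build_index`,
`match_pot`) are hypothetical, the sample data is assembled by hand
from the raw-md5 example in this file, and splitting a .pot line at
the first `:` is an assumption, not something this file specifies.

```python
import json


def iter_records(lines):
    """Yield one dictionary per output line of `john --show=formats`."""
    for line in lines:
        line = line.strip("[],\r\n")
        if line:
            yield json.loads(line)


def pot_keys(fmt):
    """Return the hash strings expected in .pot for one rowFormats entry."""
    # `truncated` may be true without `truncHash`; canonHash is used then.
    if fmt.get("truncated") and "truncHash" in fmt:
        return fmt["truncHash"]
    return fmt.get("canonHash", [])


def build_index(lines):
    """Map each .pot-style hash to a list of (lineNo, original ciphertext)."""
    index = {}
    for rec in iter_records(lines):
        origin = (rec.get("lineNo"), rec.get("ciphertext"))
        for fmt in rec["rowFormats"]:
            for key in pot_keys(fmt):
                origins = index.setdefault(key, [])
                # Several formats may report the same hash for one line.
                if origin not in origins:
                    origins.append(origin)
    return index


def match_pot(pot_lines, index):
    """Yield (lineNo, original ciphertext, password) for known .pot entries."""
    for entry in pot_lines:
        # Assumption: the hash is everything before the first ':'.
        key, sep, password = entry.rstrip("\r\n").partition(":")
        if not sep:
            continue
        for line_no, original in index.get(key, []):
            yield line_no, original, password


# Hand-assembled sample based on the raw-md5 example in this file.
SHOW_FORMATS = [
    '[{"lineNo":1,"ciphertext":"e10adc3949ba59abbe56e057f20f883e",'
    '"rowFormats":[{"label":"Raw-MD5","canonHash":'
    '["$dynamic_0$e10adc3949ba59abbe56e057f20f883e"]}]}]\n',
]
POT = ["$dynamic_0$e10adc3949ba59abbe56e057f20f883e:123456\n"]

if __name__ == "__main__":
    for row in match_pot(POT, build_index(SHOW_FORMATS)):
        print(row)
```

For formats with several `canonHash` values per line (e.g. LM halves),
each value is indexed separately, so one input line may be matched by
more than one .pot entry.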