# SHOW FORMATS OPTION

This file documents the `--show=formats` option (and a bit of the
deprecated `--show=types`).

john loads hashes of one format from the given hash files and gives
hints about some other formats seen in the files. `--show=formats`
tries every line against every format (with a few exceptions) and
shows, independently for each line, the formats that can load it. It
does not mean that john can load all lines at the same time (in one
session).


The exceptions:

- some dynamic formats are disabled in the configuration file; they
  may be enabled temporarily by specifying the format with the
  `--format=` option and its exact name,

- ad-hoc dynamic formats (except `dynamic=md5($p)`) are not checked
  even when hashes have the respective format tags; an ad-hoc format
  may be forced by specifying it with the `--format=` option (e.g.
  `--format='dynamic=sha1(sha1($p))'`).


Basic uses include:

- manual and automatic investigation of a file with hashes, hash type
  identification,

- easier understanding of cases when john loads only a part of a file,
  - but `--show=invalid` may be more convenient to use,

- help in automated matching of canonical hashes in .pot files with
  hashes in original form.


## Output and manual investigation

john with `--show=formats` parses the specified hash file and prints
information about each line in JSON format. JSON is simple and
almost human-readable.

Example: 2 descrypt hashes:
```
$ cat ab.pw
AAa6CzJlsalyo
BBODHXVAdtcmc
```

(`--format=descrypt` is specified to reduce output.)
```
$ ./JohnTheRipper/run/john --show=formats ab.pw --format=descrypt
[{"lineNo":1,"ciphertext":"AAa6CzJlsalyo","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["AAa6CzJlsalyo"]}]},
{"lineNo":2,"ciphertext":"BBODHXVAdtcmc","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["BBODHXVAdtcmc"]}]}]
```

Example: 1 descrypt hash and pretty printing:
```
$ cat a.pw
AAa6CzJlsalyo
```

With pretty printing by `json_pp` (on Debian, it is part of the perl package):
```
$ john --show=formats a.pw | json_pp
[
   {
      "ciphertext" : "AAa6CzJlsalyo",
      "lineNo" : 1,
      "rowFormats" : [
         {
            "canonHash" : [
               "AAa6CzJlsalyo"
            ],
            "label" : "descrypt",
            "prepareEqCiphertext" : true
         },
         {
            "label" : "crypt",
            "prepareEqCiphertext" : true,
            "canonHash" : [
               "AAa6CzJlsalyo"
            ]
         }
      ]
   }
]
```

Or with a one-liner in python:
```
$ john --show=formats a.pw | python -c 'import json, sys, pprint; pprint.pprint(json.load(sys.stdin))'
[{u'ciphertext': u'AAa6CzJlsalyo',
  u'lineNo': 1,
  u'rowFormats': [{u'canonHash': [u'AAa6CzJlsalyo'],
                   u'label': u'descrypt',
                   u'prepareEqCiphertext': True},
                  {u'canonHash': [u'AAa6CzJlsalyo'],
                   u'label': u'crypt',
                   u'prepareEqCiphertext': True}]}]
```

(The output above is from Python 2; Python 3 prints the same structure
without the `u` prefixes.)

The output is a list of dictionaries, one for each input line.

A dictionary for a line may contain the following keys/fields:

- `lineNo` is the number of the line in the file, starting from 1,
  - numbering is continuous across multiple hash files (this may
    change in future versions),

- `login` is for login,
  - it may be absent if login is empty,
  - it may be absent if login is not specified and the line is skipped,
  - it may contain the dummy value `?` that john uses when login is not specified,

- `ciphertext` is the ciphertext as extracted from the hash file,
  - it may be absent if ciphertext is empty (or was cut by john to be empty),

- `rowFormats` is a list of descriptions of john's formats that can
  load the line for cracking (see below),
  - it may be an empty list if the line was skipped or none of the formats can parse it,

- `skipped` shows that the line is too short to be loaded by any
  format, so it is not passed to the formats' checks at all,
  - the value does not represent the reason; the reason is always the
    same: the hash is too short and none of the formats can load it,
  - the value of this field is the origin of the decision to skip,
    that is, a label of the branch in code that skipped the line (so
    you may check the code in `loader.c`),

- `uid`, `gid`, `gecos`, `home`, `shell` are for additional
  information about the user (provided in some formats of hash files),
  - they may be absent if they are empty,
  - some fields may be used for different purposes in some formats of
    hash files, john should handle it well (e.g. `uid` contains the LM
    ciphertext in PWDUMP files),
  - `gecos`, `home`, `shell` may also be absent if they have the dummy value `/`.


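As a rough illustration of the fields above, the following Python sketch sorts line dictionaries into skipped, unrecognized and loadable ones. The sample data is hand-written to mirror the field descriptions: the `skipped` value is a placeholder (real values are branch labels from `loader.c`), and `classify` is a hypothetical helper name, not part of john.

```python
import json

# Hand-written sample shaped like --show=formats output (not real john output).
# The "skipped" value is a placeholder; real values are branch labels in loader.c.
SAMPLE = '''
[{"lineNo":1,"ciphertext":"AAa6CzJlsalyo","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["AAa6CzJlsalyo"]}]},
{"lineNo":2,"ciphertext":"x","skipped":"placeholder","rowFormats":[]},
{"lineNo":3,"login":"bob","ciphertext":"not-a-known-hash!!","rowFormats":[]}]
'''


def classify(line):
    """Return 'skipped', 'unrecognized' or 'loadable' for one line dictionary."""
    if "skipped" in line:
        return "skipped"
    # rowFormats is documented to always be a list; .get() is only defensive.
    if not line.get("rowFormats"):
        return "unrecognized"
    return "loadable"


for line in json.loads(SAMPLE):
    # Optional fields such as login may be absent, so read them with .get().
    print(line["lineNo"], line.get("login", "-"), classify(line))
```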
The `rowFormats` field contains a list of dictionaries with the results
of successful parsing of the line by formats.

Each dictionary in the `rowFormats` list may have the following keys/fields:

- `label` is the name of the format that may be used for the `--format=` option,

- `dynamic` is a boolean value,
  - it is true if the format uses the engine for dynamic formats,
  - it is absent if it is false,

- `prepareEqCiphertext` is a boolean value,
  - it is true if the `prepare()` method of the format returned the
    same ciphertext after processing (it may be interesting to
    developers of formats),
  - it is absent if it is false,

- `canonHash` is a list of strings containing the ciphertext in canonical form,
  - cracked hashes are saved to the .pot file in canonical form unless
    it is too long (see the `truncated` field),
  - it may contain multiple values for some formats (e.g. a full LM
    hash gives two independent halves),
  - the canonical hash is an almost unambiguous form of the hash that
    allows john to load this hash with the respective format without
    the `--format=` option,
    - there may be a few exceptional formats whose canonical form
      cannot be distinguished from other formats,
    - formats that are different implementations of the same hash type
      usually have the same canonical form (e.g. raw-md5 and
      raw-md5-opencl formats),

- `truncHash` is a list like `canonHash` but contains the shorter hash
  that would be used instead of the canonical hash in the .pot file
  (see the `truncated` field),

- `truncated` is a boolean field that shows whether `canonHash` or
  `truncHash` is used for the .pot file,
  - it is absent when `canonHash` would be used in the .pot file,
  - it is true when `truncHash` would be used in the .pot file,
  - it may be true even when a certain hash is short enough to be
    saved in canonical form; there is no `truncHash` field in this case.


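The relation between `canonHash`, `truncHash` and `truncated` described above can be sketched in Python as follows. `pot_hashes` is a hypothetical helper name, and the three sample dictionaries (labels and hash strings included) are invented to exercise each case; they are not real john output.

```python
def pot_hashes(row_format):
    """Return the hash strings that would be written to a .pot file.

    Follows the field descriptions above: truncHash is used only when
    truncated is true and truncHash is present; otherwise canonHash.
    """
    if row_format.get("truncated") and "truncHash" in row_format:
        return row_format["truncHash"]
    return row_format.get("canonHash", [])


# Invented sample entries; labels and hash strings are placeholders.
plain = {"label": "fmt-a", "canonHash": ["$a$0123"]}
long_hash = {"label": "fmt-b", "truncated": True,
             "canonHash": ["$b$" + "f" * 40], "truncHash": ["$b$ffff"]}
# truncated is true, but the hash is short enough: no truncHash field.
short_hash = {"label": "fmt-b", "truncated": True, "canonHash": ["$b$ff"]}

for rf in (plain, long_hash, short_hash):
    print(rf["label"], pot_hashes(rf))
```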
Example: a hash is transformed into canonical form and saved to a .pot file.
```
$ cat 123456.pw
e10adc3949ba59abbe56e057f20f883e
```

```
$ john --format=raw-md5 123456.pw --show=formats
[{..."rowFormats":[{"label":"Raw-MD5",...,"canonHash":["$dynamic_0$e10adc3949ba59abbe56e057f20f883e"]}]}]
```

```
$ john --format=raw-md5 123456.pw --pot=123456.pot
[...]
123456           (?)
[...]
$ cat 123456.pot
$dynamic_0$e10adc3949ba59abbe56e057f20f883e:123456
```


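The canonical form is what makes it possible to match .pot entries back to hashes in their original form. A minimal Python sketch of that matching, using the data from the example above, might look like this. The JSON is reconstructed by hand from the abridged output (the `lineNo` and `ciphertext` values are assumptions for the elided parts), `canon_to_original` is a hypothetical helper, and truncated hashes are ignored for brevity.

```python
import json

# Reconstructed by hand from the abridged example above; the lineNo and
# ciphertext values are assumptions, since that part of the output is elided.
SHOW_FORMATS = '''
[{"lineNo":1,"ciphertext":"e10adc3949ba59abbe56e057f20f883e","rowFormats":[{"label":"Raw-MD5","canonHash":["$dynamic_0$e10adc3949ba59abbe56e057f20f883e"]}]}]
'''
POT_LINES = ["$dynamic_0$e10adc3949ba59abbe56e057f20f883e:123456"]


def canon_to_original(lines):
    """Map each canonical hash to the set of original ciphertexts producing it."""
    mapping = {}
    for line in lines:
        original = line.get("ciphertext")
        if original is None:  # ciphertext may be absent if it is empty
            continue
        for rf in line["rowFormats"]:
            for canon in rf.get("canonHash", []):
                mapping.setdefault(canon, set()).add(original)
    return mapping


mapping = canon_to_original(json.loads(SHOW_FORMATS))
for pot_line in POT_LINES:
    # Split on the first colon: hash on the left, plaintext on the right.
    # Assumes the hash itself contains no colon, which holds for this sample.
    canon, _, plaintext = pot_line.partition(":")
    for original in sorted(mapping.get(canon, ())):
        print(original, "->", plaintext)
```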
Example: PWDUMP format and LM halves.
```
$ cat pwdump.pw
alogin:aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb:ccccccccccccccccdddddddddddddddd
```

(Output is reformatted and edited.)
```
$ john --show=formats pwdump.pw
[{"lineNo":1,"login":"alogin",
"ciphertext":"aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb",
"uid":"ccccccccccccccccdddddddddddddddd",
"rowFormats":[
  {"label":"LM","canonHash":["$LM$cccccccccccccccc","$LM$dddddddddddddddd"]},
  ...
  {"label":"NT","prepareEqCiphertext":true,"canonHash":["$NT$aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb"]},
  ...
  {"label":"Snefru-128","prepareEqCiphertext":true,"canonHash":["$snefru$aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb"]},
...]}]
```

When the PWDUMP format of a file is identified, the third field (aka
`uid`) is used as the full LM hash. With such a line, it is not
possible to load `aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbb` as LM; to do that,
the hash should be extracted onto a separate line manually.

Despite detection of the PWDUMP format, the `ciphertext` field may be
loaded by other formats with the `--format=` option (except LM). 32
hex digits may be loaded by a lot of formats (e.g. `Snefru-128` in the
example).


## Interaction with other options

### `--users=`, `--groups=`, `--shells=`

The `--users=`, `--groups=`, `--shells=` options affect the work of
`--show=formats`, but skipped lines will be reported anyway.

### `--format=`

Formats to be tried may be limited with the `--format=` option.

- one exact format may be specified (e.g. `--format=raw-md5`),

  - a disabled dynamic format may be enabled this way for temporary
    use,

  - ad-hoc dynamic formats may be specified this way (e.g.
    `--format='dynamic=sha512($s.sha512($p.$s).$p)'`, see doc/DYNAMIC ),

- multiple formats may be specified by a mask in the `--format=` option
  (e.g. `--format=*crypt`, `--format=mssql*`),

  - the set of formats in john may differ between builds, so
    `--list=formats` with `--format=` may be used to check that the
    formats are available and the problem is not specific to
    `--show=formats` (e.g. `--format=*-opencl` would fail when john is
    built without OpenCL support),

- other formats will not be checked or reported; this may be useful
  because `--show=formats` may be slow or produce too much output.


Example: choose a subset of formats (john is built without OpenCL).
```
$ john --format=mssql* --list=formats
mssql, mssql05, mssql12
$ john --format=*crypt --list=formats
descrypt, bsdicrypt, md5crypt, bcrypt, scrypt, adxcrypt, AxCrypt, BestCrypt,
sha1crypt, sha256crypt, sha512crypt, django-scrypt, Raw-SHA1-AxCrypt, crypt
```


Example: `--show=formats` fails due to lack of OpenCL formats, so we
check `--list=formats`.
```
$ john --format=*-opencl --list=formats
Unknown ciphertext format name requested
$ john --format=*-opencl --show=formats t.pw
Unknown ciphertext format name requested
```


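When `--show=formats` is driven from a script, the same options can be passed through Python's `subprocess` module. The sketch below is only an illustration: the helper names are hypothetical, the `john` binary is assumed to be on `PATH`, and nothing is assumed about john's exit status or about which stream carries the error message. Output that does not parse as JSON (such as the error shown above) is simply reported as `None`.

```python
import json
import subprocess


def build_command(hash_file, fmt=None, john="john"):
    """Build the argument list for a --show=formats run.

    john is the path to the binary (an assumption; adjust for your install).
    """
    cmd = [john, "--show=formats", hash_file]
    if fmt is not None:
        # One argv element, so a mask like *crypt is never expanded by a shell.
        cmd.append("--format=" + fmt)
    return cmd


def parse_output(text):
    """Return the parsed list, or None if the text is not JSON
    (for example an error message about an unknown format name)."""
    try:
        return json.loads(text)
    except ValueError:
        return None


def show_formats(hash_file, fmt=None, john="john"):
    """Run john and return the parsed output, or None on failure."""
    result = subprocess.run(build_command(hash_file, fmt, john),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return parse_output(result.stdout)


# Example call (requires john to be installed):
#     lines = show_formats("ab.pw", fmt="*crypt")
```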
## Automatic parsing of output

The whole output may be read as JSON easily (see the example above
with a.pw and python).

It is possible to avoid reading the full output into memory for
sequential processing, because the output is guaranteed to have one
dictionary per input line, each on a single separate output line.

The value of the `rowFormats` field is always a list. An empty list
means that the line cannot be loaded by any format.

Example: print the `ciphertext` field and the list of format names
that can load it, processing JSON line by line with python.
```
$ cat ab.pw
AAa6CzJlsalyo
BBODHXVAdtcmc
```

```
$ john --show=formats ab.pw | python -c '
> import json, sys
> for l in sys.stdin:
>     l = l.strip("[],\r\n")
>     d = json.loads(l)
>     fs = [ f["label"] for f in d["rowFormats"] ]
>     print(d["ciphertext"], fs)
> '
(u'AAa6CzJlsalyo', [u'descrypt', u'crypt'])
(u'BBODHXVAdtcmc', [u'descrypt', u'crypt'])
```

(The output above is from Python 2; Python 3 prints lines such as
`AAa6CzJlsalyo ['descrypt', 'crypt']` instead.)

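Building on the line-by-line approach, a small Python 3 sketch can tally how many lines each format is able to load, which may help with hash type identification in larger files. `iter_lines` is a hypothetical helper; the sample input is the `ab.pw` output shown earlier in this file (the run with `--format=descrypt`), held in a `StringIO` as a stand-in for john's standard output.

```python
import collections
import io
import json


def iter_lines(stream):
    """Yield one dictionary per line of --show=formats output."""
    for raw in stream:
        raw = raw.strip("[],\r\n")
        if raw:  # nothing is left after stripping an empty list "[]"
            yield json.loads(raw)


# Output for ab.pw copied from the --format=descrypt example in this file.
SAMPLE = io.StringIO(
    '[{"lineNo":1,"ciphertext":"AAa6CzJlsalyo","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["AAa6CzJlsalyo"]}]},\n'
    '{"lineNo":2,"ciphertext":"BBODHXVAdtcmc","rowFormats":[{"label":"descrypt","prepareEqCiphertext":true,"canonHash":["BBODHXVAdtcmc"]}]}]\n'
)

counts = collections.Counter()
for line in iter_lines(SAMPLE):
    # Count every format label that can load this line.
    counts.update(rf["label"] for rf in line["rowFormats"])

for label, n in counts.most_common():
    print(label, n)
```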