SUBSETS mode

Subsets mode was inspired by the external mode "subsets" and it also renders
the external mode "repeats" obsolete. Actually it also replaces the dumb16,
dumb32, repeats16 and repeats32 external modes (except for the fact they can
easily be modified to use a smaller subset of the Unicode space). Compared to
the external variant, it's much faster, picks candidate order differently and
never ever produces a duplicate. It also scales very well with node/fork/MPI.
Furthermore, it does support full Unicode without resorting to a legacy code
page. Obviously it does support session resume (the external variant doesn't).

Subsets is a quick brute-force variant that tries to produce candidates in
order of complexity, that is, it will try a long poor password such as
"gggggggggggggggggg#" earlier than a short one with unique characters like
"yok3" and in between them it will probably try "bubba".
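
One rough way to read "complexity" here is the number of different
characters a candidate uses (its subset size). The Python sketch below only
illustrates that reading with the three examples above. It is not the
ordering algorithm used by the mode, and the helper name subset_size is made
up for this illustration.

```python
def subset_size(candidate: str) -> int:
    """Number of different characters used by a candidate."""
    return len(set(candidate))


examples = ["yok3", "bubba", "gggggggggggggggggg#"]

# Sort by subset size only (an assumption made for illustration).
for word in sorted(examples, key=subset_size):
    print(subset_size(word), word)
```

Sorted this way, the long repetitive candidate comes first, "bubba" second
and "yok3" last, matching the order described above.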

With no further options, "--subsets" will start at length 1 and a subset
size of 1 (mimicking the "repeats" external mode) and then increment both of
them in subset keysize order, ending at the format's max. length. It will
use the full charset of printable ASCII (95 characters) but start with
tiny subsets and increase from there.

Obviously normal options like --min-len, --max-len and --target-encoding
also apply.


CHARSET

You can specify your own charset, e.g. --subsets=STRING where STRING is any
charset you want. You can also use pre-defined charsets 0..9 in john.conf
using --subsets=N - see the [subsets] section of john.conf. There is also
a conf setting "DefaultCharset = N" (setting default to one of the presets)
or even "DefaultCharset = STRING" for some other default. Finally there's
an Easter egg in "--subsets=full-unicode". That is a truly huge charset, so
do not count on it getting very far in subset and output lengths. Unless
you let it run for ages, it will produce long candidates with very short
subsets or vice versa.

The only "magic" allowed in a charset (regardless of where it's defined)
is that you can use \U+HHHH or \U+HHHHH notation for any Unicode character
except the very highest private area that has six hex digits. For example,
to include the "Grinning Face" smiley, you'd use \U+1F600. Take care not
to use a legacy target codepage that can't hold the characters you define,
as there will be no warnings. Using UTF-8, anything is obviously allowed.


PROGRESS

The progress/ETA counting is peculiar, in the same way as when mask mode
iterates over lengths: A figure (n) will be shown, indicating the smallest
length not yet exhausted. Example:

 0:00:00:52 10.14% (5) (ETA: 16:25:30) 41161Kp/s _0v_0X..227//t

This means we have exhausted length 4 and 10.14% of length 5. The estimated
time when length 5 will be exhausted is 16:25:30, best case. Sometimes you
will see no progress in those figures (the ETA will be pushed forward) -
that is normal and just means we're currently producing longer candidates
(with smaller subset sizes). If you look carefully you will realize this
is exactly what was going on at the time it was printed: The candidates
shown at the end are of length 6 with a subset size of 4.
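
To check that reading of the example, the length and subset size of the two
candidates can be computed directly. This Python sketch assumes the last
whitespace-separated field of the status line has the form "first..last"
and that neither candidate contains spaces or ".." itself, which holds for
this example but not in general.

```python
status = "0:00:00:52 10.14% (5) (ETA: 16:25:30) 41161Kp/s _0v_0X..227//t"

# The last field holds the first and last candidate of the current batch.
first, last = status.split()[-1].split("..", 1)

for word in (first, last):
    print(word, "length", len(word), "subset size", len(set(word)))

# _0v_0X length 6 subset size 4
# 227//t length 6 subset size 4
```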


REQUIRED PART OF CHARSET

For advanced usage, there's another option "--subsets-required=N" where N
is the number of characters in the charset (counting from the left) that
form the required set: every candidate must contain at least one of them.
For example:

--subsets=0123456789abcdef --min-len=4 --max-len=4

This will produce all 65,536 candidates possible at that length using hex
digits. Now let's say you exhausted that one and want to try uppercase hex
as well. Here's the clever way:

--subsets=ABCDEF0123456789 --subsets-required=6 --min-len=4 --max-len=4

So this means that the full charset is uppercase hex, but at least one of
the first 6 of them (i.e. one of ABCDEF) is required in every candidate. This
means we will not produce a single dupe of the ones produced in the
lowercase step, namely the 10,000 ones that only had decimal digits. So the
total number output this time is only 55,536. After these two sessions you
have exhausted the entire lowercase OR uppercase hexadecimal keyspace of
length 4. A naive way of doing this could be a single session using:

--subsets=0123456789abcdefABCDEF --min-len=4 --max-len=4

- or -

--mask=[0123456789abcdefABCDEF] --min-len=4 --max-len=4

Both of these however would also produce many candidates with mixed case
like "dA56", which is not what we wanted: they would produce 234,256
candidates instead of just 121,072.
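
The figures above can be verified by brute-force counting. The Python
sketch below simply enumerates every string of the given length and applies
the "at least one of the first N characters" rule. It only checks the
arithmetic and says nothing about how the mode generates candidates; the
function name count is made up for this illustration.

```python
from itertools import product


def count(charset: str, length: int, required: int = 0) -> int:
    """Count strings of the given length over charset. If required > 0,
    at least one of the first `required` characters must appear."""
    must_have = set(charset[:required])
    return sum(
        1
        for cand in product(charset, repeat=length)
        if not must_have or must_have.intersection(cand)
    )


lower = count("0123456789abcdef", 4)              # 65536
upper = count("ABCDEF0123456789", 4, required=6)  # 55536
naive = count("0123456789abcdefABCDEF", 4)        # 234256
print(lower, upper, lower + upper, naive)
# 65536 55536 121072 234256
```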

Note that the above was just an illustrative example of this option. A
more realistic use could be full alphanumeric charsets where at least one
digit is required, e.g.:

--subsets=0123456789abcdefghijklmnopqrstuvwxyz --subsets-required=10
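
For a single candidate length, the size of such a keyspace follows from a
simple subtraction: all strings over the full charset minus those that use
none of the required characters. The Python sketch below applies that
formula; the function name required_keyspace is made up for this
illustration, and the result is a plain count that ignores any other limits
that may apply to a session.

```python
def required_keyspace(charset_size: int, required: int, length: int) -> int:
    """Strings of one length containing at least one required character."""
    return charset_size ** length - (charset_size - required) ** length


# The uppercase hex example: 16 characters, first 6 required, length 4.
print(required_keyspace(16, 6, 4))  # 55536

# The alphanumeric example: 36 characters, the 10 digits required.
for length in range(1, 6):
    print(length, required_keyspace(36, 10, length))
```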


SUBSET SIZES

The options --subsets-min-diff=N and --subsets-max-diff=N (or the similar
variants in john.conf) let you put a limit on complexity, that is, on the
number of different characters in a candidate. Normally it's not needed
unless you want a session that actually runs to completion before you die
of old age. For the --subsets-max-diff=N option, a negative N is parsed as
"max. length - |N|".