SUBSETS mode

Subsets mode was inspired by the external mode "subsets", and it also renders
the external mode "repeats" obsolete. In fact it also replaces the dumb16,
dumb32, repeats16 and repeats32 external modes (except that those can easily
be modified to use a smaller part of the Unicode space). Compared to the
external variant, it is much faster, picks candidate order differently and
never produces a duplicate. It also scales very well with node/fork/MPI.
Furthermore, it supports full Unicode without resorting to a legacy code
page, and it supports session resume (the external variant doesn't).

Subsets is a quick brute-force variant that tries to produce candidates in
order of complexity. That is, it will try a long but weak password such as
"gggggggggggggggggg#" earlier than a short one with unique characters like
"yok3", and somewhere in between it will probably try "bubba".

With no further options, "--subsets" will start at length 1 and a subset
size of 1 (mimicking the "repeats" external mode) and then increment both
in order of subset keyspace size, ending at the format's max. length. It
uses the full charset of printable ASCII (95 characters), but starts with
tiny subsets and works its way up from there.

Normal options like --min-len, --max-len and --target-encoding also apply.


CHARSET

You can specify your own charset with --subsets=STRING, where STRING is any
charset you want. You can also use the pre-defined charsets 0..9 from
john.conf with --subsets=N; see the [subsets] section of john.conf. There
is also a conf setting "DefaultCharset = N" (making one of the presets the
default) or even "DefaultCharset = STRING" for some other default. Finally,
there's an easter egg in "--subsets=full-unicode". That is a truly huge
charset, so do not count on it getting very far in subset size or output
length.
Unless you let it run for ages, it will produce long candidates built from
very small subsets, or vice versa.

The only "magic" allowed in a charset (regardless of where it's defined) is
the \U+HHHH or \U+HHHHH notation, which covers any Unicode character except
those in the very highest private use area, whose code points need six hex
digits. For example, to include the "Grinning Face" smiley, you'd use
\U+1F600. Take care not to use a legacy target codepage that can't hold the
characters you define: there will be no warning. With UTF-8, anything goes.


PROGRESS

The progress/ETA counting is peculiar, in the same way as when mask mode
iterates over lengths: a figure (n) is shown, indicating the smallest
length not yet exhausted. Example:

  0:00:00:52 10.14% (5) (ETA: 16:25:30) 41161Kp/s _0v_0X..227//t

This means we have exhausted length 4 and 10.14% of length 5. Length 5 is
estimated to be exhausted at 16:25:30, best case. Sometimes you will see no
progress in those figures (the ETA will be pushed forward). That is normal
and just means we're currently producing longer candidates (with smaller
subset sizes). If you look carefully, you will see this is exactly what was
going on when the line above was printed: the candidates shown at the end
are of length 6 with a subset size of 4.


REQUIRED PART OF CHARSET

For advanced usage, there's another option "--subsets-required=N", where N
is the number of characters at the start of the charset (counting from the
left) of which at least one is required in every candidate. For example:

--subsets=0123456789abcdef --min-len=4 --max-len=4

This will produce all 65,536 candidates possible at that length using hex
digits. Now let's say you exhausted that one and want to try uppercase hex
as well.
Here's the clever way:

--subsets=ABCDEF0123456789 --subsets-required=6 --min-len=4 --max-len=4

This means that the full charset is uppercase hex, but at least one of the
first 6 characters (i.e. one of ABCDEF) is required in every candidate. So
we will not produce a single duplicate of the candidates from the lowercase
session, namely the 10,000 that contain only decimal digits. The total
output this time is therefore only 16^4 - 10^4 = 55,536 candidates. After
these two sessions you have exhausted the whole length-4 keyspace of
lowercase OR uppercase hexadecimal. A naive way of doing this would be a
single session using:

--subsets=0123456789abcdefABCDEF --min-len=4 --max-len=4

- or -

--mask=[0123456789abcdefABCDEF] --min-len=4 --max-len=4

Both of these, however, would also produce many mixed-case candidates like
"dA56", which is not what we wanted: they would produce 22^4 = 234,256
candidates instead of just 65,536 + 55,536 = 121,072.

Note that the above was just an illustrative example of this option. A more
realistic use could be a full lowercase alphanumeric charset where at least
one digit is required, e.g.:

--subsets=0123456789abcdefghijklmnopqrstuvwxyz --subsets-required=10


SUBSET SIZES

The options --subsets-min-diff=N and --subsets-max-diff=N (or the
corresponding settings in john.conf) let you put a limit on complexity,
i.e. on the number of distinct characters per candidate. Normally this is
not needed, unless you want a session that actually runs to completion
before you die of old age. For the --subsets-max-diff=N option, a negative
N is taken relative to the max. length: for example, -2 means
"max. length - 2".
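The candidate counts used in this document can be checked with a little
combinatorics. The following is a minimal Python sketch, not taken from
John's source. It assumes that every candidate is generated under exactly one
subset (the set of distinct characters it contains), which is what keeps the
mode duplicate-free. The function names surjections(), keyspace() and
with_required() are hypothetical. keyspace() counts candidates of a given
length and subset size, and with_required() reproduces the hex arithmetic
for --subsets-required.

```python
from math import comb


def surjections(length: int, k: int) -> int:
    """Number of strings of `length` over a fixed set of k characters that
    use every one of those k characters (inclusion-exclusion)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** length for j in range(k + 1))


def keyspace(n: int, length: int, k: int) -> int:
    """Candidates of `length` over an n-character charset whose set of
    distinct characters has exactly k members (subset size k).
    Assumes each candidate is produced under exactly one subset."""
    return comb(n, k) * surjections(length, k)


def with_required(n: int, length: int, required: int) -> int:
    """Candidates of `length` over an n-character charset containing at
    least one of the first `required` characters: all strings minus those
    built only from the remaining n - required characters."""
    return n ** length - (n - required) ** length


if __name__ == "__main__":
    # Hex figures: lowercase session, the --subsets-required=6 uppercase
    # session, and the naive 22-character session.
    print(16 ** 4, with_required(16, 4, 6), 22 ** 4)  # 65536 55536 234256

    # Length 8 over printable ASCII (95 chars), split by subset size.
    # A max-diff limit simply drops the rows above the limit.
    for k in range(1, 9):
        print(f"subset size {k}: {keyspace(95, 8, k):,}")
```

In the length-8 table, subset sizes 7 and 8 together account for most of
the keyspace, so under these assumptions even a modest --subsets-max-diff
cap removes the bulk of the work at that length.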