About Hashcat mask processing
The task
Crack this SHA1 hash: 4e174bbc3e0a536aa8899d1f459318f797dc325a
We have a machine with two NVIDIA GeForce RTX 4090 cards, so hash performance is GREAT!
$ hashcat -m 100 -b
Speed.#1.........: 47426.7 MH/s (44.94ms) @ Accel:32 Loops:1024 Thr:512 Vec:1
Speed.#2.........: 49805.4 MH/s (42.75ms) @ Accel:32 Loops:1024 Thr:512 Vec:1
Speed.#*.........: 97232.1 MH/s
So the second card has ~50000 MH/s speed. Let's work with that.
If we know the length of the input string and the character classes, we can set a mask:
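# -m 100                     hash type: SHA1
# -a 3                       attack mode: brute-force (mask attack)
# ?a?a?a?a?a?a?a?a?a?a?a     the mask: 11 positions, each any printable ASCII character
# --optimized-kernel-enable  use the optimized kernels (long form of -O)
# -w 3                       workload profile: high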
$ hashcat -m 100 hash.txt -a 3 ?a?a?a?a?a?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 48521.0 MH/s (21.73ms) @ Accel:512 Loops:512 Thr:32 Vec:1
It's cracking at full speed (~50000 MH/s). What if we knew the first 4 characters?
$ hashcat -m 100 hash.txt -a 3 ABCD?a?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 509.4 MH/s (0.22ms) @ Accel:512 Loops:1 Thr:32 Vec:1
Oh hell no! Why is it just 500 MH/s, 100x slower? All we did was help it! The docs give a hint:
In Hashcat, we accomplish this by splitting attacks up into two loops: a “base loop”, and a “mod(ifier) loop.” The base loop is executed on the host computer and contains the initial password candidates (the “base words.”) The mod loop is executed on the GPU, and generates the final password candidates from the base words on the GPU directly. The mod loop is our amplifier – this is the source of our GPU acceleration.
What happens in the mod loop depends on the attack mode. For brute force, a portion of the mask is calculated in the base loop, while the remaining portion of the mask is calculated in the mod loop.
Ok, so our loops are not generating enough input for the GPU to process. But why would fixing characters slow generation down? Shouldn't it just generate more candidates from the other ?a positions?
What if we set only 2 characters?
$ hashcat -m 100 hash.txt -a 3 AB?a?a?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 49389.4 MH/s (41.93ms) @ Accel:32 Loops:1024 Thr:512 Vec:1
Ok, it's back to full speed. What about 3 characters?
$ hashcat -m 100 hash.txt -a 3 ABC?a?a?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 25247.6 MH/s (3.98ms) @ Accel:64 Loops:95 Thr:256 Vec:1
Half speed (25000 MH/s). Hmmm, ok, what if we set even more characters, for example 6? And add a few more ?a positions so that it doesn't finish in 1s.
$ hashcat -m 100 hash.txt -a 3 ABCDEF?a?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 508.5 MH/s (0.22ms) @ Accel:512 Loops:1 Thr:32 Vec:1
Ok, so it doesn't go lower than that.
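To see why the extra ?a positions matter for the run length, here is a rough sketch of the arithmetic. It assumes only hashcat's built-in charsets and literal characters, and a constant speed; the helper names `candidate_count` and `seconds_to_exhaust` are made up for this illustration and are not part of hashcat.

```python
# Sizes of hashcat's built-in charsets: ?l, ?u, ?d, ?s and ?a (= l + u + d + s).
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95}


def candidate_count(mask: str) -> int:
    """Number of candidates described by a mask of literals and built-in charsets."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            # "??" is a literal question mark; anything else must be a known charset.
            total *= 1 if mask[i + 1] == "?" else CHARSET_SIZES[mask[i + 1]]
            i += 2
        else:
            i += 1  # a literal character contributes exactly one choice
    return total


def seconds_to_exhaust(mask: str, mh_per_s: float) -> float:
    """Worst-case runtime in seconds at a constant speed given in MH/s."""
    return candidate_count(mask) / (mh_per_s * 1_000_000)


if __name__ == "__main__":
    for mask, speed in [
        ("ABCDEF" + "?a" * 7, 508.5),  # the run above
        ("ABCDEF" + "?a" * 4, 508.5),  # hypothetical shorter mask, same speed assumed
    ]:
        print(f"{mask}: {candidate_count(mask):,} candidates, "
              f"{seconds_to_exhaust(mask, speed):,.1f} s to exhaust")
```

Under these assumptions the seven free positions give 95^7 (about 7.0 * 10^13) candidates, roughly 38 hours at 508.5 MH/s, while a mask with only four free positions would be exhausted in well under a second, too fast to read a stable speed.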
Now what if we set 4 characters, but not the first 4?
$ hashcat -m 100 hash.txt -a 3 ?a?a?a?a?a?a?aHIJK --optimized-kernel-enable -w 3
Speed.#2.........: 49815.8 MH/s (42.71ms) @ Accel:32 Loops:1024 Thr:512 Vec:1
What on Earth?! It's back to full speed. So if we set the first 4 characters, it's 100x slower than if we set the last 4 characters? Why does it matter which characters we set?
This can only mean one thing: candidate generation from the mask is parallelized solely over the first 2-3 characters. By fixing those, we eliminate ~99% of the parallel generators, and the remaining 1% cannot produce candidates as fast as the GPU can compute SHA1.
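A toy model makes this hypothesis concrete. It is only a sketch, not hashcat's actual code: it assumes the leading `mod_len` positions of the mask form the mod loop and the rest form the base loop, and it uses a two-character alphabet to keep the output small. The names `amplifier`, `generate` and `mod_len` are hypothetical.

```python
from itertools import product
from math import prod


def amplifier(positions: list[str], mod_len: int) -> int:
    """Candidates the mod loop derives from a single base word."""
    return prod(len(chars) for chars in positions[:mod_len])


def generate(positions: list[str], mod_len: int):
    """Yield (base_word, candidates) pairs for a mask given as per-position character sets."""
    mod_part, base_part = positions[:mod_len], positions[mod_len:]
    for base in product(*base_part):  # base loop: runs on the host
        suffix = "".join(base)
        # mod loop: runs on the GPU and expands one base word into many candidates
        yield suffix, ["".join(prefix) + suffix for prefix in product(*mod_part)]


if __name__ == "__main__":
    free = "01"  # stand-in for ?a, shrunk to two characters
    masks = {
        "all free":     [free, free, free, free],
        "fixed prefix": ["A", "B", free, free],
        "fixed suffix": [free, free, "A", "B"],
    }
    for name, positions in masks.items():
        batches = list(generate(positions, mod_len=2))
        print(f"{name}: {len(batches)} base words x "
              f"{amplifier(positions, 2)} candidates each")
```

In this model a fixed prefix leaves the host producing four base words that the GPU cannot multiply (one candidate each), while a fixed suffix leaves a single base word that still fans out into four candidates on the GPU.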
We can verify this by fixing characters 2-5 and letting the first one be only a digit (10 candidates). This should allow 10x more parallel generators than in the slowest case, so the speed should be ~500 MH/s * 10 = 5000 MH/s.
$ hashcat -m 100 hash.txt -a 3 ABCDE?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 526.4 MH/s (0.22ms) @ Accel:512 Loops:1 Thr:32 Vec:1
$ hashcat -m 100 hash.txt -a 3 ?dBCDE?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 4802.7 MH/s (0.45ms) @ Accel:256 Loops:10 Thr:64 Vec:1
$ hashcat -m 100 hash.txt -a 3 ?d?dCDE?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 25782.5 MH/s (4.19ms) @ Accel:64 Loops:100 Thr:256 Vec:1
$ hashcat -m 100 hash.txt -a 3 ABCD?d?a?a?a?a?a?a --optimized-kernel-enable -w 3
Speed.#2.........: 509.8 MH/s (0.22ms) @ Accel:128 Loops:1 Thr:128 Vec:1
Whoa, it worked! It's actually 10x more! And the other variants show that although ?dBCDE... and ABCD?d... cover the same number of candidates, hashcat can only generate them in parallel if the first 2-3 characters have multiple candidates.
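The Loops value in each status line tells the same story. As a sanity check, the sketch below compares the reported Loops with a simple guess: the product of the charset sizes of the first four mask positions, capped at 1024. Both constants are assumptions fitted to the ten runs in this post, not values taken from hashcat's source, and the function names are made up for this illustration.

```python
from math import prod

CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95}
MOD_POSITIONS = 4  # assumption fitted to the outputs above
LOOPS_CAP = 1024   # assumption: the largest Loops value seen in the outputs above


def position_sizes(mask: str) -> list[int]:
    """Charset size per mask position (built-in charsets and literals only)."""
    sizes = []
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask) and mask[i + 1] in CHARSET_SIZES:
            sizes.append(CHARSET_SIZES[mask[i + 1]])
            i += 2
        else:
            sizes.append(1)  # literal character
            i += 1
    return sizes


def max_loops(mask: str) -> int:
    """Guessed upper bound for Loops: leading positions multiplied, then capped."""
    return min(LOOPS_CAP, prod(position_sizes(mask)[:MOD_POSITIONS]))


# (mask, Loops reported by hashcat, Speed.#2 in MH/s), copied from the runs above
RUNS = [
    ("?a" * 11, 512, 48521.0),
    ("ABCD" + "?a" * 7, 1, 509.4),
    ("AB" + "?a" * 8, 1024, 49389.4),
    ("ABC" + "?a" * 8, 95, 25247.6),
    ("ABCDEF" + "?a" * 7, 1, 508.5),
    ("?a" * 7 + "HIJK", 1024, 49815.8),
    ("ABCDE" + "?a" * 6, 1, 526.4),
    ("?dBCDE" + "?a" * 6, 10, 4802.7),
    ("?d?dCDE" + "?a" * 6, 100, 25782.5),
    ("ABCD?d" + "?a" * 6, 1, 509.8),
]

if __name__ == "__main__":
    for mask, loops, speed in RUNS:
        print(f"{mask:24} guessed<={max_loops(mask):5} reported={loops:5} {speed:9.1f} MH/s")
```

Every reported Loops value stays within the guessed bound, and it matches the guess exactly whenever the guess is below the cap (1, 10, 95, 100). The speed is not proportional to Loops, though: 10 gives about 10x the slowest speed, but around 100 already reaches half of full speed.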
So it's better to know what a password ends with than what it starts with. Remember that, until a better implementation comes along.