OK, so I've started writing my own word-list merge and duplicate remover.
I've already done 2 word-lists which are nearly 2 GB each. It obviously trades local HD space for memory, but it chunks the data in memory before flushing it to files on the local HD, which improves performance massively. You will need to make sure you have enough local HD space = SUM(all word-lists).
Processing files are sorted in: [BaseDirectory]\tmp
The finished file is not fully sorted, but is split into 256 chunks, 00 to ff, based on the HEX value of the plain-text.
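Roughly, the idea looks something like this Python sketch (simplified and illustrative only; the real tool isn't Python, and the 64 MB flush threshold and temp file names here are just placeholder choices):

import os

def merge_dedup(inputs, out_path, tmp_dir="tmp", flush_bytes=64 * 1024 * 1024):
    # Split lines into 256 on-disk buckets (00-ff) by the first byte, buffering
    # in memory and flushing to tmp files, so no single set has to hold everything.
    os.makedirs(tmp_dir, exist_ok=True)
    buffers = {i: [] for i in range(256)}   # in-memory chunk per bucket
    buffered = 0

    def flush():
        nonlocal buffered
        for i, lines in buffers.items():
            if lines:
                with open(os.path.join(tmp_dir, f"{i:02x}.tmp"), "ab") as f:
                    f.writelines(lines)
                buffers[i] = []
        buffered = 0

    for path in inputs:
        with open(path, "rb") as f:
            for line in f:
                word = line.rstrip(b"\r\n")
                if not word:
                    continue
                buffers[word[0]].append(word + b"\n")
                buffered += len(word) + 1
                if buffered >= flush_bytes:
                    flush()              # spill the in-memory chunk to disk
    flush()

    removed = 0
    with open(out_path, "wb") as out:
        for i in range(256):             # output is grouped 00..ff, not fully sorted
            bucket = os.path.join(tmp_dir, f"{i:02x}.tmp")
            if not os.path.exists(bucket):
                continue
            seen = set()                 # dedup one bucket at a time
            with open(bucket, "rb") as f:
                for line in f:
                    if line in seen:
                        removed += 1
                    else:
                        seen.add(line)
                        out.write(line)
            os.remove(bucket)
    return removed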
Example:
passwords.txt
Original Size: 184 MB
Converted Size: 169 MB
Duplicates Removed: 28
Run-time: 129 seconds
Obviously we can merge multiple files, but I'm just processing 1 file for the above example. It takes a while, but I'm working on improving it. Anyone interested in trying it, please see below:
https://hashkiller.co.uk/downloads/App.Merge.zip [21.4 KB]
Command format:
App.Merge.exe o="output-file.txt" t=4 [options] ... "word-list1.txt" "word-list2.lst" "directory1" ...
For a report analysis of a word-list:
App.Merge.exe r="word-list1.txt"
Double-quotes are required for paths / file names which contain spaces. You can also specify directory paths if you wish to merge / sort whole directories (a rough sketch of how directory arguments get expanded follows the option list).
o=[out-file] - Output file.
t=[threads] - Used only to speed up sorting.
c=[mem] - Used to control how much RAM to use, in MB. Default is 1024. Capped at 3072.
min=[num] - Minimum word length. Default is 1.
max=[num] - Maximum word length. Default is 4096.
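A simplified sketch of how file / directory arguments could be expanded into a flat list of word-lists (whether the real tool filters by extension is not shown, and directories are only expanded one level deep here):

import os

def expand_inputs(args):
    # Turn a mix of file and directory arguments into a flat list of files.
    files = []
    for arg in args:
        if os.path.isdir(arg):
            for name in sorted(os.listdir(arg)):
                path = os.path.join(arg, name)
                if os.path.isfile(path):
                    files.append(path)
        else:
            files.append(arg)
    return files

# e.g. expand_inputs(["word-list1.txt", "directory1"])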
Words containing control characters will be converted into the Hashcat HEX format: $HEX[...]
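For reference, $HEX[...] is just the raw bytes written as hex. A quick sketch (this one treats anything outside printable ASCII as needing the conversion, which may be broader than the tool's actual "control characters" rule):

def to_hashcat_line(word: bytes) -> str:
    # Hex-encode the whole word if any byte falls outside printable ASCII.
    if any(b < 0x20 or b > 0x7e for b in word):
        return "$HEX[" + word.hex() + "]"
    return word.decode("ascii")

# to_hashcat_line(b"pass\x01word") -> "$HEX[7061737301776f7264]"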
Is there any chance you could add a -r option to your app.merge.exe and app.regex.exe programs that recursively goes through a directory (through folders and subfolders)?