HUGE word-lists duplicate remover and merge tool


253 Results - Page 6 of 9 -
1 2 3 4 5 6 7 8 9
Author Message
Avatar
Spike188

Status: Trusted
Joined: Mon, 07 Jul 2014
Posts: 613
Team: Biang-Kerox
Reputation: 649 Reputation
Offline
Mon, 02 Mar 2015 @ 11:44:56

I tried it too, without any error.

input dir: 37.3 GB (40,149,967,444 bytes)
output file: 17.4 GB (18,698,038,923 bytes)

Merged: f7a.txt
Merge complete to: biglist.txt
Total words : 3771095952
Words skipped: 18
Duplicates removed: 2174306313
$HEX[...] conversions: 280
Total time: 2 hrs 40 mins 22.442 secs


My private Bcoin : 1DzoZ2ksiF8RdjDmbWvDpuxtdDAf8WEbUi
If I find hashes in the paid section, please donate to the forum; do not send it to my private BTC address.

Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 02 Mar 2015 @ 13:38:22

Good idea for the $HEX[...] conversion.
I didn't try the App.RegEx app here yet.

I'm going to test a folder of all my word lists (60.5 GB), all mixed: with spaces, no spaces,
small files to big files (1 KB-14.6 GB). I'll report back with the results.

Not bad Spike188, it's a great speed/time for a big folder.


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 02 Mar 2015 @ 13:48:19

Aehash said:

I do know how to stop it, but when I stop it, it never finishes the cleaning/merging.
The tmp folder has not changed size since the 330.txt file came up for processing, more than 2 hours now; nothing grows, but RAM is constantly at 60-100%, so probably something is working?
Is it possible that something in that 303.txt file is messing with the processing, maybe a strange unicode sign? It's hard to check with billions of lines in a file.
I will try to isolate that file and re-run the merging for the other files in the /tmp folder.

You should try it again, it's been fixed, working great so far :-)
(without the --export-dups)


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 02 Mar 2015 @ 15:55:45

Tested App.Merge.exe v0.45beta3
input folder (60.5 GB)
output file (29 GB)
t=8, c=3072, --remove-spaces, min=8 max=27


Merged: ffe.txt
Merged: fff.txt
Merge complete to: E:\WordLists\test-output\Sorted_Result-Of_60.5GB_Wordlist_WPA.dict
Total words : 5225898630
Words skipped: 873511866
Duplicates removed: 1825210454
$HEX[...] conversions: 271846
Total time: 2 hrs 0 mins 52.600 secs

Can you imagine if all 8 threads (on everything: merging, removing spaces, removing dups, etc.)
were working at 75% minimum? Super, super fast :-)


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 3214
Team: HashKiller
Reputation: 4165 Reputation
Offline
Mon, 02 Mar 2015 @ 16:08:27

The threads are used on everything apart from the initial processing, slicing and the final merge. This is due to disk IO, but it doesn't really matter too much.

That was fast lol


Please read the forum rules | Please read the paid section rules
I accept private hash lists, with forum donations only.
BTC: 1JZGVq58m4RS1QQS8JE5xndzDFy2BvGU6y
GPU Power: 9x GTX 1070 + 6x GTX 1080

Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 02 Mar 2015 @ 16:16:30

blandyuk said:

The threads are used on everything apart from the initial processing, slicing and the final merge. This is due to disk IO, but it doesn't really matter too much.

That was fast lol


Fast for sure :-)
I'm splitting it with the unix split utility at 20,000,000 lines per file,
so I'll be able to view the output file in PilotEdit.
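For reference, splitting a huge output into fixed-size line chunks (what a `split -l 20000000` call does) can be sketched in Python; the chunk prefix and sizes here are only illustrative:

```python
def split_lines(path, lines_per_file=20_000_000, prefix="chunk_"):
    """Split a large text file into fixed-size line chunks,
    like `split -l 20000000` from the unix utilities."""
    out, n, idx = None, 0, 0
    with open(path, "rb") as src:
        for line in src:
            if out is None or n >= lines_per_file:
                if out:
                    out.close()
                out = open(f"{prefix}{idx:04d}", "wb")
                idx, n = idx + 1, 0
            out.write(line)
            n += 1
    if out:
        out.close()
    return idx  # number of chunks written
```

Reading in binary mode avoids choking on the odd non-UTF-8 line, which huge wordlists always contain.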


Avatar
Waffle

Status: Elite
Joined: Wed, 02 Jan 2013
Posts: 284
Team: CynoSure Prime
Reputation: 357 Reputation
Offline
Mon, 02 Mar 2015 @ 16:31:29

blandyuk said:

The hashcat $HEX[...] format is not new

Not to be pedantic, but it's not the "hashcat $HEX[...] format"

http://hashcat.net/forum/thread-2483.html

I invented it, implemented it in mdxfind, and had been using it for more than a year before hashcat finally started using it :-)
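For anyone unfamiliar with it: the convention wraps any line whose bytes aren't cleanly printable in a $HEX[...] envelope so it survives text processing. A minimal round-trip sketch of the idea (my own illustration, not mdxfind's or hashcat's actual code; the exact wrap condition differs between tools):

```python
import binascii

def encode(word: bytes) -> str:
    # Wrap if any byte falls outside printable ASCII.
    if any(b < 0x20 or b > 0x7E for b in word):
        return "$HEX[%s]" % binascii.hexlify(word).decode()
    return word.decode()

def decode(line: str) -> bytes:
    if line.startswith("$HEX[") and line.endswith("]"):
        return binascii.unhexlify(line[5:-1])
    return line.encode()
```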


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 3214
Team: HashKiller
Reputation: 4165 Reputation
Offline
Mon, 02 Mar 2015 @ 16:36:38

You're very correct, Waffle. This is your format, which was adopted in hashcat. Good stuff.



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Tue, 03 Mar 2015 @ 02:27:46

Can you fix the --export-dups?

thanks :-)


Avatar
hm

Status: n/a
Joined: Sun, 05 May 2013
Posts: 42
Team:
Reputation: 28 Reputation
Offline
Sun, 08 Mar 2015 @ 09:17:06

hi,

not sure... but I think I could use your tool to merge .pot files, too?


Avatar
ChuckUF4rley

Status: n/a
Joined: Thu, 12 Mar 2015
Posts: 1
Team:
Reputation: 0 Reputation
Offline
Fri, 13 Mar 2015 @ 08:16:36

Merge complete to: GWL.txt
Total words : 2456464587
Words skipped: 41846606
Duplicates removed: 700101309
$HEX[...] conversions: 4583846
Total time: 4 hrs 21 mins 18.161 secs

It consolidated 62.3 GB in 232 Wordlists into one file of 27.4 GB.


Avatar
Lust

Status: Dumper
Joined: Thu, 11 Sep 2014
Posts: 72
Team:
Reputation: 110 Reputation
Offline
Mon, 16 Mar 2015 @ 11:11:44

Is there any way we can dedupe multiple wordlists without having to merge them all?


2 x R9 290

Avatar
Szul

Status: Elite
Joined: Sat, 15 Sep 2012
Posts: 1567
Team:
Reputation: 1452 Reputation
Offline
Mon, 16 Mar 2015 @ 17:26:28

Please look at ULM. There is a tool (cross reference) that allows you to remove duplicate phrases from 2 wordlists.


Jabber: Szul@jabb.im
4 x GTX 1080
TEAM HASHCAT

Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Thu, 19 Mar 2015 @ 01:17:56

App.Merge t=8 c=3072
Folder to merge (1,012 wordlist files, 88.5 GB)

Merge complete to: WordListsAll-Sorted.dict (single file, 47.1 GB)
Total words : 7948909757
Words skipped: 878
Duplicates removed: 3843220295
$HEX[...] conversions: 18609231
Total time: 3 hrs 36 mins 23.622 secs :-)

I was wondering about the words skipped when only sorting?
Are they words with special characters?


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Fri, 20 Mar 2015 @ 01:52:48

blandyuk, is file splitting still coming soon in App.Merge?


Thanks


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 3214
Team: HashKiller
Reputation: 4165 Reputation
Offline
Fri, 20 Mar 2015 @ 08:11:30

Skipped words are based on length. See the original first post for the default min/max lengths. These are dealt with on the initial load.
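That initial-load filter amounts to something like this sketch (the defaults here are placeholders, not App.Merge's actual values):

```python
def load_words(lines, min_len=1, max_len=27):
    """Length filter applied on initial load: words outside
    [min_len, max_len] count as skipped and never reach the merge."""
    kept, skipped = [], 0
    for line in lines:
        word = line.rstrip(b"\r\n")
        if min_len <= len(word) <= max_len:
            kept.append(word)
        else:
            skipped += 1
    return kept, skipped
```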



Avatar
blazer

Status: Trusted
Joined: Tue, 23 Aug 2011
Posts: 59
Team:
Reputation: 130 Reputation
Offline
Thu, 16 Apr 2015 @ 14:42:52

Hey blandy,

I was working on sort64lm, so I decided to test it against App.Merge.
I noticed something odd; could be me, or due to the design.
Anyhow, here goes.
I sorted some items; here is the output order.

App.Merge

!
!!
!"
!#
!.
!1
!?
!@
!e
!k

Sort64lm

!
! !
! !
! donin
! ! ! ! !
! ! aabia ! !
! !!
! &-k)
! '
! ' `

I'm showing the first 10 lines of the output from the same large input file.

App.Merge doesn't seem to always sort in ASCII order: since space is 0x20 and ! is 0x21, the space-prefixed lines should come first; instead it appears to have grouped items by length. However, items greater than length 8 appear to be sorted in ASCII order. This is a small snippet of the first 10 lines after length 8:

! donin
! ! ! ! !
! ! aabia ! !
! *%- ,(
! +* -a*(%
! +* -a*(e
! ++(ab(g%a*(e
! ^ _ ? f50jhx
! b +l.- ,(
! keeper

The second odd thing I've noticed is that if the list is heavily skewed towards a certain character, App.Merge will not de-duplicate it correctly. I believe this is due to the slice process: dupes are spread across the different slices, and as a result the de-duplication does not detect them properly. This was verified by post-sorting a skewed list after it was processed with App.Merge.

Original Lines: 126971372
After App.Merge Lines: 126971372
After Sort64LM: Lines: 63485686

The file was purposely created, by merging a skewed list with itself, so that all the duplicate pairs were far apart from each other.
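For what it's worth, the usual fix for that failure mode is to assign each word to a slice by a hash of the word itself rather than by its position in the input: identical words then always land in the same slice, so per-slice dedup catches every pair. A sketch of the idea (illustrative only, not App.Merge's code):

```python
from hashlib import blake2b

def slice_index(word: bytes, num_slices: int) -> int:
    # Any stable hash works; identical words always map
    # to the same slice, wherever they sit in the input.
    h = int.from_bytes(blake2b(word, digest_size=8).digest(), "big")
    return h % num_slices

def dedup_sliced(words, num_slices=4):
    slices = [set() for _ in range(num_slices)]
    for w in words:
        slices[slice_index(w, num_slices)].add(w)
    return sorted(w for s in slices for w in s)
```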



Avatar
Hash-IT

Status: Trusted
Joined: Tue, 02 Aug 2011
Posts: 4598
Team: HashKiller
Reputation: 2982 Reputation
Offline
Thu, 16 Apr 2015 @ 14:47:30

Thanks for testing and reporting blazer


Please read the forum rules. | Please read the paid section rules.

BTC: 1MmWESN5bKZ1YSuHrm5uNwnQYxWyQnEQ6E

Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 3214
Team: HashKiller
Reputation: 4165 Reputation
Offline
Thu, 16 Apr 2015 @ 15:53:27

Yes, it's to do with the slicing process. It still works great, but there are more updates coming.

Cheers for testing.



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Fri, 17 Apr 2015 @ 01:32:18

Edited... is the split option coming soon? :-)



Avatar
techno007

Status: n/a
Joined: Sat, 09 May 2015
Posts: 7
Team:
Reputation: 0 Reputation
Offline
Sun, 10 May 2015 @ 08:13:07

How much RAM will it take to remove duplicates from a 5 GB file?


Avatar
30k

Status: n/a
Joined: Tue, 12 Aug 2014
Posts: 18
Team:
Reputation: 10 Reputation
Offline
Tue, 23 Jun 2015 @ 16:02:35

You can give it a max mem parameter and it will not go above that (it dumps and reuses, I believe).

Quick test:
68 GB + 8 GB => 48 GB
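That dump-and-reuse behaviour is classic external deduplication: buffer unique words up to a cap, spill each full buffer to disk as a sorted run, then k-way merge the runs while dropping adjacent duplicates. A sketch of the scheme (illustrative, not App.Merge's implementation):

```python
import heapq
import tempfile

def dedup_bounded(lines, max_items=1_000_000):
    """De-duplicate with bounded memory: at most max_items
    unique words are held in RAM at any time."""
    runs, buf = [], set()

    def dump():
        f = tempfile.TemporaryFile()
        f.writelines(w + b"\n" for w in sorted(buf))
        f.seek(0)
        runs.append(f)
        buf.clear()

    for line in lines:
        buf.add(line.rstrip(b"\r\n"))
        if len(buf) >= max_items:
            dump()
    if buf:
        dump()

    # k-way merge of sorted runs; equal words arrive adjacent.
    streams = [(line.rstrip(b"\n") for line in f) for f in runs]
    out, prev = [], None
    for w in heapq.merge(*streams):
        if w != prev:
            out.append(w)
            prev = w
    return out
```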




________________________________________
BTC: 1HUMD5LkAgfZh5PfWwZPeJ1Z4ERuX5ogfh

Avatar
huybk

Status: n/a
Joined: Sun, 09 Aug 2015
Posts: 1
Team:
Reputation: 0 Reputation
Offline
Wed, 26 Aug 2015 @ 15:48:08

Need a wordlist of 32-char HEX hashes.


Avatar
chedderslam

Status: n/a
Joined: Sun, 09 Aug 2015
Posts: 6
Team:
Reputation: 0 Reputation
Offline
Sun, 15 Nov 2015 @ 18:33:23

Hey, thanks very much for this tool. I use it a ton and it is very handy.

I am having a problem with it getting stuck on a word list for some reason. The list is just under 3 GB, and it is the last of three in a folder. It always gets stuck at 98.7%. I reduced the limit parameter on the tool that is generating the word list to make it a bit smaller; now it consistently gets stuck at 37.79%. This is the first time I have had this problem. I have let it sit overnight.

Any ideas?

Here is the command line I am using:
merge.exe o="4-4.txt" t=4 c=3072 "new"

Thanks for any help.


Avatar
mamleader

Status: n/a
Joined: Mon, 09 Nov 2015
Posts: 97
Team:
Reputation: 103 Reputation
Offline
Sun, 22 Nov 2015 @ 12:15:14

Lust said:

Is there any way we can dedupe multiple wordlists without having to merge them all?

Same question, is there a way?

I have 20 files, each 30 GB. I want to remove the duplicated words across them all in one go, without merging them. Thanks.


BTC:1BLrGJed6zxokXn5nqkQzetY82Ryy3WdvV

Avatar
h0wler

Status: Elite
Joined: Tue, 08 Nov 2011
Posts: 309
Team:
Reputation: 260 Reputation
Offline
Sun, 22 Nov 2015 @ 19:08:56


take a look at http://hashcat.net/wiki/doku.php?id=hashcat_utils#rli


mamleader said:

Lust said:

Is there any way we can dedupe multiple wordlists without having to merge them all?

Same question, is there a way?

I have 20 files, each 30 GB. I want to remove the duplicated words across them all in one go, without merging them. Thanks.
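In essence, rli loads the removal lists into memory, then streams the input file and drops every line it has already seen, so nothing has to be merged into one giant output. A Python sketch of that idea (illustrative; it assumes the removal lists fit in RAM, and the file names are hypothetical):

```python
def cross_reference(keep_path, remove_paths, out_path):
    """Write every line of keep_path that does not appear
    in any of remove_paths (rli-style cross reference)."""
    seen = set()
    for p in remove_paths:
        with open(p, "rb") as f:
            seen.update(line.rstrip(b"\r\n") for line in f)
    with open(keep_path, "rb") as src, open(out_path, "wb") as dst:
        for line in src:
            if line.rstrip(b"\r\n") not in seen:
                dst.write(line)
```

To de-duplicate 20 lists in place, you would run it repeatedly: keep the first list as-is, strip list 1 from list 2, strip lists 1-2 from list 3, and so on; each output stays a separate file.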


BTC: 16c3rG8EwyNXHDKtCWPtediC3NrVUQhu7M

Avatar
mamleader

Status: n/a
Joined: Mon, 09 Nov 2015
Posts: 97
Team:
Reputation: 103 Reputation
Offline
Sun, 22 Nov 2015 @ 20:04:09

Thanks h0wler, but this is not exactly what I want, because it would again lead to merging the files, and that's a problem with huge word-lists. I want to point it at a set of files as input and have it search & delete the duplicates (or extract them to one file), so they can be cleaned easily later.



Avatar
12monkeys

Status: n/a
Joined: Thu, 05 Nov 2015
Posts: 72
Team:
Reputation: 10 Reputation
Offline
Mon, 30 Nov 2015 @ 21:54:36

Hi,
I tried to merge at once the 144 dictionaries I have found so far, altogether more than 300 GB. After 24 hrs and a couple of errors about memory, I had to terminate the whole process while the app was slicing file 20 in 1000... My mouse was lagging very much. oclHashcat was showing speed 0 and I could barely switch between any windows...
Today I started one more time.
file1 = weakpass, 39 GB
file2 = g0tmi1k-wordlist, 38 GB
Total words: 8638384690
Words skipped: 4792233362
Duplicates removed: 601443509
Time: 3 hrs 6 min
output = 39+ GB
crazy...
Now I'm going to add just 1 big file at a time.
Oh, I merged with min=8 and max=20, which is OK, right?


write something positive, mad.

Avatar
12monkeys

Status: n/a
Joined: Thu, 05 Nov 2015
Posts: 72
Team:
Reputation: 10 Reputation
Offline
Tue, 01 Dec 2015 @ 01:35:21

Something is wrong.
I merged the output file (39.3 GB) from the previous post with the next big dictionary, acdc (28 GB), and here are the results:
output_new = 39.172 GB
total words: 7346433955
words skipped: 4104109329
duplicates removed: 917403
total time: 2 hrs 44 mins
So it seems like acdc wasn't added at all, but the new output file is not even smaller than output_old. Why?



Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 3214
Team: HashKiller
Reputation: 4165 Reputation
Offline
Thu, 03 Dec 2015 @ 17:23:40

Not sure what you have done there. I have released a new version which uses a super fast hash algo for even distribution when processing.

Get v0.46 below:

https://hashkiller.co.uk/downloads/App.Merge.zip



