Home - General Discussion - HUGE word-lists duplicate remover and merge tool


215 Results - Page 4 of 8 -
1 2 3 4 5 6 7 8
Author Message
Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 23 Feb 2015 @ 16:17:43


Merge complete to: E:\WordLists\cFolderMerge.dict
Total words : 1185286617
Words skipped: 0
Duplicates removed: 121051652
Total time: 2 hrs 17 mins 37.498 secs

I just merged the 2 files a8-64.dict and b8-64.dict and removed the duplicates; the final cFolderMerge.dict is 13.8GB.

I had already processed each of the 2 files separately beforehand (duplicate removal with min=8 max=64 t=8), which gave me a8-64.dict (3.49GB)
and b8-64.dict (11.7GB).

CPU: Intel i7-3770K @ 3.7 GHz, 4 cores (8 threads)

So should I use t=4
or t=8?

I figure I should use the thread count (8) rather than the core count (4).

Is there a program that will remove the spaces between words in a dictionary.dict?
e.g.:
Last Call At The Bar
to
LastCallAtTheBar

Thanks for this utility, it's very much appreciated.
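A minimal sketch of what such a space-removal pass could look like, written in Python purely for illustration (it is not App.Merge's code; the script name and file names are hypothetical). It streams the list line by line, so a multi-GB dictionary never has to fit in RAM:

```python
# Illustrative sketch only (not App.Merge's code): remove every ' ' from each
# line of a word-list, streaming so large files never need to fit in RAM.
import sys


def remove_spaces(word: str) -> str:
    """Return the word with all space characters removed."""
    return word.replace(" ", "")


def strip_file(src_path: str, dst_path: str) -> None:
    # surrogateescape passes bytes that are not valid UTF-8 through unchanged
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for line in src:
            dst.write(remove_spaces(line.rstrip("\r\n")) + "\n")


if __name__ == "__main__":
    # hypothetical usage: python strip_spaces.py dictionary.dict out.dict
    if len(sys.argv) == 3:
        strip_file(sys.argv[1], sys.argv[2])
    else:
        print(remove_spaces("Last Call At The Bar"))  # LastCallAtTheBar
```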


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Mon, 23 Feb 2015 @ 16:41:14

I can add functionality to remove spaces.

Also, there is another command which regulates memory / RAM usage. Default is 1GB but it can be changed to a max of 4GB.

c=[mem] - Memory / RAM to use in MB. Default is 1024 MB.

Just added removal of spaces:

--remove-spaces - Removes spaces from words in word-lists.

Download latest and try it...

http://home.btconnect.com/md5decrypter/App.Merge.zip
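For context on what a memory cap like c=[mem] usually controls: tools that de-duplicate lists larger than RAM typically use an external merge sort. The sketch below is a generic Python illustration of that technique, assuming one word per line; `max_words` stands in for a memory budget, and none of the names come from App.Merge:

```python
# Generic external merge sort with de-duplication (illustrative; not
# App.Merge's implementation). max_words stands in for a memory budget.
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(words: List[str], tmp_dir: str) -> str:
    """Sort and de-duplicate one in-memory chunk, save it as a run file."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", errors="surrogateescape") as f:
        for word in sorted(set(words)):
            f.write(word + "\n")
    return path


def external_unique_sort(lines: Iterable[str], max_words: int,
                         tmp_dir: str) -> Iterator[str]:
    """Yield the sorted, de-duplicated words from `lines`, keeping at most
    `max_words` words in memory while building the sorted runs."""
    runs: List[str] = []
    chunk: List[str] = []
    for line in lines:
        chunk.append(line.rstrip("\r\n"))
        if len(chunk) >= max_words:
            runs.append(_write_run(chunk, tmp_dir))
            chunk = []
    if chunk:
        runs.append(_write_run(chunk, tmp_dir))

    files = [open(p, encoding="utf-8", errors="surrogateescape") for p in runs]
    try:
        # Strip newlines before merging so the merge compares the same
        # strings that sorted() compared when the runs were written.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        previous = None
        for word in heapq.merge(*streams):
            if word != previous:  # duplicates arrive next to each other
                yield word
                previous = word
    finally:
        for f in files:
            f.close()
        for p in runs:
            os.remove(p)
```

Because every run file is already sorted and unique, the final k-way merge only has to compare each word with the previous one. Memory use is bounded by the chunk size plus one buffered line per run.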


Please read the forum rules | Please read the paid section rules
I accept private hash lists, with forum donations only.
BTC: 15qF9WUeFUD63ishxyAMiEgGqTcYzk4j9b
GPU Power: 7x GeForce GTX 1070 and My Brain

Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 23 Feb 2015 @ 21:02:15


Thanks for the extra function :-)


OK, I get this:

App.Merge.exe o="E:\WordLists\dFolderMerge.dict" t=8 --remove-spaces c=4096 "E:\WordLists\cFolderMerge.dict"
Merge Tool by BlandyUK v0.43

Input files / dirs: 1
Combined filesize: 14883248388
Output file: E:\WordLists\dFolderMerge.dict

I cancelled it; I'll try a directory instead.

App.Merge.exe o="E:\WordLists\dFolderMerge.dict" t=8 --remove-spaces c=4096 "E:\WordLists\a"

Input files / dirs: 1
Combined filesize: 14883248388
Output file: E:\WordLists\dFolderMerge.dict


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 23 Feb 2015 @ 21:58:27

It just sits there and doesn't do anything.
Maybe I'm supposed to add something more to the command?

TIA


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Mon, 23 Feb 2015 @ 22:20:59

I'm actually working on this again, making the sort even faster!

Noticed an issue with mem size so gimme 5 mins...

UPDATE: OK, download it again it's faster now...



Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Mon, 23 Feb 2015 @ 22:35:13

OK, just ran a directory of 90 files totalling 1.92GB:

Code:
Words skipped: 14
Duplicates removed: 79233619
Total time: 0 hrs 4 mins 19.154 secs

Noticed a small bug: it's not counting total words, but that's a minor issue which I'll resolve in the next version. Also noticed it's missing some dups, so fixing that now...

NOTE: Please don't rely on this tool yet as it's still in BETA.



Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Mon, 23 Feb 2015 @ 23:06:53

OK, all sorted, it's good to go. Version is v0.44beta1. Please test...



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 23 Feb 2015 @ 23:18:38

App.Merge.exe o="E:\WordLists\aaFolderMerge.dict" t=8 --remove-spaces "E:\WordLists\aFolderMerge.dict"
Merge Tool by BlandyUK v0.44beta0

Input files / dirs: 1
Combined filesize: 3753510486
Output file: E:\WordLists\aaFolderMerge.dict

With --remove-spaces
it stops there.

Same thing with the memory option.

What software are you using to code App.Merge.exe?

It looks very complicated to program; it would be interesting to learn someday :-)


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Mon, 23 Feb 2015 @ 23:24:52

App.Merge.exe v0.44beta1


App.Merge.exe o="E:\WordLists\aaFolderMerge.dict" t=8 --remove-spaces "E:\WordLists\aFolderMerge.dict"
Merge Tool by BlandyUK v0.44beta1

Input files / dirs: 1
Combined filesize: 3753510486
Output file: E:\WordLists\aaFolderMerge.dict

It still stops there.


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Mon, 23 Feb 2015 @ 23:28:08

Ah, it was a bug with the new --remove-spaces feature. Fixed so download again, v0.44beta2.



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Tue, 24 Feb 2015 @ 00:55:02

App.Merge.exe v0.44beta2

App.Merge.exe o="E:\WordLists\aaFolderMerge.dict" t=8 --remove-spaces "E:\WordLists\aFolderMerge.dict"

aFolderMerge.dict (3,753,510,486 bytes)

Code:
Merged: 7ac.txt
Merged: 7ad.txt
Merged: 7ae.txt
Merged: 7af.txt
Merge complete to: E:\WordLists\aaFolderMerge.dict
Total words  : 261249704
Words skipped: 0
Duplicates removed: 154901
Total time: 0 hrs 4 mins 35.316 secs

I think it removed some duplicates that appeared after the spaces were removed.
aaFolderMerge.dict (3,750,993,036 bytes)

Nice work... Thanks :-)

PS: I didn't use c=4096 for memory because it was getting stuck.
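fonzy35's guess is plausible: stripping spaces can turn lines that were distinct into identical ones, which the de-duplication pass then removes. A toy Python illustration with made-up words (not taken from the lists in this thread):

```python
# Toy example (made-up words): de-duplicate AFTER stripping spaces, because
# stripping can make previously distinct lines identical.
def strip_then_dedup(words):
    seen = set()
    out = []
    for word in words:
        word = word.replace(" ", "")
        if word not in seen:
            seen.add(word)
            out.append(word)
    return out


words = ["Last Call", "LastCall", "Last  Call", "password1"]
unique = strip_then_dedup(words)
print(unique)                    # ['LastCall', 'password1']
print(len(words) - len(unique))  # 2
```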


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Tue, 24 Feb 2015 @ 01:25:49

App.Merge.exe v0.44beta2

I did another test with testspacetext.txt: 215 lines, each a two-word name with a space in between.
Result:


E:\HashcatGUI_044\unix-utils>App.Merge.exe o="E:\WordLists\mergedtestspacetext.txt" t=8 --remove-spaces "E:\WordLists\testspacetext.txt"
Merge Tool by BlandyUK v0.44beta2

code
-------
Input files / dirs: 1
Combined filesize: 3367
Output file: E:\WordLists\mergedtestspacetext.txt
Position: 100.00 % of testspacetext.txt
- Words: 215 ~ Skipped: 0 ~ Mem: 0 MB
File: 412.txt ~ Sort Time: 8.70 m/s ~ Duplicates: 0
File: 416.txt ~ Sort Time: 2.52 m/s ~ Duplicates: 0
Merged: 412.txt
Merged: 416.txt
Merge complete to: E:\WordLists\mergedtestspacetext.txt
Total words : 215
Words skipped: 0
Duplicates removed: 0
Total time: 0 hrs 0 mins 0.65 secs


-------
The input with spaces, testspacetext.txt: 3,367 bytes, 215 lines.
The output without spaces, mergedtestspacetext.txt: 2,913 bytes, 215 lines.

So the file got smaller by the bytes of the spaces App.Merge removed.

Works perfectly. Nice work and thanks!

I'm going to try it on a 13.8GB file with --remove-spaces.
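A quick way to sanity-check that size drop, assuming each removed space is a single byte and nothing else (such as line endings) changes: the expected output size is the input size minus the number of space bytes. Under those assumptions, the test above implies the input held 3,367 - 2,913 = 454 spaces. A small Python sketch of the check:

```python
# Expected size of a space-stripped copy, assuming each removed ' ' is one
# byte and everything else (including line endings) is left untouched.
def expected_size_after_strip(data: bytes) -> int:
    return len(data) - data.count(b" ")


sample = b"Last Call\nAt The Bar\n"
print(len(sample), expected_size_after_strip(sample))  # 21 18

# For a real file (hypothetical path):
# with open("testspacetext.txt", "rb") as f:
#     print(expected_size_after_strip(f.read()))
```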


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Tue, 24 Feb 2015 @ 14:03:26


It works with c=1024, c=2048 and c=3072,
but doesn't work with c=4096.


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Tue, 24 Feb 2015 @ 14:08:15

OK, 3072MB is more than enough anyway, so np there. When reading, it's single-threaded, so there will be a thread limit. When sorting, it's multi-threaded, so it doesn't have this issue.



Avatar
Aehash

Status: n/a
Joined: Tue, 24 Feb 2015
Posts: 5
Team:
Reputation: 0 Reputation
Offline
Tue, 24 Feb 2015 @ 14:14:30

I am merging some of my lists, and after some time the process stops. The last line is always "Slicing: 303.txt" and it just sits there, with memory usage running up to the limit but no progress on word processing, duplicate removal, etc.

I use Windows 7 64-bit, a 4-core CPU with Hyper-Threading and 8 GB RAM, in case that matters for troubleshooting.


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Tue, 24 Feb 2015 @ 14:23:18

OK, thanks blandyuk.


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Tue, 24 Feb 2015 @ 23:10:00

OK, I've ironed out another bug where it was missing some duplicates, but it's all sorted now. It should be slightly faster too, as I've added better file management.

Latest version is v0.45beta0 so download and test if you please...

http://home.btconnect.com/md5decrypter/App.Merge.zip



Avatar
Aehash

Status: n/a
Joined: Tue, 24 Feb 2015
Posts: 5
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 00:30:54

Do you have any tips on how to keep the whole merge/clean process from stalling at "Slicing: 303.txt"?
I have split bigger lists into smaller parts in case it's a memory or CPU problem, but my PC handles everything else just fine; even merging 13 GB and 7 GB lists goes smoothly. But in some runs I can't avoid the stall at "303.txt",
and that file is significantly bigger, about 1.6 GB, compared to the other parts in the /tmp folder, which range from a few KB to 500 MB.

http://imgur.com/zVEesNZ
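The temp file names in the logs in this thread (24.txt, 6d.txt, 73.txt, fc.txt, ff.txt) look like hex byte values, for example 24 is `$` and 6d is `m`. That hints, though nobody here confirms it, that words are bucketed by their leading byte(s). If the guess is right, bucket sizes follow the distribution of first characters in the list and can be very uneven, which would fit one 1.6 GB file sitting next to much smaller ones. A hypothetical Python sketch of such bucketing:

```python
# Hypothetical sketch: bucket words by the hex of their first byte, mimicking
# temp names like 24.txt ('$') or 6d.txt ('m') seen in this thread's logs.
# Whether App.Merge really partitions words this way is an assumption.
from collections import Counter
from typing import Iterable


def bucket_name(word: bytes) -> str:
    """Temp-file name for a word, e.g. b'monkey' -> '6d.txt'."""
    return f"{word[0]:02x}.txt" if word else "empty.txt"


def bucket_sizes(words: Iterable[bytes]) -> Counter:
    """Bytes each bucket file would hold (word plus a newline)."""
    sizes: Counter = Counter()
    for word in words:
        sizes[bucket_name(word)] += len(word) + 1
    return sizes


sample = [b"password", b"princess", b"123456", b"1qaz2wsx", b"monkey", b"$ecret"]
for name, size in bucket_sizes(sample).most_common():
    print(name, size)
```

With a real list, the bucket for a very common first character dominates. A tool built this way would have to re-slice oversized buckets, which may be what the "Slicing:" lines refer to.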


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 00:53:05


It did that to me too before, and it came back after a while... If you watch the temp folder, you'll see it growing in size,
but I didn't wait hours, just 5-6 minutes tops.
If you press Ctrl+C it will stop and not delete the temp folder.

If it runs normally, it will empty the temp folder after it's done.


Avatar
Aehash

Status: n/a
Joined: Tue, 24 Feb 2015
Posts: 5
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 01:14:56

I do know how to stop it, but when I stop it, it never finishes the cleaning / merging.
The tmp folder hasn't changed size since 330.txt came up for processing; it's been more than 2 hours and nothing grows, but RAM usage stays between 60% and 100%, so probably something is working?
Is it possible that something in that 303.txt file is messing with the processing, maybe a strange Unicode character? It's hard to check with billions of lines in the file.
I will try to isolate that file and re-run the merge for the other files in the /tmp folder.
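One way to look for odd characters without opening a multi-GB file in an editor is to stream it in binary and flag lines that are invalid UTF-8 or contain control bytes. The Python sketch below only locates candidates (capped at `limit` hits); whether any of them is what stalls the slicing step is unknown, and on billions of lines pure Python will be slow:

```python
# Sketch: flag lines that are not valid UTF-8 or that contain control bytes
# (tab, NUL, ...). It only finds candidates; it cannot tell which line, if any,
# is what makes the merge stall.
import sys
from typing import Iterable, Iterator, Tuple


def scan(lines: Iterable[bytes], limit: int = 20) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (line_number, reason, first 60 bytes) for up to `limit` odd lines."""
    found = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        try:
            line.decode("utf-8")
            reason = None
        except UnicodeDecodeError:
            reason = "invalid UTF-8"
        if reason is None and any(b < 0x20 or b == 0x7F for b in line):
            reason = "control byte"
        if reason:
            yield number, reason, line[:60]
            found += 1
            if found >= limit:
                return


if __name__ == "__main__" and len(sys.argv) > 1:
    # hypothetical usage: python scan.py 303.txt
    with open(sys.argv[1], "rb") as f:  # binary mode: no decoding surprises
        for number, reason, sample in scan(f):
            print(number, reason, sample)
```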


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 01:23:47


Umm, looks like you've got a problem for sure.



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 03:01:31

OK, I tried v0.45beta0
with a 10-million-word list with spaces (185MB):


t=8 c=3072 --remove-space

Merge complete to: E:\WordLists\SortNoBlankSpace10-million-combos.txt
Total words : 10000001
Words skipped: 1
Duplicates removed: 0
Total time: 0 hrs 0 mins 20.389 secs


but inside the output file, everything is in HEX:

c696e]
$HEX[0335676f6f6c706869630973696d73696d0332]
$HEX[1031433430313809676f6c666572316334623670376b]
$HEX[21343536363534096a75616e6173]
$HEX[21343330377175616e74097175616e74343330373939]
$HEX[21343439303131096d6b6a68797472]
$HEX[2437336452790942714a5744]
$HEX[2432615f77327309327364653463]
$HEX[2437313338343434097332303733393131]
$HEX[2432303033496e6469610966697368696e67]
$HEX[2432666f72686f6c6c7909626f6f6b6769726c7364]
$HEX[24346d615a59095a4879584556]
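The `$HEX[...]` lines follow the notation hashcat uses for plaintexts containing characters it won't print as-is. Decoding one shows what forced the encoding: in `$HEX[2437336452790942714a5744]` the byte `09` is a tab, which matches the later finding in this thread that the list was tab-separated. A small Python decoder, offered as a sketch:

```python
# Decode "$HEX[...]" lines (hashcat-style hex notation) back to raw bytes, so
# you can see which byte caused a word to be hex-encoded.
import re

HEX_LINE = re.compile(r"\$HEX\[([0-9a-fA-F]*)\]")


def decode_hex_line(line: str) -> bytes:
    """Raw bytes of a $HEX[...] line; other lines are returned UTF-8 encoded."""
    line = line.rstrip("\r\n")
    match = HEX_LINE.fullmatch(line)
    if match:
        return bytes.fromhex(match.group(1))
    return line.encode("utf-8")


print(decode_hex_line("$HEX[2437336452790942714a5744]"))  # b'$73dRy\tBqJWD'
```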



Avatar
Aehash

Status: n/a
Joined: Tue, 24 Feb 2015
Posts: 5
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 04:17:07

I tried to join 2 files (13 GB and 15 GB) and the final result was 5 GB. There can't possibly be that many duplicates: both files had already been cleaned several times in various ways (including splitting them into smaller parts and cleaning them with "Once is enough" and this proggy).


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Wed, 25 Feb 2015 @ 08:53:18

Yes, seems I've bugged it somewhere else now. Leave it with me...



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 15:46:18

OK, about v0.45beta0:
the 10-million-word list (185MB) that I tried before was not separated by spaces; my mistake, it was separated by Tap spaces (4 spaces wide).
App.Merge.exe --remove-spaces

and it gave me the file in HEX.


v0.45beta0
I ran another test with a file of words separated by a single space
App.Merge.exe --remove-spaces

and it came out right.


I thought --remove-spaces would remove all spaces (even Tap spaces).


v0.45beta0 works well with multiple spaces (but lines with Tap spaces come out in hex).


PS: can you add a --remove-tap-spaces option to App.Merge.exe? :-)


Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Wed, 25 Feb 2015 @ 22:16:50

OK, just released v0.45beta1 which has a bug fix on the slicing tool which was causing the output sizes to be small / wrong.

Not sure what you mean by tap spaces?

Initial tests are good; looks like I'll be able to release this soon, but I'm open to adding more tools like splitting word-lists by length, etc.

It would be nice to see a speed comparison against other word-list tools on small, large and VERY large word-lists.



Avatar
blandyuk
Admin / Owner
Status: Trusted
Joined: Tue, 05 Jul 2011
Posts: 2916
Team: HashKiller
Reputation: 3911 Reputation
Offline
Wed, 25 Feb 2015 @ 22:44:46

OK, merged 2 word-lists: 2.83GB and 2.55GB.

Code:
Merge complete to: C:\Temp\hashes-org.txt
Total words  : 543741374
Words skipped: 1
Duplicates removed: 249136620
Total time: 0 hrs 13 mins 43.81 secs




Avatar
Aehash

Status: n/a
Joined: Tue, 24 Feb 2015
Posts: 5
Team:
Reputation: 0 Reputation
Offline
Wed, 25 Feb 2015 @ 23:47:58

Splitting word-lists by length is an excellent idea: separate everything below 8 chars from the lists, but still keep those words in another file for other tests.
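A hypothetical Python sketch of that split, streaming words from one list into two outputs (shorter than `min_len` characters vs. the rest). The function name and defaults are illustrative, and note that `len()` counts characters, not bytes, which matters for non-ASCII words:

```python
# Hypothetical sketch of splitting a list by length: words shorter than
# min_len go to one output, everything else to another. Streams line by line.
import io
from typing import Iterable, TextIO, Tuple


def split_by_length(src: Iterable[str], short_out: TextIO, long_out: TextIO,
                    min_len: int = 8) -> Tuple[int, int]:
    """Return (short_count, long_count). len() counts characters, not bytes."""
    counts = [0, 0]
    for line in src:
        word = line.rstrip("\r\n")
        target = 0 if len(word) < min_len else 1
        (short_out, long_out)[target].write(word + "\n")
        counts[target] += 1
    return counts[0], counts[1]


# Demo with in-memory files; for real lists pass open() file handles instead.
src = io.StringIO("abc\npassword\n1234567\nletmein123\n")
short, long_ = io.StringIO(), io.StringIO()
print(split_by_length(src, short, long_))  # (2, 2)
```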

By "tap spaces" he means Tab spacing, the key above Caps Lock on the keyboard.
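If a tool strips only ' ', tab-separated lines survive with the tab intact, as in the `$HEX` output earlier in the thread. A sketch of a per-line filter that removes both spaces and tabs (an illustration, not an existing App.Merge option):

```python
# Sketch of a per-line filter that removes both spaces and tabs (illustrative;
# not an existing App.Merge option). Other whitespace is deliberately kept.
STRIP_SPACES_AND_TABS = str.maketrans("", "", " \t")


def remove_spaces_and_tabs(word: str) -> str:
    return word.translate(STRIP_SPACES_AND_TABS)


print(remove_spaces_and_tabs("!456654\tjuanas"))        # !456654juanas
print(remove_spaces_and_tabs("Last Call\tAt The Bar"))  # LastCallAtTheBar
```

A fixed translation table keeps this to one pass per line. `"".join(word.split())` would also work, but it removes every kind of Unicode whitespace, which may be more than wanted.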



Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Thu, 26 Feb 2015 @ 00:20:41

Tap space: I meant to say the Tab key on the keyboard; the space is bigger.

OK, testing App.Merge.exe v0.45beta1:

File authors_nopunct_lower.txt (157KB) with --remove-spaces t=8 c=3072:
no problem.


The other one got stuck:
file 10-million-combos.txt (189MB) with --remove-spaces t=8 c=3072


- Words: 9942332 ~ Skipped: 1 ~ Mem: 920 MB
Position: 100.00 % of 10-million-combos.txt
- Words: 10000001 ~ Skipped: 1 ~ Mem: 943 MB
Slicing: 24.txt
- Slice Time: 5685.68 m/s
File: 6d.txt ~ Sort Time: 15.90 m/s ~ Duplicates: 0
File: 73.txt ~ Sort Time: 15.95 m/s ~ Duplicates: 0
^C
E:\HashcatGUI_044\unix-utils>


Had to press Ctrl+C to stop it.

Next I'll try the CrackStation file realuniq.lst (14.6GB).


Avatar
fonzy35

Status: n/a
Joined: Mon, 10 Feb 2014
Posts: 55
Team:
Reputation: 0 Reputation
Offline
Thu, 26 Feb 2015 @ 02:19:22

test v0.45beta1
file: realuniq.lst (14.6GB)
output file: AppMergeSorted-realuniq.lst (4.42MB)
t=8 c=3072 min=8 max=64 --remove-spaces

Code:

Merged: fc.txt
Merged: fd.txt
Merged: fe.txt
Merged: ff.txt
Merge complete to: E:\WordLists\crackstation\AppMergeSorted-realuniq.lst
Total words : 194294
Words skipped: 127732
Duplicates removed: 693
Total time: 0 hrs 0 mins 5.373 secs

Umm, something went wrong.




We have a total of 148426 messages in 18357 topics.
We have a total of 18219 registered users.
Our newest registered member is OrlandoX.