tekCollect (formerly hashCollect)

Description: tekCollect started off as a tool to scrape MD5 hashes from specified files and URLs. As development continued, I realized the program would be more useful if it could pull out other data types besides MD5s, such as IP addresses, URLs, SSNs, and more. With that in mind, I modified the code to include default searches such as the ones mentioned above. Additionally, I added the ability to search based on the user's own custom regex.
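
To illustrate the custom regex idea, here is a minimal sketch of searching a block of text with a user-supplied pattern. This is not the code from tekCollect.py; the function name search_custom and the error handling are assumptions made for the example.

```python
import re


def search_custom(pattern, text):
    """Return every match of a user-supplied regex in text.

    Hypothetical helper for illustration; tekCollect.py may handle
    custom patterns differently.
    """
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Turn a bad pattern into a clear error instead of a traceback.
        raise ValueError("Invalid regex %r: %s" % (pattern, exc))
    return [match.group(0) for match in compiled.finditer(text)]


if __name__ == "__main__":
    sample = "test record 123-45-6789 filed under id 42"
    print(search_custom(r"\b\d{3}-\d{2}-\d{4}\b", sample))
```

Using finditer with group(0) returns the whole match even when the user's pattern contains capture groups, which findall would not do.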

There is much more planned for this tool. Expect to see database integration, more data types, and maybe even integration with other tools.

Current version is .4


As this is a Python script, you will need to ensure you have the correct version of Python, which for this script is Python 2.7. I used mostly standard libraries, but just in case you don't have them, here are the libraries that are required: httplib2, re, sys, argparse. Of these, only httplib2 is not part of the standard library.
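
If you want to confirm the libraries are present before running the script, a quick check along these lines should work. This is a small sketch and not part of tekCollect itself; the function name missing_modules is made up for the example.

```python
# Module names taken from the requirements listed above.
REQUIRED = ["httplib2", "re", "sys", "argparse"]


def missing_modules(names):
    """Return the module names that fail to import on this interpreter."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are available.")
```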

With Python and the libraries out of the way, you can simply use git to clone the TekDefense code to your local machine.

git clone https://github.com/1aN0rmus/TekDefense.git

If you don't have git installed you can simply download the script from https://github.com/1aN0rmus/TekDefense/blob/master/tekCollect.py

On Linux, if you would like to run this as an executable (./), be sure to:

chmod +x tekCollect.py 


As always, let's start off with the help command:

root@bt:~/workspace/Automater# ./hashCollect.py -h
usage: hashCollect.py [-h] [-u URL] [-f FILE] [-o OUTPUT] [-r REGEX] [-t TYPE]
tekCollect is a tool that will scrape a file or website for specified data
optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     This option is used to search for hashes on a website
  -f FILE, --file FILE  This option is used to import a file that contains
  -o OUTPUT, --output OUTPUT
                        This option will output the results to a file.
  -r REGEX, --regex REGEX
                        This option allows the user to set a custom regex
                        value. Must incase in single or double quotes.
  -t TYPE, --type TYPE  This option allows a user to choose the type of data
                        they want to pull out. Currently MD5, SHA1, SHA 256,
                        Domain, URL, IP4, IP6, CCN, SSN, EMAIL
  -s, --Summary         This options will show a summary of the data types in
                        a file

From the help command you will notice we have a few options when running this program. The only requirement is that you provide either a file (-f) or a URL (-u). If no data type (-t) is given, the program assumes that you want to find MD5 hashes.
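
For readers curious how rules like these can be expressed with argparse, here is a rough sketch. The option names mirror the help output above, but the structure (build_parser, parse_args, and the error message) is assumed for illustration and is not copied from tekCollect.py.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text shown above."""
    parser = argparse.ArgumentParser(
        description="Sketch of a tekCollect-style command line")
    parser.add_argument("-u", "--url", help="URL to search")
    parser.add_argument("-f", "--file", help="file to search")
    parser.add_argument("-o", "--output", help="write results to this file")
    parser.add_argument("-r", "--regex", help="custom regex to search for")
    # Assumption based on the text: MD5 is used when -t is not given.
    parser.add_argument("-t", "--type", default="MD5", help="data type")
    parser.add_argument("-s", "--Summary", dest="summary",
                        action="store_true", help="show a summary")
    return parser


def parse_args(argv=None):
    """Parse argv and insist on at least one input source."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.file or args.url):
        # parser.error prints the usage line and exits with status 2.
        parser.error("a file (-f) or a URL (-u) is required")
    return args


if __name__ == "__main__":
    parsed = parse_args(["-f", "mixfile"])
    print(parsed.file, parsed.type)
```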

To show typical usage, here are a few examples:

Search a file for MD5 Hashes

root@bt:~/workspace/Automater# ./tekCollect.py -f mixfile -t MD5
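
Under the hood, a search like this comes down to running a regex over the file contents. The sketch below shows one way to pull MD5-looking strings out of text. The pattern and the function name find_md5 are my assumptions for the example, not necessarily what tekCollect.py uses.

```python
import re

# Assumed pattern: exactly 32 hex characters that are not part of a
# longer hex run, so SHA1 and SHA256 values are not cut into MD5s.
MD5_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{32}(?![0-9a-fA-F])")


def find_md5(text):
    """Return unique MD5-looking strings in order of first appearance."""
    seen = []
    for match in MD5_RE.findall(text):
        lowered = match.lower()
        if lowered not in seen:
            seen.append(lowered)
    return seen


if __name__ == "__main__":
    sample = ("empty file: d41d8cd98f00b204e9800998ecf8427e\n"
              "sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709\n")
    print(find_md5(sample))
```

Keep in mind that any 32-character hex string will match, so results from a regex like this are candidates rather than confirmed hashes.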

Search a URL for IP Addresses

root@bt:~/workspace/Automater# ./tekCollect.py -u http://minotauranalysis.com/malwarelist.aspx -t IP4
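
IP addresses need a little more care than hashes, since a plain digit-and-dot regex will also match things like 192.168.1.300. Here is a minimal sketch of one approach, assuming the page content has already been downloaded into a string. The pattern and the name find_ip4 are illustrative and not taken from tekCollect.py.

```python
import re

# Four dot-separated groups of 1-3 digits, not glued to more digits or
# to another dotted group on either side (so 1.2.3.4.5 is skipped).
IP4_RE = re.compile(
    r"(?<!\d)(?<!\d\.)(?:\d{1,3}\.){3}\d{1,3}(?!\d)(?!\.\d)")


def find_ip4(text):
    """Return unique IPv4-looking strings whose octets are all <= 255."""
    results = []
    for candidate in IP4_RE.findall(text):
        octets = candidate.split(".")
        if all(int(octet) <= 255 for octet in octets):
            if candidate not in results:
                results.append(candidate)
    return results


if __name__ == "__main__":
    page = "seen 10.0.0.1, 192.168.1.300 and 8.8.8.8. Again 10.0.0.1"
    print(find_ip4(page))
```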

Search a URL for Email Addresses and output to a file

root@bt:~/workspace/Automater# ./tekCollect.py -u http://www.TekDefense.com/ -t EMAIL -o TekEmails.out

[+] Printing results to file: TekEmails.out

root@bt:~/workspace/Automater# cat TekEmails.out 
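
For reference, saving results with -o amounts to writing one match per line. A rough sketch of that step is below. The status message matches the output shown above, while the function name write_results and the one-per-line layout are assumptions for the example.

```python
def write_results(results, path):
    """Write one result per line to path and return the count written."""
    with open(path, "w") as handle:
        for item in results:
            handle.write(item + "\n")
    print("[+] Printing results to file: " + path)
    return len(results)


if __name__ == "__main__":
    # Placeholder addresses, not real output from the tool.
    write_results(["user@example.com", "admin@example.com"],
                  "example.out")
```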


Show a summary of the different types of data at a URL

root@bt:~/workspace/Automater# ./tekCollect.py -u http://www.Securabit.com/ -s

# of MD5 in the target: 0
# of SHA1 in the target: 0
# of SHA256 in the target: 0
# of DOMAIN in the target: 64
# of URL in the target: 20
# of IP4 in the target: 0
# of IP6 in the target: 0
# of SSN in the target: 0
# of EMAIL in the target: 0
# of CCN in the target: 6

Show a summary of the different types of data in a file

root@bt:~/workspace/Automater# ./tekCollect.py -f mixfile -s

# of MD5 in the target: 63
# of SHA1 in the target: 0
# of SHA256 in the target: 0
# of DOMAIN in the target: 48
# of URL in the target: 5
# of IP4 in the target: 2
# of IP6 in the target: 2
# of SSN in the target: 3
# of EMAIL in the target: 36
# of CCN in the target: 17
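
A summary like the ones above can be produced by running each pattern over the same text and counting the matches. The sketch below covers only a few of the data types. Its patterns, the choice to count unique matches, and the names hex_run, PATTERNS, and summarize are all assumptions for illustration, so its counts may not line up with what tekCollect reports.

```python
import re


def hex_run(length):
    """Compile a pattern for exactly `length` hex characters."""
    return re.compile(
        r"(?<![0-9a-fA-F])[0-9a-fA-F]{%d}(?![0-9a-fA-F])" % length)


# Illustrative subset of data types, checked in this order.
PATTERNS = [
    ("MD5", hex_run(32)),
    ("SHA1", hex_run(40)),
    ("SHA256", hex_run(64)),
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
]


def summarize(text):
    """Return one summary line per data type, counting unique matches."""
    lines = []
    for name, pattern in PATTERNS:
        count = len(set(pattern.findall(text)))
        lines.append("# of %s in the target: %d" % (name, count))
    return lines


if __name__ == "__main__":
    sample = ("d41d8cd98f00b204e9800998ecf8427e "
              "da39a3ee5e6b4b0d3255bfef95601890afd80709 "
              "user@example.com user@example.com")
    for line in summarize(sample):
        print(line)
```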

If you have any suggestions for the tool please let me know.