
Monday, June 1, 2015

BSidesNola 2015 Presentation on Honeypots

Wow, it has been a long time since I have posted. I plan to rectify my posting frequency problems, starting now. Last weekend @p4r4n0y1ng and I (@TekDefense) gave a presentation on Honeypots called "Catch More Honeys when you are fly" at BSidesNola. See the slides below:

I will be publishing a more detailed article on SSHPsychos soon!

Sunday, July 20, 2014

Over a year with Kippo

UPDATE: After posting, @ikoniaris of HoneyDrive and BruteForce Labs fame recommended running a couple of reporting scripts. Here are the results of kippo-stats.pl, created by Tomasz Miklas and Miguel Jacq.

As many of you know from previous posts, I am a big fan of honeypots, particularly Kippo. My main Kippo instance sitting in AWS has been online for over a year now. Let's take a look at what we have captured and learned over this past year. If you want to validate any of these statistics I have made the raw logs available for download.

General Stats:

Unique values (135526 connections):

*csv with geo location

*Map Generated with JCSOCAL's GIPC

Top 11 Countries

China: 699

United States: 654

Brazil: 76

Russian Federation: 69

Germany: 65

Korea, Republic of: 57

Romania: 56

Egypt: 52

Japan: 50

India: 41

Indonesia: 41

Unique Usernames: 8600 (Username list)

Unique Passwords: 75780 (wordlist)

Unique Sources: 1985 (list of IPs)

Passwords:

One of my favorite uses of Kippo data is generating wordlists from login attempts. I wrote a quick script to parse the Kippo logs, pull out all attempted passwords, and dedupe them into a wordlist. Feel free to grab it. Additionally, I have made the wordlists available for download.
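The core idea of the parsing script fits in a few lines. Here's a minimal sketch; the log line format mirrors stock Kippo output, but treat the regex and the sample lines as illustrative assumptions rather than my exact script:

```python
import re

# Stock Kippo logs record each attempt as: login attempt [user/password] failed|succeeded
ATTEMPT = re.compile(r"login attempt \[([^/]*)/(.*)\] (?:failed|succeeded)")

# Illustrative sample lines in the stock Kippo log format
sample = [
    "2014-07-20 12:34:56+0000 [SSHService ssh-userauth on HoneyPotTransport,9,1.2.3.4] login attempt [root/123456] failed",
    "2014-07-20 12:35:01+0000 [SSHService ssh-userauth on HoneyPotTransport,9,1.2.3.4] login attempt [root/123456] failed",
    "2014-07-20 12:35:08+0000 [SSHService ssh-userauth on HoneyPotTransport,9,1.2.3.4] login attempt [admin/p@ssw0rd] failed",
]

def extract_passwords(lines):
    """Pull every attempted password out of Kippo log lines, deduped, in order seen."""
    seen, wordlist = set(), []
    for line in lines:
        m = ATTEMPT.search(line)
        if m and m.group(2) not in seen:
            seen.add(m.group(2))
            wordlist.append(m.group(2))
    return wordlist

print("\n".join(extract_passwords(sample)))  # two unique passwords from three attempts
```

In practice you would feed it every kippo.log* file on disk rather than a hardcoded list.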

Using Pipal I performed analysis of all the login attempts over this year:

Two items of note here: over 60% of password attempts were 1-8 characters long, and 40% of attempts consisted of lowercase alpha characters only. The most used password was 123456, which is the default password for Kippo.
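Pipal does far more than this, but the two headline numbers above are easy to recompute from any attempt list. A quick sketch (this is not Pipal, and the percentages you get will depend on whether you feed it raw attempts or the deduped wordlist):

```python
def length_and_charset_stats(passwords):
    """Return (% of entries 1-8 chars long, % that are lowercase alpha only)."""
    total = len(passwords)
    short = sum(1 for p in passwords if 1 <= len(p) <= 8)
    lower = sum(1 for p in passwords if p.isalpha() and p.islower())
    return 100.0 * short / total, 100.0 * lower / total

pct_short, pct_lower = length_and_charset_stats(
    ["123456", "password", "abc", "LongPassword123!"])
print(pct_short, pct_lower)  # 75.0 50.0
```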

If a user attempts to create an account or change the root password in a Kippo session those passwords are captured and added to the allowed credentials list. The following credentials were created:

root:0:albertinoalbert123
root:0:fgashyeq77dhshfa
root:0:florian12eu
root:0:hgd177q891999wwwwwe1.dON
root:0:iphone5
root:0:kokot
root:0:nope
root:0:picvina
root:0:scorpi123
root:0:test
root:0:xiaozhe
root:0:12345
root:0:bnn318da9031kdamfaihheq1fa
root:0:ls
root:0:neonhostt1
root:0:wget123

Downloads:

When an attacker attempts to download a tool via wget, Kippo allows the file to be downloaded so that we capture a copy of whatever is being fetched, although the attacker cannot interact with it. In most cases these are IRC bots, but not all. I have made them all available for download.
Here is a listing of all the files:
*Duplicates and obviously legitimate files have been removed from the list.
20131030113401_http___198_2_192_204_22_disknyp
20131103183232_http___61_132_227_111_8080_meimei
20131104045744_http___198_2_192_204_22_disknyp
20131114214017_http___www_unrealircd_com_downloads_Unreal3_2_8_1_tar_gz
20131116130541_http___198_2_192_204_22_disknyp
20131129165151_http___dl_dropboxusercontent_com_s_1bxj9ak8m1octmk_ktx_c
20131129165438_http___dl_dropboxusercontent_com_s_66gpt66lvut4gdu_ktx
20131202040921_http___198_2_192_204_22_disknyp
20131207123419_http___packetstorm_wowhacker_com_DoS_juno_c
20131216143108_http___www_psybnc_at_download_beta_psyBNC_2_3_2_7_tar_gz
20131216143208_http___X_hackersoft_org_scanner_gosh_jpg
20131216143226_http___download_microsoft_com_download_win2000platform_SP_SP3_NT5_EN_US_W2Ksp3_exe
20131217163423_http___ha_ckers_org_slowloris_slowloris_pl
20131217163456_http___www_lemarinel_net_perl
20131222084315_http___maxhub_com_auto_bill_pipe_bot
20140103142644_http___ftp_gnu_org_gnu_autoconf_autoconf_2_69_tar_gz
20140109170001_http___sourceforge_net_projects_cpuminer_files_pooler_cpuminer_2_3_2_linux_x86_tar_gz
20140120152204_http___111_39_43_54_5555_dos32
20140122202342_http___layer1_cpanel_net_latest
20140122202549_http___linux_duke_edu_projects_yum_download_2_0_yum_2_0_7_tar_gz
20140122202751_http___www_ehcp_net_ehcp_latest_tgz
20140201131804_http___www_suplementar_com_br_images_stories_goon_pooler_cpuminer_2_3_2_tar_gz
20140201152307_http___nemo_rdsor_ro_darwin_jpg
20140208081358_http___www_youtube_com_watch_v_6hVQs5ll064
20140208184835_http___sharplase_ru_x_txt
20140215141909_http___tenet_dl_sourceforge_net_project_cpuminer_pooler_cpuminer_2_3_2_tar_gz
20140215142830_http___sourceforge_net_projects_cpuminer_files_pooler_cpuminer_2_3_2_tar_gz
20140219072721_http___www_psybnc_at_download_beta_psyBNC_2_3_2_7_tar_gz
20140328031725_http___dl_dropboxusercontent_com_u_133538399_multi_py
20140409053322_http___www_c99php_com_shell_c99_rar
20140409053728_http___github_com_downloads_orbweb_PHP_SHELL_WSO_wso2_5_1_php
20140413130110_http___www_iphobos_com_hb_unixcod_rar
20140416194008_http___linux_help_bugs3_com_Camel_mail_txt
20140419143734_http___www_activestate_com_activeperl_downloads_thank_you_dl_http___downloads_activestate_com_ActivePerl_releases_5_18_2_1802_ActivePerl_5_18_2_1802_x86_64_linux_glibc_2_5_298023_tar_gz
20140419144043_http___ha_ckers_org_slowloris_slowloris_pl
20140420104056_http___downloads_metasploit_com_data_releases_archive_metasploit_4_9_2_linux_x64_installer_run
20140420104325_http___nmap_org_dist_nmap_6_46_1_i386_rpm
20140505073503_http___116_255_239_180_888_007
20140505093229_http___119_148_161_25_805_sd32
20140505111511_http___112_117_223_10_280_1
20140515091557_http___112_117_223_10_280__bash_6_phpmysql
20140519193800_http___www_unrealircd_com_downloads_Unreal3_2_8_1_tar_gz
20140523120411_http___lemonjuice_tk_netcat_sh
20140610174516_http___59_63_183_193_280__etc_Test8888
20140614200901_http___kismetismy_name_ktx
20140625032113_http___ftp_mirrorservice_org_sites_ftp_wiretapped_net_pub_security_packet_construction_netcat_gnu_netcat_netcat_0_7_1_tar_gz
20140720005010_http___www_bl4ck_viper_persiangig_com_p8_localroots_2_6_x_cw7_3
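Each capture name above is the fetch timestamp followed by the requested URL with every non-alphanumeric character flattened to an underscore. That flattening is lossy, so the original URL can only be approximated, but the names are still easy to slice up for spotting repeat payloads. A rough sketch:

```python
from collections import Counter

def parse_capture_name(name):
    """Split a Kippo download filename into (timestamp, flattened URL)."""
    stamp, rest = name.split("_", 1)
    return stamp, rest

def top_payloads(names, n=3):
    """Count how often the same flattened URL appears across captures."""
    return Counter(parse_capture_name(x)[1] for x in names).most_common(n)

captures = [
    "20131030113401_http___198_2_192_204_22_disknyp",
    "20131103183232_http___61_132_227_111_8080_meimei",
    "20131104045744_http___198_2_192_204_22_disknyp",
    "20131116130541_http___198_2_192_204_22_disknyp",
]
print(top_payloads(captures, 1))  # the disknyp payload was fetched 3 times
```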
To see the full source for some of the scripts downloaded by the attackers, you can go to this GitHub repo, which includes a couple of my favorite ones.

TTY Replay Sessions:

My absolute favorite feature of Kippo is the ability to replay interactive sessions of attacker activity. Watching these replays gives us an idea of what attackers do once inside a session. For instance, almost every session begins with a "w", which shows logged-in users and uptime, followed by a "uname -a" to show system details. I made a YouTube series called The Kippo Kronicles a while back to showcase some of these sessions. While I don't have the time necessary to continue putting up videos for each session, I have put the output of each session up at this GitHub repo.
Here is a fun example:
AWSWeb:~# adduser
adduser: Only one or two names allowed.
AWSWeb:~# useradd
adduser: Only one or two names allowed.
AWSWeb:~# ls
AWSWeb:~# pwd
root
AWSWeb:~# cd /cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
irc:x:39:39:ircd:/var/run/ircd:/bin/sh
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
richard:x:1000:1000:richard,,,:/home/richard:/bin/bash
sshd:x:101:65534::/var/run/sshd:/usr/sbin/nologin
AWSWeb:~# user
bash: user: command not found
AWSWeb:~# adduser obz
Adding user `obz' ...
Adding new group `obz' (1001) ...
Adding new user `obz' (1001) with group `obz' ...
Creating home directory `/home/obz' ...
Copying files from `/etc/skel' ...
Password: 
Password again: 
Changing the user information for obz
Enter the new value, or press ENTER for the default
        Username []: 
Must enter a value!
        Username []: obz
        Full Name []: ladmin obz
        Room Number []: 1
        Work Phone []: 1234567890
        Home Phone []: 
Must enter a value!
        Home Phone []: 0
        Mobile Phone []: 0
        Country []: cn
        City []: xang
        Language []: mand
        Favorite movie []: 1
        Other []: 1
Is the information correct? [Y/n] y
ERROR: Some of the information you entered is invalid
Deleting user `obz' ...
Deleting group `obz' (1001) ...
Deleting home directory `/home/obz' ...
Try again? [Y/n] y
Changing the user information for obz
Enter the new value, or press ENTER for the default
        Username []: obx
        Full Name []: obx toor
        Room Number []: 1
        Work Phone []: 19089543121
        Home Phone []: 9089342135
        Mobile Phone []: 9089439012
        Country []: cn
        City []: xang
        Language []: manenglish
        Favorite movie []: one
        Other []: twofour
Is the information correct? [Y/n] y
ERROR: Some of the information you entered is invalid
Deleting user `obz' ...
Deleting group `obz' (1001) ...
Deleting home directory `/home/obz' ...
Try again? [Y/n] n
AWSWeb:~# cat adduser obz user cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
irc:x:39:39:ircd:/var/run/ircd:/bin/sh
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
richard:x:1000:1000:richard,,,:/home/richard:/bin/bash
sshd:x:101:65534::/var/run/sshd:/usr/sbin/nologin
AWSWeb:~# cat /etc/shadow
cat: /etc/shadow: No such file or directory
AWSWeb:~# /etc/init.d\D/ssh start
bash: /etc/init.D/ssh: command not found
AWSWeb:~# /etc/init.D/ssh startd
bash: /etc/init.d/ssh: command not found
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# 
AWSWeb:~# exit
cConnection to server closed.
localhost:~# exit
Connection to server closed.
localhost:~# bye
bash: bye: command not found
localhost:~# exit
Connection to server closed.
localhost:~# admin
bash: admin: command not found
localhost:~# su
localhost:~# ls -l
drwxr-xr-x 1 root root 4096 2013-02-03 17:11 .
drwxr-xr-x 1 root root 4096 2013-02-03 17:11 ..
drwxr-xr-x 1 root root 4096 2009-11-06 11:16 .debtags
-rw------- 1 root root 5515 2009-11-20 09:08 .viminfo
drwx------ 1 root root 4096 2009-11-06 11:13 .aptitude
-rw-r--r-- 1 root root  140 2009-11-06 11:09 .profile
-rw-r--r-- 1 root root  412 2009-11-06 11:09 .bashrc
localhost:~# pwd
/root
localhost:~# cd /
localhost:/# ls -l
drwxr-xr-x 1 root root  4096 2013-02-03 17:11 .
drwxr-xr-x 1 root root  4096 2013-02-03 17:11 ..
drwxr-xr-x 1 root root     0 2009-11-20 08:19 sys
drwxr-xr-x 1 root root  4096 2009-11-08 15:42 bin
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 mnt
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 media
lrwxrwxrwx 1 root root    25 2009-11-06 11:16 vmlinuz -> /boot/vmlinuz-2.6.26-2-686
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 opt
lrwxrwxrwx 1 root root    11 2009-11-06 11:08 cdrom -> /media/cdrom0
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 selinux
drwxrwxrwx 1 root root  4096 2009-11-20 08:19 tmp
dr-xr-xr-x 1 root root     0 2009-11-20 08:19 proc
drwxr-xr-x 1 root root  4096 2009-11-08 15:41 sbin
drwxr-xr-x 1 root root  4096 2009-11-20 08:20 etc
drwxr-xr-x 1 root root  3200 2009-11-20 08:20 dev
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 srv
lrwxrwxrwx 1 root root    28 2009-11-06 11:16 initrd.img -> /boot/initrd.img-2.6.26-2-686
drwxr-xr-x 1 root root  4096 2009-11-08 15:46 lib
drwxr-xr-x 1 root root  4096 2009-11-06 11:22 home
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 var
drwxr-xr-x 1 root root  4096 2009-11-08 15:46 usr
drwxr-xr-x 1 root root  4096 2009-11-08 15:39 boot
drwxr-xr-x 1 root root  4096 2009-11-20 09:08 root
drwx------ 1 root root 16384 2009-11-06 11:08 lost+found
localhost:/# cd /home
localhost:/home# ls -l
ldrwxr-xr-x 1 root root 4096 2013-02-03 17:11 .
drwxr-xr-x 1 root root 4096 2013-02-03 17:11 ..
drwxr-xr-x 1 1000 1000 4096 2009-11-06 11:22 richard
localhost:/home# exit
Connection to server closed.
localhost:~# 
localhost:~# 
localhost:~# 
localhost:~# 
localhost:~# 
localhost:~# 
localhost:~# ssh -D root@60.250.65.112 1337
The authenticity of host '60.250.65.112 (60.250.65.112)' can't be established.
RSA key fingerprint is 9d:30:97:8a:9e:48:0d:de:04:8d:76:3a:7b:4b:30:f8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '60.250.65.112' (RSA) to the list of known hosts.
root@60.250.65.112's password: 
Linux localhost 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686
Last login: Sat Feb  2 07:07:11 2013 from 192.168.9.4
localhost:~# uname -a
Linux localhost 2.6.24-2-generic #1 SMP Thu Dec 20 17:36:12 GMT 2007 i686 GNU/Linux
localhost:~# pwd
/root
localhost:~# cd /
localhost:/# ls -l
drwxr-xr-x 1 root root  4096 2013-02-03 17:19 .
drwxr-xr-x 1 root root  4096 2013-02-03 17:19 ..
drwxr-xr-x 1 root root     0 2009-11-20 08:19 sys
drwxr-xr-x 1 root root  4096 2009-11-08 15:42 bin
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 mnt
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 media
lrwxrwxrwx 1 root root    25 2009-11-06 11:16 vmlinuz -> /boot/vmlinuz-2.6.26-2-686
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 opt
lrwxrwxrwx 1 root root    11 2009-11-06 11:08 cdrom -> /media/cdrom0
drwxr-xr-x 1 root root  4096 2009-11-06 11:08 selinux
drwxrwxrwx 1 root root  4096 2009-11-20 08:19 tmp
dr-xr-xr-x 1 root root     0 2009-11-20 08:19 proc
drwxr-xr-x 1 root root  4096 2009-11-08 15:41 sbin
drwxr-xr-x 1 root root  4096 2009-11-20 08:20 etc
drwxr-xr-x 1 root root  3200 2009-11-20 08:20 dev
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 srv
lrwxrwxrwx 1 root root    28 2009-11-06 11:16 initrd.img -> /boot/initrd.img-2.6.26-2-686
drwxr-xr-x 1 root root  4096 2009-11-08 15:46 lib
drwxr-xr-x 1 root root  4096 2009-11-06 11:22 home
drwxr-xr-x 1 root root  4096 2009-11-06 11:09 var
drwxr-xr-x 1 root root  4096 2009-11-08 15:46 usr
drwxr-xr-x 1 root root  4096 2009-11-08 15:39 boot
drwxr-xr-x 1 root root  4096 2009-11-20 09:08 root
drwx------ 1 root root 16384 2009-11-06 11:08 lost+found
localhost:/# cd /root
localhost:~# ls -l
ldrwxr-xr-x 1 root root 4096 2013-02-03 17:19 .
drwxr-xr-x 1 root root 4096 2013-02-03 17:19 ..
drwxr-xr-x 1 root root 4096 2009-11-06 11:16 .debtags
-rw------- 1 root root 5515 2009-11-20 09:08 .viminfo
drwx------ 1 root root 4096 2009-11-06 11:13 .aptitude
-rw-r--r-- 1 root root  140 2009-11-06 11:09 .profile
-rw-r--r-- 1 root root  412 2009-11-06 11:09 .bashrc
localhost:~# cd /hocd /home/
localhost:/home# ls -l
drwxr-xr-x 1 root root 4096 2013-02-03 17:20 .
drwxr-xr-x 1 root root 4096 2013-02-03 17:20 ..
drwxr-xr-x 1 1000 1000 4096 2009-11-06 11:22 richard
localhost:/home# exit
Connection to server closed.
localhost:~# exit
Connection to server closed.
localhost:~# 

Conclusion:

After a year with Kippo, I have learned a lot about what these basic attackers do when connecting to seemingly open ssh hosts. There is plenty more to learn though. I have some plans on building out a larger honeypot infrastructure, and automating some of the data collection and parsing. Additionally I would like to spend more time analyzing the sessions and malware for further trends. I'll keep you all posted!

*Big thanks to Bruteforce Labs for their tools and expertise in honeypots.

Wednesday, June 18, 2014

Automater version 2.1 released - Proxy capabilities and a little user-agent modification

It has been a little while since some of our posts on Automater and its capabilities. However, we haven't stopped moving forward on the concept, and we are proud to announce that Automater has been included in the latest release of REMnux and has also made the cut for ToolsWatch. Of course, you should get your copy from our GitHub repo, since we'll be updating GitHub just prior to getting the updates to other repositories. Okay, enough back-patting and proverbial "glad handing". We are excited to let everyone know that Automater now has a user-agent string that is configurable by the user and fully supports proxy-based requests and submissions! Thanks go out to nullprobe for taking interest in the code and pushing us forward on getting the proxy capability completed. Although we didn't use the exact submission he provided, we definitely used some of his code and ideas. Thanks again nullprobe!

The New Stuff

Okay, for a quick review of some of the old posts if you're new to Automater, or need to refresh yourself with the product, please go here, here, and here to read about Automater, its capabilities and extensibility, as well as its output formats. As you probably know, Automater is an extensible OSINT tool with quite a few capabilities. To get straight to the point: Automater can now be run with new command-line flags to enable proxy functionality and to change the user-agent submitted in the header of the web requests made by the tool.

User-Agent Changes

Prior to this upgrade, Automater sent the default user-agent string supplied by the underlying Python libraries on the device hosting the application. While this is probably fine, it just......well.....wasn't good enough for us. By default, Automater now sends the user-agent string 'Automater/2.1' with requests and posts (if post submissions are required). However, you now have the ability to change that user-agent string to one of your liking by using the command-line parameter -a or --agent followed by the string you'd like to use. A new Automater execution line using this new option would look something like:

python Automater.py 1.1.1.1 -a MyUserAgent/1.0

or some such thing that you'd like to send as a user-agent string in the header.

Proxy Capabilities

A significant modification in this version was the inclusion of a capability to utilize a network proxy system. To enable this functionality, all that is needed is the command line argument --proxy followed by the address and the port the proxy device is listening on during Automater execution. For instance, if my network proxy is at IP address 10.1.1.1 and is listening on port 8080 I would execute the Automater by typing:

python Automater.py 1.1.1.1 --proxy 10.1.1.1:8080

Of course, if you only know the name of your network proxy, your system will utilize standard DNS resolution and resolve the IP address automatically. So, if the proxy is known as proxy.company.com and listens on port 8080, you would type:

python Automater.py 1.1.1.1 --proxy proxy.company.com:8080

It's as simple as that!
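If you're curious what proxy plus user-agent support looks like under the hood, the pattern in Python's standard library is roughly the following. This is an illustrative sketch, not Automater's actual code (Automater is Python 2 era; the sketch uses Python 3's urllib, and the proxy and agent values are made up):

```python
import urllib.request

def make_opener(proxy=None, agent="Automater/2.1"):
    """Build a URL opener that optionally routes through an HTTP proxy
    and always sends a custom User-Agent header."""
    handlers = []
    if proxy:  # e.g. "proxy.company.com:8080" or "10.1.1.1:8080"
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    opener.addheaders = [("User-Agent", agent)]
    return opener

opener = make_opener("proxy.company.com:8080", "MyUserAgent/1.0")
# opener.open("http://...") would now go out through the proxy
# with the custom agent string in the request header.
```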

Further Movement

We are still working on other submissions and requests, so please keep them coming as we will continue to upgrade as we get requests as well as when we find more efficient ways to do things. We appreciate the support and would love to answer any questions you may have, so give us a yell if you need anything.

p4r4n0y1ng and 1aN0rmus.....OUT!

Thursday, May 29, 2014

Memory Forensics presentation from BSidesNola

As some of you may already know, a couple of weeks back @HiddenIllusion and I gave a talk on Memory Forensics titled "Mo' Memory No' Problems" at BSidesNola. While the talk wasn't recorded, we did want to put the slides out for the folks who were not able to attend. I hope you all enjoy.

*Be sure to visit HiddenIllusion's blog. Also, for the analysis walk-through at the end of the deck, we used a memory dump of 1337 hacker activity generated by Tony Lee of SecuritySynapse.

Wednesday, January 29, 2014

Categorizing Maltrieve Output

UPDATE: @kylemaxwell has accepted the pull of this script into the main maltrieve repo!

*Note: For starters, we need to say thanks as usual to technoskald and point you in the right direction to the Maltrieve Code on GitHub.

Overview

We have posted Maltrieve articles a couple of times in the past, but the capabilities of this application continue to amaze us, so we thought we'd add to our past contributions. During our initial build of a malware collection box (malware zoo creation), we utilized the standard approach of running Maltrieve throughout the day via a cron job. As simple things often do, this became rather complex: Maltrieve's output is not categorized in any way, so finding what you're looking for is.....shall we say.....difficult at best. This article discusses a categorization method to help you organize your malware zoo so that it is manageable.

If you would prefer this article in video format, it is provided as well:

Getting started

The box containing the malware repository is a standard Ubuntu Precise Pangolin (12.04 LTS) distro, so no big tricks or hooks here. Maltrieve is installed in a standard fashion, but a 1TB drive is utilized to store the retrieved malware. The box has 3TB of space for later use, but for now we'll deal with just the 1TB drive. The malware repository is mounted at /media/malware/maltrievepulls. All scripts utilized (including the Maltrieve python scripts) are located at /opt/maltrieve. Again, nothing flashy in any of this, so it should be easy for you to get your box set up quickly if you'd like.

Running Maltrieve Consistently

To begin the build of the malware repository, we wanted to run the maltrieve scripts hourly so that the directory would fill with new and interesting malware consistently and quickly. This screamed “crontab”, so we fired up a terminal and ran sudo crontab -l and then sudo crontab -e so that we could edit the crontab. Our initial entry was as follows:

@hourly python /opt/maltrieve/maltrieve.py -d /media/malware/maltrievepulls

@hourly echo "maltrieve run at: $(date)" >> /home/username/Documents/maltrievelog.log

This simply tells the system to run the maltrieve.py python script on an hourly basis and send the results to the /media/malware/maltrievepulls directory for safe storage. The second entry adds a little stamp to a file in my home directory so I can verify the cron job is running every hour – you can obviously leave this statement out if you don't see fit. In any case, we quickly noticed that Maltrieve was doing its job, and we went about our business allowing the box to do what we asked. We were soon swimming in malware and ready to start analyzing to our hearts' content when we ran into the problem!

The Problem

Maltrieve does exactly what it's told, and it does it well – find malware from specific sites and put it in a directory of your liking. And it finds LOTS OF MALWARE if you keep running it as we did in hopes of building a massive store. However, the files are given hashed names that mean very little to the human eye, and they are just plopped merrily into the directory you chose when you ran the maltrieve.py python script. It became quite tedious to run the file command on files that just “looked” interesting based on a hashed filename that gave little hint of format, or even payload. A quick look could allow some judging by file size, but basic command-line sorting, grepping, awking, and loads of other tools were needed to try to work around the problem. These methods were simply tedious, and once we had hundreds of GBs of malware, it became downright no fun any more. The picture below shows a glimpse of the problem.

Hardly the beacon of light for finding what you're looking for from your malware repository.

Running the file command on a few of these things starts showing some potential though because what you get from doing this looks like:

file 818fc882dab3e682d83aabf3cb8b453b

818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows

 

file fd8fd6d345cb630d7f1b6926ce7d28b3

fd8fd6d345cb630d7f1b6926ce7d28b3: Zip archive data, at least v1.0 to extract

So here we find that we have 2 pieces of malware, one is a Portable Executable for a Windows box and the other is a Zip archive. This is a very nice start, but was just 2 needles in a large and growing haystack, and the manual effort was laborious and downright daunting.

Bash to the Rescue

As coders love to do, our answer was to take the awesome product Maltrieve and throw some more code at it. My initial thought was to extend the python script, but since I pulled this from a GitHub repository I didn't want to modify the code and then have to “re-modify” it later if things were ever changed or upgraded. My answer was to create a small Bash Shell script and run it to help categorize our malware repository. The requirements we set upon ourselves were to categorize the code into multiple directories based on the first word output from the file command and then further categorize that by separating the code by size. We decided that 0-50KB files would be considered “small”, 51KB-1MB would be considered “medium”, 1.xMB-6MB would be considered “large”, and anything larger would be considered “xlarge”. It's a rather brutish method but it's something and it seems to work nicely. So in the end, we would want to see a directory tree that looked something like this:

--PE32

----small

----medium

----large

----xlarge

--Zip

----small

----medium

----large

----xlarge

and so on and so on.
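Stated on its own, the bucketing rule is tiny. Here it is as a Python function mirroring the thresholds and -le comparisons our bash script uses (a sketch for illustration only):

```python
def size_bucket(size_bytes):
    """Map a file size in bytes to the small/medium/large/xlarge buckets."""
    if size_bytes <= 50001:        # 0 to ~50KB
        return "small"
    elif size_bytes <= 1000001:    # up to ~1MB
        return "medium"
    elif size_bytes <= 6000001:    # up to ~6MB
        return "large"
    else:
        return "xlarge"

print(size_bucket(4096), size_bucket(5000000))  # small large
```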

Since we set up our Maltrieve pulls to run hourly, we decided to run the bash script – which we so obviously named maltrievecategorizer.sh – on every half hour, which allows Maltrieve to finish and then categorizes the latest findings. To make this happen, we cracked open crontab again with sudo crontab -e and added the following to the end of the file:

30 * * * * bash /opt/maltrieve/maltrievecategorizer.sh

which just says to run our bash script on the half hour of every day of the year, plain and simple.

The Bash Script

The maltrievecategorizer.sh bash script can be seen below. An explanation follows the script.

#!/bin/bash

smallstr="/small"
mediumstr="/medium"
largestr="/large"
xlargestr="/xlarge"
smallfile=50001
mediumfile=1000001
largefile=6000001

root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"

for file in $all_files
do
  if [ -f $file ]; then
    outstring=($(eval file $file))
    stringsubone="${outstring[1]}"
    case $stringsubone in
      "a") stringsubone="PerlScript";;
      "very") stringsubone="VeryShortFile";;
      "empty") rm $file
        continue;;
      *);;
    esac
    if [ ! -d $root_dir$stringsubone ]; then
      mkdir -p "$root_dir$stringsubone"
      mkdir -p "$root_dir$stringsubone$smallstr"
      mkdir -p "$root_dir$stringsubone$mediumstr"
      mkdir -p "$root_dir$stringsubone$largestr"
      mkdir -p "$root_dir$stringsubone$xlargestr"
    fi
    filesize=$(stat -c %s $file)
    if [[ "$filesize" -le "$smallfile" ]]; then
      mv $file "$root_dir$stringsubone$smallstr/"
    elif [[ "$filesize" -le "$mediumfile" ]]; then
      mv $file "$root_dir$stringsubone$mediumstr/"
    elif [[ "$filesize" -le "$largefile" ]]; then
      mv $file "$root_dir$stringsubone$largestr/"
    else
      mv $file "$root_dir$stringsubone$xlargestr/"
    fi
  fi
done

The first several lines simply create string literals for “small”, “medium”, “large”, and “xlarge” so we can use them later in the script, and then we create three variables – smallfile, mediumfile, and largefile – so we can compare file sizes later in the script. So far so good! The lines containing:

root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"

for file in $all_files
do
  if [ -f $file ]; then

do nothing more than set our root directory to where the Maltrieve pulls live and then run a loop against every file in that directory.

outstring=($(eval file $file))

Creates a variable called outstring that is an array of words representing the output of the file command. So using the file command output from above, the outstring array would have 818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows in it. Each array element would be separated by the space in the statement, so outstring[0] would store: 818fc882dab3e682d83aabf3cb8b453b: and outstring[1] would store: PE32 and outstring[2] would store: executable and so on and so on. We are only interested in outstring[1] to make our categorization a possibility.

 

Our next line in the script

stringsubone="${outstring[1]}"

 

creates a variable named stringsubone that contains just the string held in outstring[1] so using the example above, stringsubone would now hold PE32.

The case statement you see next

case $stringsubone in
  "a") stringsubone="PerlScript";;
  "very") stringsubone="VeryShortFile";;
  "empty") rm $file
    continue;;
  *);;
esac

fixes a couple problems with the file command's output. In the case of a piece of malware that is a Perl Script, the output that the file command provides is: a /usr/bin/perl\015 script. This may be helpful for a human, but it makes our stringsubone variable hold the letter “a” in it, which means we would be creating a directory later for categorization called “a” which is LESS THAN USEFUL. The same problem happens with something called Short Files where the output from the file command is: very short file (no magic) which means our stringsubone variable would hold the word “very” which isn't a great name for a directory either. The case statement takes care of these 2 and allows for a better naming method for these directories. It also allows for the removal of empty files which are found as well.

The next lines

if [ ! -d $root_dir$stringsubone ]; then
  mkdir -p "$root_dir$stringsubone"
  mkdir -p "$root_dir$stringsubone$smallstr"
  mkdir -p "$root_dir$stringsubone$mediumstr"
  mkdir -p "$root_dir$stringsubone$largestr"
  mkdir -p "$root_dir$stringsubone$xlargestr"
fi

simply tell the script to look in the directory and if a directory that has the same name as stringsubone does not exist then create it. Then create the directory small, medium, large, and xlarge within that directory for further categorization. Using the PE32 example from above, basically this says “if there's no PE32 directory in this root directory, create one and create the sub-directories small, medium, large, and xlarge within that directory. If the PE32 directory already exists then do nothing”.

The remaining lines look difficult but are simple:

filesize=$(stat -c %s $file)
if [[ "$filesize" -le "$smallfile" ]]; then
  mv $file "$root_dir$stringsubone$smallstr/"
elif [[ "$filesize" -le "$mediumfile" ]]; then
  mv $file "$root_dir$stringsubone$mediumstr/"
elif [[ "$filesize" -le "$largefile" ]]; then
  mv $file "$root_dir$stringsubone$largestr/"
else
  mv $file "$root_dir$stringsubone$xlargestr/"
fi
fi

First we create a variable called filesize and, using the stat command, store the file's size in that variable. Then we find out whether the file fits our category of small, medium, large, or xlarge using if and elif comparison statements. Whichever comparison statement turns out to be true determines where the file is moved.

 

The results of this solution are in the picture below.

 

Conclusion

As you can plainly see, we now have the ability to quickly look for specific files in an easier fashion. If I am looking for a piece of malware that I know to be in HTML format and over 50KB but less than 1MB, I can easily roam to HTML->medium and, with a one-liner file command and some grepping, find what I am looking for. I'm certain there are other methods to go about this process and probably WAY better methods of categorizing this directory, so if you have some ideas please shoot them our way and we'll give them a try and see if we can help the community.