Wednesday, Jan 29, 2014

Categorizing Maltrieve Output

UPDATE: @kylemaxwell has accepted the pull of this script into the main maltrieve repo!

*Note: For starters, we need to say thanks as usual to technoskald and point you in the right direction to the Maltrieve Code on GitHub.

Overview

We have posted Maltrieve articles a couple times in the past, but the capabilities of this application continue to amaze us so we thought we'd add to our past contributions. During our initial build of a malware collection box (malware zoo creation) we utilized a standard concept of running Maltrieve throughout the day using a cron job. As most simple things do, this became rather complex because Maltrieve's output is not categorized in any way, so finding what you're looking for is... shall we say... difficult at best. This article discusses a categorization method to help you organize your malware zoo so that it is manageable.

If you would prefer this article in video format, it is provided as well:

Getting started

The box containing the malware repository is a standard Precise Pangolin Ubuntu Distro (12.04 LTS), so no big tricks or hooks here. Maltrieve is installed in a standard format, but a 1TB drive is being utilized to store the malware retrieved. The box has 3TB worth of space for later use, but for now we'll deal with just the 1TB drive. The malware repository is mounted at /media/malware/maltrievepulls. All scripts utilized (to include the Maltrieve python scripts) are located at /opt/maltrieve. Again, nothing flashy in any of this, so it should be easy for you to get your box set up quickly if you'd like.

Running Maltrieve Consistently

To begin the build of the malware repository, we wanted to run the maltrieve scripts hourly so that the directory would fill with new and interesting malware consistently and quickly. This screamed “crontab”, so we fired up a terminal and ran sudo crontab -l and then sudo crontab -e so that we could edit the crontab. Our initial entry was as follows:

@hourly python /opt/maltrieve/maltrieve.py -d /media/malware/maltrievepulls

@hourly echo "maltrieve run at: $(date)" >> /home/username/Documents/maltrievelog.log

This simply tells the system to run the maltrieve.py python script on an hourly basis and send the results to the /media/malware/maltrievepulls directory for safe storage. The second entry basically adds a little stamp in a file in my home directory so I can ensure the cron job is running every hour – you can obviously leave this statement out if you see fit. In any case, we quickly noticed that the Maltrieve app was doing its job and we went about our business allowing the box to do what we asked. We were quickly swimming in malware and were ready to start analyzing to our hearts' delight when we ran into the problem!

The Problem

Maltrieve does exactly what it's told and it does it well – find malware from specific sites and put it in a directory of your liking. And it finds LOTS OF MALWARE if you keep running it as we did in hopes of having a massive store. However, the files are given a hashed name that has very little use to the human eye, and they are just plopped merrily into the directory you choose when you run the maltrieve.py python script. It became quite tedious to run the file command on files that just “looked” interesting based on a hashed filename that gave little meaning to what it might be in terms of formatting, or even payload. A quick look could allow you to do some judging by filesize, but basic command line sorting, grepping, awking, and loads of other tools were needed to try and fix the problem. These methods were simply tedious and after we began to have hundreds of GBs of malware, it became downright no fun any more. The picture below will show you a glimpse of the problem.

Hardly the beacon of light for finding what you're looking for from your malware repository.

Running the file command on a few of these things starts showing some potential though because what you get from doing this looks like:

file 818fc882dab3e682d83aabf3cb8b453b

818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows

 

file fd8fd6d345cb630d7f1b6926ce7d28b3

fd8fd6d345cb630d7f1b6926ce7d28b3: Zip archive data, at least v1.0 to extract

So here we find that we have 2 pieces of malware, one is a Portable Executable for a Windows box and the other is a Zip archive. This is a very nice start, but was just 2 needles in a large and growing haystack, and the manual effort was laborious and downright daunting.

Bash to the Rescue

As coders love to do, our answer was to take the awesome product Maltrieve and throw some more code at it. My initial thought was to extend the python script, but since I pulled this from a GitHub repository I didn't want to modify the code and then have to “re-modify” it later if things were ever changed or upgraded. My answer was to create a small Bash Shell script and run it to help categorize our malware repository. The requirements we set upon ourselves were to categorize the code into multiple directories based on the first word output from the file command and then further categorize that by separating the code by size. We decided that 0-50KB files would be considered “small”, 51KB-1MB would be considered “medium”, 1.xMB-6MB would be considered “large”, and anything larger would be considered “xlarge”. It's a rather brutish method but it's something and it seems to work nicely. So in the end, we would want to see a directory tree that looked something like this:

--PE32

----small

----medium

----large

----xlarge

--Zip

----small

----medium

----large

----xlarge

and so on and so on.
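For readers who think better in Python than in directory trees, the layout above can be sketched roughly as follows. This is only an illustration and not part of our script: the function names size_bucket and target_dir are made up for this example, and the byte limits simply mirror the values used in the bash script shown later in this article.

```python
import os

# Upper bounds in bytes, mirroring the limits used by the bash script in this article.
SMALL_MAX = 50001
MEDIUM_MAX = 1000001
LARGE_MAX = 6000001


def size_bucket(size_in_bytes):
    """Return the size sub-directory name for a file of the given size."""
    if size_in_bytes <= SMALL_MAX:
        return "small"
    if size_in_bytes <= MEDIUM_MAX:
        return "medium"
    if size_in_bytes <= LARGE_MAX:
        return "large"
    return "xlarge"


def target_dir(root_dir, category, size_in_bytes):
    """Build the destination directory, e.g. <root>/PE32/small (hypothetical helper)."""
    return os.path.join(root_dir, category, size_bucket(size_in_bytes))
```

Calling target_dir("/media/malware/maltrievepulls", "PE32", 20480) would point at the PE32/small directory under our repository root.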

Since we set up our maltrieve pulls to run hourly, we decided to run the bash script – which we so obviously named maltrievecategorizer.sh – on every half hour, which allows maltrieve to finish and then categorizes the latest findings. To make this happen, we cracked open crontab again with sudo crontab -e and added the following to the end of the file:

30 * * * * bash /opt/maltrieve/maltrievecategorizer.sh

which just says to run our bash script at 30 minutes past every hour, every day of the year, plain and simple.

The Bash Script

The maltrievecategorizer.sh bash script can be seen below. An explanation follows the script.

#!/bin/bash

 

smallstr="/small"

mediumstr="/medium"

largestr="/large"

xlargestr="/xlarge"

smallfile=50001

mediumfile=1000001

largefile=6000001

root_dir="/media/malware/maltrievepulls/"

all_files="$root_dir*"

for file in $all_files

do

  if [ -f $file ]; then

    outstring=($(eval file $file))

    stringsubone="${outstring[1]}"

    case $stringsubone in

      "a") stringsubone="PerlScript";;

      "very") stringsubone="VeryShortFile";;

      "empty") rm $file

        continue;;

      *);;

    esac

    if [ ! -d $root_dir$stringsubone ]; then

      mkdir -p "$root_dir$stringsubone"

      mkdir -p "$root_dir$stringsubone$smallstr"

      mkdir -p "$root_dir$stringsubone$mediumstr"

      mkdir -p "$root_dir$stringsubone$largestr"

      mkdir -p "$root_dir$stringsubone$xlargestr"

    fi

    filesize=$(stat -c %s $file)

    if [[ "$filesize" -le "$smallfile" ]]; then

      mv $file "$root_dir$stringsubone$smallstr/"

    elif [[ "$filesize" -le "$mediumfile" ]]; then

      mv $file "$root_dir$stringsubone$mediumstr/"

    elif [[ "$filesize" -le "$largefile" ]]; then

      mv $file "$root_dir$stringsubone$largestr/"

    else

      mv $file "$root_dir$stringsubone$xlargestr/"

    fi

  fi

done

The first several lines simply create string literals for “small”, “medium”, “large”, and “xlarge” so we can use them later in the script, and then we create three variables “smallfile”, ”mediumfile”, and ”largefile” so we can compare file sizes later in the script. So far so good! The lines containing:

root_dir="/media/malware/maltrievepulls/"

all_files="$root_dir*"

for file in $all_files

do

if [ -f $file ]; then

do nothing more than set our root directory where our maltrieve root is and then run a loop against every file in that directory.

outstring=($(eval file $file))

Creates a variable called outstring that is an array of words representing the output of the file command. So using the file command output from above, the outstring array would have 818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows in it. Each array element would be separated by the space in the statement, so outstring[0] would store: 818fc882dab3e682d83aabf3cb8b453b: and outstring[1] would store: PE32 and outstring[2] would store: executable and so on and so on. We are only interested in outstring[1] to make our categorization a possibility.

 

Our next line in the script

stringsubone="${outstring[1]}"

 

creates a variable named stringsubone that contains just the string held in outstring[1] so using the example above, stringsubone would now hold PE32.

The case statement you see next

case $stringsubone in

"a") stringsubone="PerlScript";;

"very") stringsubone="VeryShortFile";;

"empty") rm $file

continue;;

*);;

esac

fixes a couple problems with the file command's output. In the case of a piece of malware that is a Perl Script, the output that the file command provides is: a /usr/bin/perl\015 script. This may be helpful for a human, but it makes our stringsubone variable hold the letter “a” in it, which means we would be creating a directory later for categorization called “a” which is LESS THAN USEFUL. The same problem happens with something called Short Files where the output from the file command is: very short file (no magic) which means our stringsubone variable would hold the word “very” which isn't a great name for a directory either. The case statement takes care of these 2 and allows for a better naming method for these directories. It also allows for the removal of empty files which are found as well.
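The same naming logic can be sketched in Python, which may make the special cases easier to see. This is a rough illustration under our own assumptions, not a drop-in replacement for the script: the name category_from_file_output is hypothetical, and it expects one line of file output in the usual "name: description" form.

```python
# First words that make poor directory names, mapped to friendlier categories.
RENAMES = {"a": "PerlScript", "very": "VeryShortFile"}


def category_from_file_output(line):
    """Return a category for one line of `file` output, or None for empty files."""
    # `file` prints "<name>: <description>"; keep only the description part.
    _, _, description = line.partition(": ")
    words = description.split()
    if not words:
        return None
    first = words[0]
    if first == "empty":
        # The bash script deletes empty files instead of categorizing them.
        return None
    return RENAMES.get(first, first)
```

Feeding it the PE32 line from earlier gives back PE32, while a line describing a Perl script gives back PerlScript instead of the unhelpful "a".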

The next lines

if [ ! -d $root_dir$stringsubone ]; then

mkdir -p "$root_dir$stringsubone"

mkdir -p "$root_dir$stringsubone$smallstr"

mkdir -p "$root_dir$stringsubone$mediumstr"

mkdir -p "$root_dir$stringsubone$largestr"

mkdir -p "$root_dir$stringsubone$xlargestr"

fi

simply tell the script to look in the directory and if a directory that has the same name as stringsubone does not exist then create it. Then create the directory small, medium, large, and xlarge within that directory for further categorization. Using the PE32 example from above, basically this says “if there's no PE32 directory in this root directory, create one and create the sub-directories small, medium, large, and xlarge within that directory. If the PE32 directory already exists then do nothing”.

The remaining lines look difficult but are simple:

filesize=$(stat -c %s $file)

if [[ "$filesize" -le "$smallfile" ]]; then

mv $file "$root_dir$stringsubone$smallstr/"

elif [[ "$filesize" -le "$mediumfile" ]]; then

mv $file "$root_dir$stringsubone$mediumstr/"

elif [[ "$filesize" -le "$largefile" ]]; then

mv $file "$root_dir$stringsubone$largestr/"

else

mv $file "$root_dir$stringsubone$xlargestr/"

fi

fi

First we create a variable called filesize and, using the stat command, store the file size in bytes in that variable. Then we find out whether the file fits in our category of small, medium, large, or xlarge using if and elif comparison statements. The first comparison that turns out to be true determines where the file is moved.

 

The results of this solution are in the picture below.

 

Conclusion

As you can plainly see, we now have the ability to quickly look for specific files in an easier fashion. If I am looking for a piece of malware that I know to be in HTML format that was over 50KB, but less than 1MB, I can easily roam to HTML->medium and a one-liner file command with some grepping and find what I am looking for. I'm certain there are other methods to go about this process and probably WAY better methods of categorizing this directory, so if you have some ideas please shoot them our way and we'll give them a try and see if we can help the community.

 

Monday, Dec 23, 2013

Analyzing DarkComet in Memory

*Note: This article turned out much longer than I originally anticipated. For those who are looking for actionable data from this report but don’t want to suffer through the entire article, there are Yara rules at the end!

Overview

In a recent case I came across DarkComet and had the opportunity to test out my new Volatility skills. Over the course of this article I will be using a memory dump from a Windows7 VM that I installed the following sample on:

f6351da84168d40fae8da0c156fbab0f – Downloaded from VirusTotal

If you would like to follow along feel free to Download a practice memdump. Keep in mind that the memdump available for download is from the same piece of malware but from a different machine than I used in the rest of the article, so PIDs won't match up. That should make the memdump a little more fun for you. In the case I was working, all I had was a memory sample and an alert from a network appliance stating that DarkComet communications came from the suspected host. My goal in the investigation was to determine whether the host was actually infected, whether the infection was DarkComet, what the malware was doing, whether there was any exfiltration, and whether this infection was used to pivot elsewhere in the network.

Getting started

There are a few different approaches an analyst can take. Some will go research heavy and try to learn what they can about DarkComet before looking at the dump, while others like to dive right in. Me, I live dangerously sometimes, so I dove in without doing much research. I actually like a hybrid approach though. Just like when I get a PCAP I like to get a feel for a memory dump before doing too much research, mainly because I don’t want to subject myself to confirmation bias.

Process analysis

Like always I start off with an imageinfo to get an idea of what profile I should use, but also to understand the timezone of the image. Then I move onto psxview. Running psxview, Volatility will check for processes within the memory dump in various ways. This helps us find suspicious processes even if they try to circumvent analysis via one or multiple standard methods. Using the –A flag with psxview applies rules to help us understand what legitimate processes should show as “False” by replacing “False” with “Okay”.

 

In this case, we didn’t really have to do much analysis to figure out what our bad process probably is. The attackers made it somewhat easy on us by using a common misspelling runddl32.exe. In scenarios where the malware isn’t so obvious we may be looking at loaded dlls, launch times, parents, occurrences, hooks, and paths to find bad processes. So let’s do a dlllist on this guy to find out where it resides.

Ahh, no surprise here, as we typically see %APPDATA% paths leveraged by attackers. That gives us something to work from.

*MSDCSC is a common path utilized by DarkComet. Most likely a default path in the builder.

File extraction

Knowing the path we can check if the file is potentially still resident in memory with filescan. In this case it was, so I used dumpfiles to extract it out. In cases where that doesn’t work procexedump may be better suited.

With it extracted we can then do general analysis on it like one of my favorite commands ever: “strings”. I will skip that for this article though as I want to focus more on what is in memory rather than in the file extracted from memory.

Network communications

Now around this time in the actual case I began to take a closer look at the network connections. Unfortunately though, I did not simulate those connections in this memory dump to be able to show you, so we will skip that as well. Keep in mind you would be looking for what external addresses are involved, what ports, and of course when the network connections occurred. I usually feed the network indicators to Automater for OSINT analysis. Additionally, the connections may be a good place to start getting an idea if lateral movement may be occurring. Looking for connections over 445 or 3389 may indicate pivoting, especially when it is two workstations that are involved.
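As a rough illustration of that last check, here is a small Python sketch that flags connections to internal hosts on SMB (445) or RDP (3389). Everything about the input is an assumption made for this example: the (local_ip, remote_ip, remote_port) tuple layout is not the output format of any Volatility plugin, so you would have to parse your own connection listing into that shape first, and the name possible_pivots is made up.

```python
import ipaddress

LATERAL_PORTS = {445, 3389}  # SMB and RDP


def possible_pivots(connections):
    """Return connections to private addresses on SMB/RDP ports.

    `connections` is an iterable of (local_ip, remote_ip, remote_port) tuples;
    this layout is an assumption for the sketch, not a plugin format.
    """
    hits = []
    for local_ip, remote_ip, remote_port in connections:
        remote_is_internal = ipaddress.ip_address(remote_ip).is_private
        if remote_port in LATERAL_PORTS and remote_is_internal:
            hits.append((local_ip, remote_ip, remote_port))
    return hits
```

A hit here is only a lead, not proof of pivoting; plenty of legitimate traffic uses these ports, so the hosts involved still need a closer look.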

More process analysis

Getting back to the processes, I thought it would be a good idea to do a pslist so I could understand what the parent PID was. The parent was no longer around, so I don’t know what did the initial launch, but I do see other processes launched by that same parent. Also, drawing more attention to the process we see notepad.exe launching under runddl32.exe. Usually when I see notepad.exe I will run the notepad plugin in volatility which will show the text of a notepad session. In this case that did not return any results. Using malfind on the notepad process we see that it is probably not doing any notepad-like activity anyway.

Find the Mutants!

At this point there is no question that runddl32.exe is not a normal process. So let’s try to identify other indicators. A great place to start if you know the bad process is to look at handles to see what files, mutants, and registry keys may be of interest. To start off with the Mutants aka Mutex objects, there are some pretty apparent indicators.

DarkComet has a default mutex of “DC_MUTEX-<7 alphanumeric characters>”. For those who don’t understand what a mutex is, there are plenty of good articles you can read up on, but for the purpose of this discussion think of it as a way a program can let the OS know it is there so it doesn’t get launched again while it is already running.

I looked at a lot of DarkComet samples while trying to test the Yara rules that you will see at the bottom. In that testing here are the unique Mutex objects I saw:

DC_MUTEX-8H6JNU1, DC_MUTEX-HCLS4W4, DCPERSFWBP, DC_MUTEX-6YKRNWA, DCPERSFWBP, DC_MUTEX-9CB5GV6, DC_MUTEX-PT4LZLZ, DC_MUTEX-KHNEW06, DC_MUTEX-9CB5GV6, DC_MUTEX-T27B7E9, DC_MUTEX-T4AJFQ9, DC_MUTEX-GQ3M3G4, DC_MUTEX-FLJQNAW, DC_MUTEX-2QUGF5V, BZIRD0K04Q, DC_MUTEX-8Q459BS, DC_MUTEX-WRG2B6H, DC_MUTEX-TMJMXQD, DC_MUTEX-90Q9J91, DC_MUTEX-8H6JNU1, DC_MUTEX-ZEG6XKR, _x_X_UPDATE_X_x_, _x_X_PASSWORDLIST_X_x_, _x_X_BLOCKMOUSE_X_x_, ***MUTEX***, ***MUTEX***_PERSIST, MUTEX***_SAIR, DC_MUTEX-U9WXEAQ, DC_MUTEX-E44KJ8W, DC_MUTEX-RT7ED81

 

*There are a couple in here that I am not positive were actually DarkComet as I used AV signatures to grab the sample set. As we all know AV can sometimes be misleading.
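If you have handles or strings output saved to a text file, a quick Python sketch like the one below can pull out the default-style mutex names. It assumes the seven characters are upper-case letters or digits, which matches every DC_MUTEX value I listed above but is not something I have confirmed against the builder; the function name find_dc_mutexes is made up for this example.

```python
import re

# "DC_MUTEX-" followed by exactly seven upper-case letters or digits (assumed charset).
DC_MUTEX = re.compile(r"DC_MUTEX-[A-Z0-9]{7}(?![A-Z0-9])")


def find_dc_mutexes(text):
    """Return the distinct default-style mutex names found in text, in first-seen order."""
    seen = []
    for match in DC_MUTEX.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

Note that this will not catch custom names such as DCPERSFWBP, so treat it as a first pass only.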

DarkComet config

We have already learned a lot about this malware, but there are still plenty of other things to know. For instance, did it implement a persistence mechanism, what capabilities does it have, how did it get on the system, and so on. To begin to answer those questions I like to dump out the memory of a process and then run strings against it to start to paint a picture. I ran the following command to generate a memdump of the process (runddl32.exe) itself.

python ~/Desktop/volatility/volatility_train/vol.py -f ~/interview/WIN-MKFGQA8PLLR-20131219-151611.raw --profile=Win7SP1x86 memdump -p 1972 -D.

Now with that I ran strings. Keep in mind that when running strings in Linux you need to use the -a option and you have to run it separately for ASCII and Unicode, which will look something like this:

strings -a 1972.dmp #ASCII

strings -a -e l 1972.dmp #UNICODE

*There are of course other methods that can be leveraged here to combine these commands
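One such method, sketched here in Python purely as an illustration, is to scan the dump once for both printable ASCII runs and 16-bit little-endian runs. This is a rough approximation of what strings does rather than a faithful reimplementation, and the name extract_strings is made up for this example.

```python
import re

MIN_LEN = 4  # minimum run length; 4 is also the default used by strings

# Runs of printable ASCII bytes.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of printable ASCII characters stored as 16-bit little-endian (char, NUL).
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data):
    """Return (ascii_strings, utf16le_strings) found in a bytes object."""
    ascii_hits = [m.decode("ascii") for m in ASCII_RE.findall(data)]
    utf16_hits = [m.decode("utf-16-le") for m in UTF16_RE.findall(data)]
    return ascii_hits, utf16_hits
```

You would feed it the contents of the process dump, for example open("1972.dmp", "rb").read(). Be aware that this loads the whole file into memory, so for very large dumps the strings commands above are the safer choice.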

After spending a ton of time looking through these strings, I began to pick out some very obvious data. My favorite of which is the DarkComet configuration:

As you can imagine, once I found the DarkComet configuration in memory the case changed a lot for me. To really understand it though I had to do a bit of research to understand what each of these options meant.

Most of the data on these commands came from two places: searching through the posts on hackforums[.]net and an article from Context Information Security (http://contextis.com/research/blog/malware-analysis-dark-comet-rat/).

Some of them are obvious like NETDATA, PERSINST, KEYNAME, etc. Others are not so obvious though, like OFFLINEK which became a very important part of my case. So let’s explain some of these here:

MUTEX={DC_MUTEX-KHNEW06} # This is the Mutant/mutex value that is used

SID={Guest16} # Campaign name

FWB={0} # Firewall bypass (Windows Firewall)

NETDATA={test213.no-ip.info:1604} # C2 *Most seem to be 1604 so that is probably the default

GENCODE={F6FE8i2BxCpu} # Not quite sure on this one, perhaps part of building the encryption?

KEYNAME={MicroUpdate} # Registry key name

EDTDATE={16/04/2007} # Used for time stamp manipulation

PERSINST={1} # Persistence

MELT={0} # Delete the original executable or not

CHANGEDATE={1} # Use the EDTDATE to modify the $SI timestamps

DIRATTRIB={6} # Modify the attributes of a directory, such as make it hidden

FILEATTRIB={6} # Modify the attributes of a file, such as make it hidden

OFFLINEK={1} # Offline keylogging

So as you can tell, I didn’t find out what each option does, but enough to get by for now. If I was really interested in knowing each possible option and what it means, I would take the time to get the latest version of the builder and try out each option to determine what the config changed to. Now this sample's configuration differs slightly from the sample I had for the case, but the general strokes are the same.
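Since the configuration shows up as simple KEY={value} pairs, it is easy to pull into a dictionary once you have the strings output. The Python below is only a sketch under that assumption (upper-case option names followed by a value in curly braces, as seen in this sample); parse_darkcomet_config is a name I made up, and other DarkComet versions may lay the config out differently.

```python
import re

# KEY={value} pairs such as NETDATA={test213.no-ip.info:1604}
CONFIG_RE = re.compile(r"\b([A-Z]{2,})=\{([^{}\r\n]*)\}")


def parse_darkcomet_config(text):
    """Return a dict of configuration options found in strings output."""
    return {key: value for key, value in CONFIG_RE.findall(text)}


def split_c2(netdata):
    """Split a NETDATA value such as 'host:port' into (host, port)."""
    host, _, port = netdata.rpartition(":")
    return host, int(port)
```

With the config from this sample, split_c2(config["NETDATA"]) would hand back the C2 host name and port as separate values, which is handy for feeding network indicators into other tools.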

Keylogger

The OFFLINEK option had me confused for a bit. So to explain it a bit better, when OFFLINEK is enabled “{1}” the malware will continue to log keystrokes to a local file that can then be picked up by the attacker whenever they want. When disabled, the attacker only has access to keystrokes when the attacker has a live session open with the victim. Looking through the strings memdump in my case quickly showed artifacts that were indicative of a keylogger such as “[<-]” and the titles of open windows like outlook emails. Strings also showed a path that seemed somewhat suspicious as well.

In my actual case I did a filescan to see if there was a file object open for the file in the dclogs directory. There was, so I used dumpfiles to extract it. With the file in hand it was easy to see all the keystrokes that were logged in that file.

DarkComet logs keystrokes in a different file for each day. In all the testing and client work I have done, it seems that only the key log file for the day of the acquisition can be extracted as a full file from memory. By default, the keystroke logs are stored in a file named “dclogs\<Date>.dc”. This can be useful in finding the initial infection date, as the log files will have entries in the MFT (We’ll talk more on the timeline later). Within the log, the keystrokes and open windows are logged as seen below.

DarkComet commands

While looking through the strings in the memdump of the runddl32.exe process I also came across some commands that appear to be functions for DarkComet. This hints at some of the functionality. None of this is surprising though, as we have seen plenty of RATs that all have similar functionality.

Persistence

At this point in the investigation I had answered just about everything I had wanted to in my actual investigation. I did not find any evidence that would suggest lateral movement occurred, but I did see plenty of evidence that suggests exfiltration did. My guess is that the exfiltrated data was the keylogger logs, but I was not able to prove it with the memory image alone. I did not run the ethscan plugin on this occasion, but that may have been able to pull a pcap of suspect traffic. The only remaining items I really wanted to answer were where the persistence key is stored and when the infection took place.

Let’s check out the persistence first. We know what the key name is based on the configuration artifacts we pulled (MicroUpdate). We also know via research that there are only a few methods available to DarkComet via the builder for persistence. The standard Run key is used most commonly so it is probably the default. Thanks to the printkey plugin this should be a breeze.

There we have it, a standard RUN key in the HKCU for the user that was logged on at the time of infection. With persistence understood, time to check out timeline related data.

Timeline

In the Volatility Class @gleeda goes over making a “Super Timeline” using time data in the memory. This is done not just with the timeliner plugin, but also by extracting out the registry and a few other techniques. In my actual case that was what I did, but for this demonstration the MFT plugin alone will suffice. Keep in mind that running timeline data can take a while, so what I like to do is run your general plugins like psxview, pstree, pslist, dlllist, netscan, handles, etc and output them to separate files, so you can cat and grep your way through them for analysis while the timeline is building.

python ~/Desktop/volatility/volatility_train/vol.py -f ~/interview/WIN-MKFGQA8PLLR-20131219-151611.raw --profile=Win7SP1x86 mftparser --output=body --output-file=mft.csv

mactime -b mft.csv -d -z UTC-5 > mft2.csv

When analyzing the timeline data, I start with what I already know and pivot from that data. In this case, I know about two directories and some files that are directly involved with this malware. So I will start by grepping that material and looking around the same time frames for other suspicious data.
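That first grepping pass can also be done in a few lines of Python if you prefer. The sketch below simply keeps timeline lines that mention any of your known indicators, without assuming anything about the column layout of the mactime output; pivot_lines is a made-up name for this example.

```python
def pivot_lines(lines, keywords):
    """Return timeline lines that mention any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

For this sample you might read the lines from mft2.csv and pass keywords such as MSDCSC, dclogs, and runddl32.exe, then look at the timestamps on the hits to decide which time windows deserve a closer read.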

In this demonstration, svchosts.exe in the local temp directory stood out. Now that you have another file of interest you can do a lot of the same things we have already shown to extract it out and learn more about it. In a real case I am looking for what activity occurred right before, so I can understand what may have been the infection point. For instance if I saw a lot of browsing, then it may be a good idea to check out internet history; if I saw a prefetch file entry for java, maybe I would look around for an idx file that could show me more. Additionally I am looking for other items that may indicate what the attacker has done once on the box. Here we see that there are key logs being stored, but in more advanced cases where an attacker manages to get a shell, we may see evidence of the tools the attacker was using. @jackcr had a recent post that goes into further detail on that topic.

Wrap it up with Yara

I could really go into more details on other parts of this analysis but as this is already a very long article I should probably wrap it up. Part of the reason for writing this post is so that if others came across a DarkComet memory sample they could get to the data quicker than I did. To help along with this, the following Yara rules may prove useful:

rule DarkComet_Config_Artifacts_Memory

{   

     meta:

           Description = "Looks for configuration artifacts from DarkComet. Works with memory dump and unpacked samples."

           filetype = "MemoryDump"         

           Author = "Ian Ahl @TekDefese"

           Date = "12-19-2013"

     strings:

           $s0 = "GENCODE={" ascii

           $s1 = "MELT={" ascii

           $s2 = "COMBOPATH={" ascii

           $s3 = "NETDATA={" ascii

           $s4 = "PERSINST={" ascii

     condition:

           2 of them

}

 

rule DarkComet_Default_Mutex_Memory

{   

     meta:

           Description = "Looks for default DarkComet mutexs"

           filetype = "MemoryDump"              

           Author = "Ian Ahl @TekDefese"

           Date = "12-20-2013"

     strings:

           $s = "DC_MUTEX-" ascii nocase

     condition:

           any of them

}

 

rule DarkComet_Keylogs_Memory

{   

     meta:

           Description = "Looks for key log artifacts"

           filetype = "MemoryDump"              

           Author = "Ian Ahl @TekDefese"

           Date = "12-20-2013"

     strings:

           $s0 = "[<-]"

           $s1 = ":: Clipboard Change :"

           $s2 = "[LEFT]"

           $s4 = "[RIGHT]"

           $s5 = "[UP]"

           $s6 = "[DOWN]"

           $s7 = "[DEL]"

           $s8 = /::.{1,100}\(\d{1,2}:\d{1,2}:\d{1,2}\s\w{2}\)/  

     condition:

           any of them

}

Wednesday, Dec 11, 2013

Automater Output Format and Modifications

Our recent post on the extensibility of Automater called for a few more posts discussing other options that the program has available. Particularly, we want to show off some different output options that Automater provides and discuss the sites.xml modifications that provide different output formatting. Please read the extensibility article to get caught up with sites.xml modifications if you are not aware of the options provided with that configuration file.

Automater offers a few possibilities for printouts outside of the standard output (screen-based output) that most users are aware of. By running:

python Automater.py 1.1.1.1 -o output.txt

We tell Automater to run against target 1.1.1.1 and to create a text file named output.txt within the current directory. You can see here, that after Automater does its work and lays out the standard report information to the screen, it also tells you that it has created the text file that you have requested.

Once opened, it is quite obvious that this is the standard output format that you see on your screen now saved to a text file format for storage and further use later.

While this text format is useful, we thought it would be better to provide the capability to provide a csv format as well as something that would render in a browser. To retrieve a csv formatted report, you would use the -c command line switch and to retrieve an html formatted report, you would use the -w command line switch. These options can all be run together, so if we ran the command:

python Automater.py 1.1.1.1 -o output.txt -c output.csv -w output.html

We would receive 3 different reports other than the standard screen reporting – 1 standard text file, 1 comma-separated text file, and 1 html formatted file. Each of the reports is different and can be utilized based on your requirements.

Since we’ve already seen the text file, I wanted to show you the layout of the HTML and comma-separated outputs. Below you can see them, and I think you’ll find each of these quite useful for your research endeavors.

You will notice that I’ve called out a specific “column” in each of the files that is marked with the header “Source” in each. This is where the modification of the sites.xml file comes into play. Again, if you need to take a look at how to use sites.xml file for adding other sites and modifying output functionality, please see this article. But for now, let’s take a look at what we can do with changing the html and comma-separated report format functionality by changing one simple entry in the sites.xml file. Below, you can see a good look at the robtex.com site element information within the config file. It is obviously here that we want to modify this scenario, since both of our outputs have RobTex DNS written out in the Source “column.” Looking at the sites.xml file we can easily see that this entry must be defined within the <sitefriendlyname> XML element.

Let’s change our sites.xml file to show how modifying the <sitefriendlyname> XML element can change our report output. We will change the <entry> element within the <sitefriendlyname> element to say “Changed Here” as seen below:

Now we will run Automater again with the same command line as before:

python Automater.py 1.1.1.1 -o output.txt -c output.csv -w output.html

And we’ll take a look again at our output.csv and output.html files. Notice that the Source “column” information has been changed to represent what you want to see based on the sites.xml configuration file.

As you’ll see when you inspect the sites.xml format, you can change these <entry> elements within the <sitefriendlyname> elements for each regular expression that you are looking for on those sites that have multiple entries. This allows you to change the Source output string in the file based on specific findings. For instance, if you look at the default sites.xml file that we provide you at GitHub you will find that our VirusTotal sites have multiple entries for the Source string to be reported. This allows you full autonomy in reporting information PER FINDING (regex) so that your results are easily read and understood by you and your team.
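
The per-regex pairing described above can be sketched in Python. This is only an illustration and not Automater's own code: the site name, regexes, and friendly names in the fragment are made up, and the CSV headers other than Source are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical <site> fragment. Only the element names (<site>,
# <sitefriendlyname>, <regex>, <entry>) come from the article; the values
# are invented for illustration.
SITE_XML = r"""
<site name="example_site">
  <sitefriendlyname>
    <entry>Example Detections</entry>
    <entry>Example Scan Date</entry>
  </sitefriendlyname>
  <regex>
    <entry>Detections\:\s+\d+</entry>
    <entry>Scanned\:\s+\S+</entry>
  </regex>
</site>
"""


def entries(site, tag):
    """Return the text of every <entry> under the given child element."""
    return [entry.text for entry in site.find(tag).findall("entry")]


def source_rows(site_xml):
    """Pair each friendly name with the regex at the same position."""
    site = ET.fromstring(site_xml)
    return list(zip(entries(site, "sitefriendlyname"), entries(site, "regex")))


def to_csv(target, rows):
    """Write one CSV row per regex; 'Target' and 'Regex' are assumed headers."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Target", "Source", "Regex"])
    for friendly_name, regex in rows:
        writer.writerow([target, friendly_name, regex])
    return buf.getvalue()


rows = source_rows(SITE_XML)
print(to_csv("1.1.1.1", rows))
```

Because the lists are matched by position, changing the first <entry> under <sitefriendlyname> only changes the Source label reported for the first regex.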

Tuesday
Dec102013

The Extensibility of Automater

With the recent release of version 2.0 of Automater, we hoped to save you significant time by letting you use the tool as a sort of one-stop shop for that first stage of analysis. The code as provided on GitHub will certainly accomplish that, since we have provided the ability for the tool to utilize sites such as virustotal, robtex, alienvault, ipvoid, threatexpert and a slew of others. However, our goal was to make this tool more of a framework for you to modify based on your or your team’s needs. 1aN0rmus posted a video (audio is really low ... sorry) on that capability, but we wanted to provide an article on the functionality to help you get the tool working based on your requirements.

One of the steps in the version upgrade was to ensure the Python code was easily modified if necessary, but truthfully our hope was to create the tool so that no modification to the code would be required. To accomplish this, we provided an XML configuration file called sites.xml with the release. We utilized XML because we thought it was a relatively universal file format that was easily understood, that could also be utilized for future web-based application of the tool. When creating the file, we made the layout purposefully simple and flat so that no major knowledge of XML was required. The following will discuss sites.xml manipulation where we will assume a new requirement for whois information.

Our scenario will be wrapped around the networksolutions.com site where we will gather a few things from their whois discovery tool. Our first step is to look in detail at the site and discover what we want to find from it when we run Automater. In this case, we determine that we want to retrieve the NetName, NetHandle, and Country that the tool lists based on the target we are researching. Notice also that we need to get the full URL that is required for our discovery, to include any querystrings etc…

Now that we know what we want to find each time we run Automater, all we have to do is create some regular expressions to find the information when the tool retrieves the site. I left the regexes purposely loose for readability here. See our various tutorials on Regex if you would like to learn more. In this case, we will use:


• NetName\:\s+.+
• NetHandle\:\s+.+
• Country\:\s+.+


which will grab the NetName, NetHandle, and Country labels as well as the information reported on the site. The more restrictive your regex is, the better your results will be. This is just an example, but once you have the regex you need to get the information you desire, you are ready to modify the sites.xml file and start pulling the new data.
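
To see what these patterns return, here is a minimal sketch using Python's re module. The sample page text is made up, and the stricter NetName pattern is my own illustration of "more restrictive", not something taken from the tool.

```python
import re

# Made-up sample of the kind of text a whois page might contain.
SAMPLE_PAGE = """\
NetRange:       11.0.0.0 - 11.255.255.255
NetName:        EXAMPLE-NET
NetHandle:      NET-11-0-0-0-1
Country:        US
"""

# The three loose patterns from the article.
LOOSE_PATTERNS = [r"NetName\:\s+.+", r"NetHandle\:\s+.+", r"Country\:\s+.+"]

# A tighter variant (illustration only): anchor to a whole line and capture
# just the value instead of the label plus everything after it.
STRICT_NETNAME = re.compile(r"^NetName:\s+([\w-]+)$", re.MULTILINE)


def find_matches(patterns, text):
    """Return the list of matches for each pattern, in pattern order."""
    return [re.findall(pattern, text) for pattern in patterns]


loose = find_matches(LOOSE_PATTERNS, SAMPLE_PAGE)
strict = STRICT_NETNAME.findall(SAMPLE_PAGE)
print(loose)
print(strict)
```

The loose patterns return the label together with the rest of the line, while the stricter one returns only the value.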

Our first step will be to add a new XML <site> element to the current sites.xml file. The easiest way is to copy an already established entry, from its opening <site> element through its closing </site> element, and paste it as a skeleton to work with. Since you’re adding the site, you can place it anywhere in the file, but in our case we will put it at the top of the file.

Once this is done, we need to modify the new entry with the changes that we currently know. Let’s come up with a common name that we can use. The <site> element’s “name” parameter is what the tool utilizes to find a specific site. This is what the tool uses when we send in the -s argument to the Automater program. For instance, let’s run python Automater.py 11.11.11.11 -s robtex_dns. Here you can see that Automater used the IP address 11.11.11.11 as the target, but it only did discovery on the robtex.com website. This was accomplished by using the -s parameter with the friendly name parameter.
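
The name lookup can be pictured with a short Python sketch. It is not Automater's implementation: the <sites> root element and the second site name are assumptions, and the entries are trimmed to the name attribute alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sites.xml. The <sites> root element and the
# "example_site" entry are assumptions; "robtex_dns" is the name used above.
SITES_XML = """
<sites>
  <site name="robtex_dns" />
  <site name="example_site" />
</sites>
"""


def select_sites(sites_xml, source=None):
    """Return every <site>, or only those whose name matches `source`."""
    root = ET.fromstring(sites_xml)
    sites = root.findall("site")
    if source is None:
        return sites
    return [site for site in sites if site.get("name") == source]


selected = select_sites(SITES_XML, "robtex_dns")
print([site.get("name") for site in selected])
```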

 

We will use ns_whois for our friendly name and will continue to make modifications to our sites.xml file. We know that this site uses IP addresses as targets, so it will be an ip sitetype. A legal entry for the <sitetype> XML element is one of ip, md5, or hostname. If a site can be used for more than one of these, you can list extras in each <entry> XML element. (You can see an example of this in use in the standard sites.xml file in the Fortinet categorization site entry.) We also know that the parent domain URL is http://networksolutions.com. The <domainurl> XML element is not functionally used, but will be in later versions, so just list the parent domain URL.  With this information, we can modify quite a bit of our file as shown.

 

Now let’s move down the file to the regex entries since we know this information, as well as the Full URL information. In the <regex> XML element, we list one regex per <entry> XML element. In this case, we want to find three separate pieces of information with our already defined regex definitions so we will have three <entry> elements within our <regex> element. We also know our Full URL information based on the networksolutions site we visited and this information is placed in the <fullurl> XML element. However, we can’t list the IP address as we found it in the Full URL information because that would not allow the tool to change the target based on your requirements. Therefore whenever a target IP address, MD5 hash or hostname is needed in a querystring, or within any post data, you must use the keyword %TARGET%. Automater will replace this text with the target required – in this case 11.11.11.11. Now we have the Full URL and regex entries of:


• http://www.networksolutions.com/whois/results.jsp?ip=%TARGET%
• NetName\:\s+.+
• NetHandle\:\s+.+
• Country\:\s+.+
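
The %TARGET% substitution amounts to a plain string replacement. A minimal sketch, where build_url is a hypothetical helper name and not a function from Automater:

```python
# The <fullurl> value from the article, with the %TARGET% keyword in place.
FULL_URL = "http://www.networksolutions.com/whois/results.jsp?ip=%TARGET%"


def build_url(template, target):
    """Swap the %TARGET% keyword for the IP, MD5 hash, or hostname."""
    return template.replace("%TARGET%", target)


print(build_url(FULL_URL, "11.11.11.11"))
```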


A requirement of Automater is that the <reportstringforresult>, <sitefriendlyname> and <importantproperty> XML elements have the same number of <entry> elements as our <regex> XML elements – which in this case is three. This “same number of <entry> elements” requirement is true for all sites other than a site requiring a certain post. I will post another document discussing that later. For now, we will just copy the current reportstringforresult, sitefriendlyname, and importantproperty entries a couple of times and leave the current information there so you can see what happens. Then we’ll modify that based on your assumed requirements.

Our new site entry in the sites.xml file currently looks like the following:
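
As a rough text rendering of such an entry, and a way to check the entry-count rule mechanically, here is a minimal Python sketch. The element order and the PLACEHOLDER strings are assumptions (the values copied from the robtex entry are not reproduced here), and entry_count_problems is a hypothetical helper, not part of Automater.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the new entry. Element names, the ns_whois name, the URLs,
# the regexes, and "Results" come from the article; the PLACEHOLDER strings
# stand in for the values copied from the robtex entry.
NS_WHOIS_XML = r"""
<site name="ns_whois">
  <sitetype><entry>ip</entry></sitetype>
  <domainurl>http://networksolutions.com</domainurl>
  <fullurl>http://www.networksolutions.com/whois/results.jsp?ip=%TARGET%</fullurl>
  <regex>
    <entry>NetName\:\s+.+</entry>
    <entry>NetHandle\:\s+.+</entry>
    <entry>Country\:\s+.+</entry>
  </regex>
  <reportstringforresult>
    <entry>PLACEHOLDER report string</entry>
    <entry>PLACEHOLDER report string</entry>
    <entry>PLACEHOLDER report string</entry>
  </reportstringforresult>
  <sitefriendlyname>
    <entry>PLACEHOLDER friendly name</entry>
    <entry>PLACEHOLDER friendly name</entry>
    <entry>PLACEHOLDER friendly name</entry>
  </sitefriendlyname>
  <importantproperty>
    <entry>Results</entry>
    <entry>Results</entry>
    <entry>Results</entry>
  </importantproperty>
</site>
"""

# Elements that must carry one <entry> per <regex> entry.
PARALLEL_TAGS = ("reportstringforresult", "sitefriendlyname", "importantproperty")


def entry_count_problems(site_xml):
    """List every parallel element whose <entry> count differs from <regex>."""
    site = ET.fromstring(site_xml)
    expected = len(site.find("regex").findall("entry"))
    problems = []
    for tag in PARALLEL_TAGS:
        element = site.find(tag)
        found = 0 if element is None else len(element.findall("entry"))
        if found != expected:
            problems.append(f"<{tag}> has {found} entries, expected {expected}")
    return problems


print(entry_count_problems(NS_WHOIS_XML))
```

An empty list means the counts line up; dropping one <entry> from any of the three parallel elements makes the helper report it.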

 

Here you can see the use of the %TARGET% keyword in the <fullurl> element as well as the new <regex> element regex entries. You can also see that I just copied the <sitefriendlyname> and <reportstringforresult> element information from the robtex entry that we copied and pasted. We did the same for the <importantproperty> XML element, but the entries here will be “Results” most of the time. I will post more on what this field allows later. Let’s take a look at running Automater with the current information in the sites.xml file and ensure we only use the networksolutions site by using the -s argument as before with the new ns_whois friendly name as the argument. Our call will be:


python Automater.py 11.11.11.11 -s ns_whois

Once we run this command we receive the following information:

 

Notice that the <reportstringforresult> element is shown with the report string. Also notice the %TARGET% keyword has been replaced with the target address. Now we need to change the <reportstringforresult> element so that we can get a better report string for each entry. In this case, we will change the report strings to [+] WHOIS for each entry just to show the change. We will also change the <sitefriendlyname> element to NetName, NetHandle, and Country so that they are correct. The <sitefriendlyname> element is used in the other reporting capabilities (web and csv). I will post something on that later as well. For now change your sites.xml <reportstringforresult> entries and then see what your report looks like! It should look something like the following screenshot, except that in my case I have also added a few more <entry> elements.

Hopefully this helps you understand how extensible the Automater application is now. Simple modifications to the sites.xml will give you the ability to collect massive information from multiple sites based on what you or your team needs with no Python changes required. Let us know.

Wednesday
Dec042013

Finally the new Automater release is out!

With the exception of my review of the Volatility Malware and Memory Forensics class yesterday, it has been a while since I have posted here. Time for me to get back into the swing of things. The best way to do so is with a new release to the tool that really launched code development projects on TekDefense.

Automater is a tool that I originally created to automate the OSINT analysis of IP addresses. It quickly grew and became a tool to do analysis of IP Addresses, URLs, and Hashes. Unfortunately though, this was my first Python project and I made a lot of mistakes, and as the project grew it became VERY hard for me to maintain.

Luckily, a mentor and friend of mine (@jameshub3r) offered his time and expertise to do an entire re-write of the code that would focus on a modular extensible framework. The new code hits the mark as far as that is concerned. The real power of Automater is how easy it is to modify what sources are checked and what data is taken from them without having to modify the Python code. To modify sources simply open up the sites.xml file and modify away. I'll do another post later that goes into more detail there.

To view a bit more about installation and usage head over to the new Automater page.

You can download the code directly on GitHub. Remember, Automater is not a single file anymore; you need to download all of the files in the Automater repo to the same directory. To the first person that reports a valid bug to me, I'll send you a random game on Steam.

Here are a few screenshots to hold you over until you get it running.