Thursday, May 25, 2017

Linux botnet malware analysis: part 3

Part 1: http://www.boredhackerblog.info/2017/05/linux-botnet-malware-analysis-part-1.html
Part 2: http://www.boredhackerblog.info/2017/05/linux-botnet-malware-analysis-part-2.html
Part 3: http://www.boredhackerblog.info/2017/05/linux-botnet-malware-analysis-part-3.html

Monitoring:
The goal for monitoring was inspired by what MalwareTech has done in the past.
As I stated in the first part, I wanted to monitor who was being attacked by the botnet we were researching. However, while we were connected to the C2, we did not observe any attack commands being sent.
There are a couple of ways you could monitor the specific botnet we were looking at. One way is to simply have the client/bot connect to the C2 server and watch the network traffic to see who is being attacked. In this case, you may want to limit the network speed to reduce the impact. Even so, the bot still takes part in attacks, so this approach is still bad. Another way is to use instrumentation to intercept the C2 commands, write them to a file, and then modify each command before it is passed to the function that actually processes it. The last way I could think of was to write a fake bot: something that behaves exactly like the client/bot but doesn't do any of the bad stuff.
In the case of the malware we were analyzing, writing a fake bot was really easy because the C2 protocol is so simple. Below is an example of the fake bot for the specific malware we analyzed:
When a command is received from the C2, the bot writes a timestamp and the command to a file so we can keep track of what happened and when it happened.
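As a rough illustration of the idea (not our actual script), here is a minimal Python sketch of a logging-only fake bot. It assumes the C2 speaks a plain-text protocol over a raw TCP socket with one command per line. The host, port, and optional check-in message (`hello`) are placeholders, because a real fake bot has to reproduce whatever handshake the specific malware sends. Commands are only timestamped and written to a file, never executed.

```python
import socket
from datetime import datetime, timezone


def log_command(command, log_path):
    """Append one C2 command to the log file, prefixed with a UTC timestamp."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{command}\n")


def run_fake_bot(c2_host, c2_port, log_path, hello=b""):
    """Connect to the C2, optionally send a (hypothetical) check-in message,
    and log every newline-terminated command until the server disconnects.

    Assumes a line-based text protocol; commands are recorded, not executed.
    """
    with socket.create_connection((c2_host, c2_port)) as sock:
        if hello:
            sock.sendall(hello)
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the C2 closed the connection
                break
            buffer += chunk
            # A single recv() may hold several commands or only part of one.
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                command = line.decode("utf-8", errors="replace").strip()
                if command:
                    log_command(command, log_path)
```

For example, `run_fake_bot("C2_IP_ADDRESS", C2_PORT, "c2_commands.log")` with the real values filled in. Each log line is an ISO 8601 UTC timestamp, a tab, and the raw command, which is enough to line commands up with what was happening at the time.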

Limitations and Improvements:
The limitations of this research mostly came from a lack of time and a lack of skills. We ran the malware for less than a month and were not able to observe any actual DDoS attacks.
To improve the research, next time we could monitor multiple botnets to have a better chance of observing actual attacks. We could also analyze the malware in more detail.

Automation and future research:
When we first ran our honeypots, we received a lot of different samples. To automate this, we would have to automatically download the malware samples, extract the C2 information, and connect to each C2 server with a fake bot to observe attacks.
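The "extract C2 information" step could start with something as simple as scanning each sample's printable strings for IPv4 addresses, optionally followed by a port. This is a minimal sketch that assumes the C2 address is stored in plaintext in the binary. Packed or encoded samples would need to be unpacked first, and C2 servers given as hostnames would need another pattern. `find_c2_candidates` is a hypothetical helper name.

```python
import re

# Runs of 4+ printable ASCII bytes, similar to what the Unix `strings` tool prints.
STRINGS_RE = re.compile(rb"[\x20-\x7e]{4,}")
# Dotted-quad IPv4 address with an optional ":port" suffix.
IPV4_RE = re.compile(rb"(?<![\d.])((?:\d{1,3}\.){3}\d{1,3})(?::(\d{1,5}))?(?![\d.])")


def find_c2_candidates(sample_bytes):
    """Return (ip, port or None) tuples found in the printable strings of a binary."""
    candidates = []
    for string_match in STRINGS_RE.finditer(sample_bytes):
        for ip_match in IPV4_RE.finditer(string_match.group()):
            ip = ip_match.group(1).decode()
            # Drop things that look like addresses but aren't, e.g. 999.1.1.1.
            if all(int(octet) <= 255 for octet in ip.split(".")):
                port = ip_match.group(2)
                candidates.append((ip, int(port) if port else None))
    return candidates
```

Usage: `find_c2_candidates(open("sample.bin", "rb").read())` (the file name is a placeholder). Every hit is only a candidate; it still has to be confirmed, for example by watching which address the sample actually connects to in the sandbox.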
For future research, we will try to focus on implementing automation and hopefully see attacks being launched in real time and find out who the victims are.

Conclusion:
This was an interesting semester project. We were able to set up honeypots, collect some common malware samples, figure out how they spread and how their brute force algorithm works, identify some of the commands that can be sent to the bots to launch attacks, and write a monitoring tool.

Resources:


The resources for this project were provided by the Living Lab at IUPUI.

Monday, May 22, 2017

Thursday, May 18, 2017

Linux botnet malware analysis: part 1

I came across my first Linux botnet malware when I set up a honeypot at work. I started analyzing it and found it really interesting, but after that I moved on to other work. I always wanted to see if I could track DDoS attacks, but I never had the chance to take the project further. This year, I was taking a NetSec class and decided to investigate botnets for my final project.
The analysis done here is a bit amateurish/half-assed. I was limited by time and skills. My goal was to determine what botnet infrastructure looks like, how it spreads, its capabilities, its C2 protocol, and how to monitor it.
I worked with another person to do this. The data collection lasted about a month and we spent just a few weeks on analysis.


Background:
DDoS attacks are covered in the media/news more than ever. Attacks are usually launched from botnets, and that's one reason to study botnets. Besides that, botnets are just interesting.
A botnet has two basic components: the command and control (C2) server and the victims/clients. The C2 is responsible for telling the victims/clients to launch attacks or do other things. Botnets can use different communication protocols, such as HTTP, IRC, and raw TCP socket connections.
Botnet malware also needs the ability to spread and infect more machines. Infection can rely on exploits or brute force. From what I've seen running honeypots, the malware brute forces Telnet/SSH (ports 23/22) and then runs a command that downloads the malware sample and executes it.


I've seen the download done with curl, wget, ftp, and tftp.
In this research, we set up honeypots, acquired malware samples, and did analysis.


Honeypot Setup:
We used Modern Honey Network (MHN) by ThreatStream to deploy and monitor honeypots. MHN runs a central server that receives information from the honeypot sensors. You can use it to analyze what passwords were tried, where the attacks came from, and so on.
We were interested in anything that brute forces SSH, so we deployed Kippo honeypots using MHN. Our original idea was to analyze a bunch of samples/families, so we deployed 7 honeypots. As soon as we started organizing the data, though, we realized this would be too much, so we decided to focus on just one malware family and one group of samples.
Kippo logs the commands executed on the honeypot, so we wrote a script to download those logs using sshpass (Kippo runs on port 22, and the MHN deployment script moves the real SSH service to port 2222 by default):
sshpass -p 'PASSWORD' scp -r -P 2222 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@HP_IP_ADDRESS:/opt/kippo/log/* /root/logs/
After we obtained the logs, we started looking through them. Since we were looking for command execution, we filtered the logs for the phrase "executing command". That gave us a list of executed commands, but some of them were irrelevant; we only wanted the lines where a download and execution occurred, so we filtered again for wget, curl, and ftp.
We can use the output from this and download the samples for analysis.
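The two filtering passes can also be scripted. The sketch below assumes the logs were copied to /root/logs/ by the scp command above and that each relevant Kippo log line contains the text "executing command" followed by the command. The URL pattern is a simplification: downloads that don't use a full URL (some tftp invocations pass a host and a file name instead) are kept as matching lines but yield no URL. Both function names are hypothetical.

```python
import re
from pathlib import Path

DOWNLOAD_TOOLS = ("wget", "curl", "ftp")  # "ftp" also matches "tftp"
URL_RE = re.compile(r"(?:https?|ftp)://[^\s;'\"|&]+")


def extract_download_urls(log_lines):
    """Return (log_line, urls) pairs for executed commands that download something."""
    results = []
    for line in log_lines:
        if "executing command" not in line:  # first pass: command execution only
            continue
        if not any(tool in line for tool in DOWNLOAD_TOOLS):  # second pass
            continue
        results.append((line.strip(), URL_RE.findall(line)))
    return results


def scan_log_dir(log_dir="/root/logs"):
    """Run the filter over every log file copied down from the honeypots."""
    results = []
    for path in sorted(Path(log_dir).rglob("*")):
        if path.is_file():
            with open(path, encoding="utf-8", errors="replace") as handle:
                results.extend(extract_download_urls(handle))
    return results
```

The URLs from `scan_log_dir()` can then be de-duplicated and fetched inside the analysis environment rather than on a workstation.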


Various VPS providers were used to deploy these honeypots. Vultr was used to run MHN. Vultr is cheap, reliable, and user friendly.


Analysis Infrastructure setup:
Static and dynamic analysis will be covered in the next post.
For our research, we wanted to capture some network data to see who the victims of DDoS attacks were and to learn about the C2 protocol. Our plan was to run the malware and then just filter for that information using tshark.
Our goals when running the malware were to capture the network traffic, avoid having much impact when DDoS attacks were launched, and anonymize our IP address.
We used Proxmox, a Tor router, pfSense, and Linux containers.
In the picture above you can see the layout visualized. Proxmox allowed us to create containers, and we created Ubuntu containers. Traffic from the containers is sent to pfSense, where we can capture packets before the traffic goes to Tor.
Each Proxmox container has a unique IP address assigned by pfSense, so we can run multiple samples and filter by IP address to find out which sample is generating specific traffic. Another benefit of Proxmox is that it lets us limit a container's network speed. This allows us to avoid making a big impact in DDoS attacks while still monitoring C2 commands.
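Since each container has its own IP address, the attribution step can be sketched on top of tshark's field output, for example from `tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e tcp.dstport` (the capture file name is a placeholder). The sketch assumes tshark's default tab-separated output with one packet per line, plus a hand-maintained mapping of container IPs to sample names. Both `summarize_by_sample` and that mapping are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_by_sample(tshark_output, ip_to_sample):
    """Count (destination IP, destination port) pairs for each sample.

    tshark_output: text from `tshark -T fields -e ip.src -e ip.dst -e tcp.dstport`.
    ip_to_sample: mapping of container IP -> sample name (assumed, maintained by hand).
    """
    per_sample = defaultdict(Counter)
    for line in tshark_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # non-IP packet or empty line
        src, dst = fields[0], fields[1]
        # tcp.dstport is empty for UDP/ICMP packets; record those as "-".
        port = fields[2] if len(fields) > 2 and fields[2] else "-"
        sample = ip_to_sample.get(src)
        if sample is not None:  # ignore traffic from IPs we don't know
            per_sample[sample][(dst, port)] += 1
    return per_sample
```

A sudden burst of packets from one sample to a single destination is what a launched attack would look like in this summary, while steady traffic to one address is more likely the C2 connection itself.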
Finally, the Tor router we used was a physical one. Apparently, the latest version of pfSense does not have the Tor package, but that's okay. The router we used has a physical switch: when it's switched to Tor mode, nobody can access the configuration panel. To disable Tor, you would literally have to be physically present and flip the switch.


In the next post, I’ll cover binary analysis of the sample we decided to analyze.


Resources:
http://www.vultr.com/?ref=7127410 (FYI: this is an affiliate link, if you sign up using it, it puts credit in my Vultr account. You don’t have to sign up using it or even use Vultr if you find a better provider)

The resources for this project were provided by the Living Lab at IUPUI.

Tuesday, May 2, 2017

Forensics - TeamViewer file extraction

Introduction:
I was a TA for a forensics class during the spring semester, and I created an extra-credit assignment for the class. The assignment is below:



I selected TeamViewer because I didn't see any tool to recover this file type.

The goal was to make students figure out the file structure for TeamViewer recording files, figure out what data to extract, and learn/apply programming skills.

Solution:
The first step is to figure out what the TeamViewer file format looks like.
I downloaded some TVS files from the internet. Alternatively, you can start a TeamViewer session, record it, and save the recording.

I opened them in a hex editor. 
Both files begin with a TVS header, followed by some metadata (information about the recording), and then BEGIN.
So our format so far is TVS (METADATA) BEGIN.

Both files have END and a base64 string at the bottom. When carving automatically, it's hard to figure out where the base64 string ends, so it's much easier to just look for the TVS header and the END footer.

To see whether a file that does NOT contain the base64 string after END can still be opened, I modified the file and removed the base64.

I saved the file as nobase64.tvs.

Opening and playing the modified file works!

So the information we have to look for and extract is:
TVS (header) | metadata | BEGIN | DATA | END (footer)
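This layout can be written as a small parser. The sketch assumes the TVS and BEGIN markers are each followed by CRLF (0x0d 0x0a) and that the first END followed by CRLF after BEGIN closes the recording, which matches the markers used for carving in this post. `split_tvs` is a hypothetical name.

```python
HEADER = b"TVS\r\n"
BEGIN = b"BEGIN\r\n"
FOOTER = b"END\r\n"


def split_tvs(blob):
    """Split a TVS recording into (metadata, data).

    Layout assumed: TVS | metadata | BEGIN | data | END. Anything after END
    (such as the base64 trailer) is ignored. Raises ValueError if a marker
    is missing.
    """
    if not blob.startswith(HEADER):
        raise ValueError("missing TVS header")
    begin = blob.index(BEGIN, len(HEADER))      # ValueError if absent
    end = blob.index(FOOTER, begin + len(BEGIN))
    metadata = blob[len(HEADER):begin]
    data = blob[begin + len(BEGIN):end]
    return metadata, data
```

The metadata comes back as raw bytes, so it can be decoded and printed to see what a recording says about itself before deciding whether to carve it.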

I created a fake image by dumping urandom + the TVS file + urandom into one file.
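The same test image can be built in Python. `os.urandom` reads from the operating system's random source, like /dev/urandom does. The file names and the 1 MiB padding are placeholders, and `build_test_image` is a hypothetical helper.

```python
import os


def build_test_image(tvs_path, image_path, padding=1024 * 1024):
    """Write random bytes + a TVS recording + random bytes to image_path.

    Returns the offset where the recording starts, which is handy for
    checking a carver's output.
    """
    with open(tvs_path, "rb") as tvs_file:
        recording = tvs_file.read()
    with open(image_path, "wb") as image:
        image.write(os.urandom(padding))
        image.write(recording)
        image.write(os.urandom(padding))
    return padding
```

For example, `build_test_image("recording.tvs", "myimage")` produces an image named like the one the extraction script opens, with the recording starting at offset 1048576.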


In Python, I can open and read the file in binary mode and start looking for the hex data.
So now I have to go through offsets 0 to 361286618 and look for TVS, BEGIN, and END.
ASCII to HEX:
TVS = 54 56 53
BEGIN = 42 45 47 49 4e
END = 45 4e 44
We could look for TVS alone, but we might get some false positives.


Instead, we want to look for TVS followed by 0x0d 0x0a (CRLF).
The same goes for BEGIN and END.
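Python can do the ASCII-to-hex conversion, and a back-of-the-envelope calculation shows why the CRLF matters. In uniformly random bytes, a fixed 3-byte marker is expected to show up about (N - 2) / 256^3 times, roughly 21 times for an image of 361286618 bytes, while a 5-byte marker such as TVS 0x0d 0x0a is expected about 0.0003 times. Real disk data isn't uniformly random, so treat these numbers as rough estimates. The helper names are hypothetical.

```python
MARKERS = (b"TVS", b"BEGIN", b"END", b"TVS\r\n", b"BEGIN\r\n", b"END\r\n")


def as_hex(marker):
    """Space-separated lowercase hex, e.g. b"TVS" -> "54 56 53" (Python 3.8+)."""
    return marker.hex(" ")


def expected_random_hits(image_size, marker_length):
    """Expected chance matches of a fixed marker across uniformly random bytes."""
    positions = image_size - marker_length + 1
    return positions / 256 ** marker_length


if __name__ == "__main__":
    for marker in MARKERS:
        print(repr(marker), "->", as_hex(marker))
    image_size = 361286618  # roughly the size of the test image in this post
    for length in (3, 5):  # bare "TVS" vs. "TVS" + CRLF
        print(length, "byte marker:", round(expected_random_hits(image_size, length), 4))
```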

Algorithm:
For EACH_BYTE in TOTAL_BYTES:
    Look for HEADER:
        Note address of the header
    Look for BEGIN:
        Note address of begin
    Look for FOOTER:
        Note address of footer
        Extract HEADER to FOOTER & metadata between HEADER and BEGIN.

Implemented in Python:

HEADER = b'TVS\r\n'     # 54 56 53 0d 0a
BEGIN = b'BEGIN\r\n'    # 42 45 47 49 4e 0d 0a
FOOTER = b'END\r\n'     # 45 4e 44 0d 0a

with open('myimage', 'rb') as image:
    disk_image = image.read()  # reads the whole image into memory
file_number = 1
header = begin = None
for i in range(len(disk_image)):
    if disk_image.startswith(HEADER, i):
        header, begin = i, None             # start of a new TVS file
    elif header is not None and begin is None and disk_image.startswith(BEGIN, i):
        begin = i                           # the metadata ends here
    elif begin is not None and disk_image.startswith(FOOTER, i):
        footer = i + 3                      # keep "END", drop the CRLF after it
        with open(str(file_number) + '.tvs', 'wb') as outfile:
            outfile.write(disk_image[header:footer])
        file_number += 1
        header = begin = None               # look for the next file


The implementation above uses tons of memory because it reads the whole image at once, and checking every byte offset in a Python loop is inefficient.
Python's regex library (re) can be used to make this easier and more efficient.

The header, begin, and footer patterns are defined, and the addresses of the header, begin, and footer are extracted. Then the TVS file is extracted and written, and the metadata is extracted and written. This time, CPU and memory usage is low and extraction is much faster.
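Here is a sketch of a regex-based carver along those lines. It is a reconstruction under assumptions, not the script from the GitHub repository. It compiles a pattern for each marker, collects the address of every match, pairs each header with the next BEGIN and END, and writes out the recording and its metadata. The image is opened with `mmap`, which the `re` module can search directly, so the file doesn't have to be read into one big bytes object first. `carve_tvs` and the `<n>.tvs` / `<n>_metadata.txt` output names are hypothetical.

```python
import bisect
import mmap
import re

HEADER_RE = re.compile(rb"TVS\r\n")
BEGIN_RE = re.compile(rb"BEGIN\r\n")
FOOTER_RE = re.compile(rb"END\r\n")


def carve_tvs(image_path, out_prefix=""):
    """Carve TVS recordings out of a raw image file.

    Writes <n>.tvs (header through END) and <n>_metadata.txt (bytes between
    the header and BEGIN) for each hit, and returns the (start, end) offsets.
    """
    carved = []
    with open(image_path, "rb") as image, \
            mmap.mmap(image.fileno(), 0, access=mmap.ACCESS_READ) as data:
        # Addresses of every marker; re can search the mmap directly.
        headers = [m.start() for m in HEADER_RE.finditer(data)]
        begins = [m.start() for m in BEGIN_RE.finditer(data)]
        footers = [m.start() for m in FOOTER_RE.finditer(data)]

        last_end = 0
        for header in headers:
            if header < last_end:
                continue  # this header lies inside a file we already carved
            i = bisect.bisect_right(begins, header)  # first BEGIN after header
            if i == len(begins):
                break  # no BEGIN after this header, so none after later ones
            begin = begins[i]
            j = bisect.bisect_right(footers, begin)  # first END after BEGIN
            if j == len(footers):
                break
            end = footers[j] + len(b"END")  # keep END, drop the CRLF/base64 trailer
            number = len(carved) + 1
            with open(f"{out_prefix}{number}.tvs", "wb") as out:
                out.write(data[header:end])
            with open(f"{out_prefix}{number}_metadata.txt", "wb") as out:
                out.write(data[header + len(b"TVS\r\n"):begin])
            carved.append((header, end))
            last_end = end
    return carved
```

Usage: `carve_tvs("myimage")` writes 1.tvs, 1_metadata.txt, and so on into the current directory and returns where each recording was found in the image.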

The extracted file is not corrupted and plays in TeamViewer.

That's all. Hopefully that was useful to someone. I'll put the script on https://github.com/ITLivLab/TVS_extractor

Formatting is messed up because copying and pasting doesn't exactly work great between Word, Google Docs, and Blogger.