Monday, May 22, 2017

Linux botnet malware analysis: part 2

Part 1:
Part 2:
Part 3:

This post will probably have some technical mistakes due to lack of skillz. If you spot any, leave a comment.

Binary analysis here means analysis of an executable binary file. There are generally two types of analysis: static and dynamic. Static analysis examines the file without executing it, while dynamic analysis examines it while it runs.
Static analysis can include examining strings, binary properties, function usage and imports, disassembly, and other file properties. Dynamic analysis looks at the malware’s behavior while it’s running, such as resource usage, network communications, and system changes. It can also involve using a debugger or tracing tools.

The sample analyzed in this research is called “qbot”, and its source code was released a while ago.

Static Analysis:
File type - To understand what’s being examined, the file command can be used to identify the file type. The file utility uses magic headers and also parses the file structure to determine what type of file it was given. For ELF files (Linux executables), the file command reports whether the file is statically or dynamically linked, whether the symbols are stripped or not, and what architecture the file runs on.
Strings - Looking at strings often helps with the analysis. On Linux, the strings command can be used to extract sequences of printable characters. It can sometimes reveal function names, IP addresses, data that’s printed, and other information from the binary file.
Disassembly and decompilation - Source code is turned into a binary using a compiler. When the binary is disassembled, you see the machine instructions the compiler chose for running the code on a system. The binary cannot be turned back into the original source code. You can, however, use decompilation to get close to what the original source code logic would have looked like.
There are more static analysis tools that were not mentioned here, but links to them are provided in the Resources section.
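As a rough illustration of what the strings step does, here is a minimal Python sketch. It is not the strings tool itself, only an approximation of the idea. The function names extract_strings and find_endpoints, the minimum length of 4, and the loose IP:PORT pattern are all assumptions made for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Loose IPv4:port pattern. It does not validate octet or port ranges,
# so treat every hit as a candidate to confirm later, not as a fact.
ENDPOINT_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")


def find_endpoints(strings: list[str]) -> list[str]:
    """Collect anything that looks like IP:PORT from the extracted strings."""
    found = []
    for s in strings:
        found.extend(ENDPOINT_RE.findall(s))
    return found
```

You would read the sample with open(path, "rb").read() and pass the bytes to extract_strings. Anything find_endpoints returns is only a lead for further analysis, which is why the C2 address is confirmed later with a disassembler.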

My analysis started with downloading the shell script that was being dropped on my honeypot. The shell script contained commands to download more files. I downloaded these files and ran the file command on them. You can see the results below. (Ignore desktop.ini)
The first thing you’ll notice is that they are ELF files. You can also see that the files are compiled for many different architectures, which increases the malware’s chances of spreading. The shell script executes all the files. If a file doesn’t match the architecture, it will just fail to execute. Since the malware is compiled for so many architectures, chances are at least one of the files will execute.
Another thing to notice is that the files are statically linked. This means that the system it’s executing on does not need to have the libraries the malware uses. Notice that the files are not stripped either, so the symbols are still part of the program. I don’t know why this was done, but the code is now open source anyway so it doesn’t really matter.
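To show where the architecture information comes from, here is a small Python sketch that reads the same ELF header fields the file command relies on. It only covers the word size, byte order, and machine type. It does not detect static linking or stripped symbols, which need the program and section headers. The function name describe_elf and the short machine table are assumptions for illustration, and the table is far from complete.

```python
import struct

EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}
# A few e_machine values from the ELF specification; many more exist.
E_MACHINE = {
    2: "SPARC", 3: "Intel 80386", 4: "Motorola 68000", 8: "MIPS",
    20: "PowerPC", 40: "ARM", 42: "SuperH", 62: "x86-64",
}


def describe_elf(header: bytes) -> dict:
    """Decode class, byte order, and machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) tells us how to read the multi-byte fields that follow.
    endian = "<" if header[5] == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": EI_CLASS.get(header[4], "unknown"),
        "data": EI_DATA.get(header[5], "unknown"),
        "machine": E_MACHINE.get(e_machine, f"unknown ({e_machine})"),
    }
```

Running this over each downloaded file, for example with describe_elf(open(path, "rb").read(20)), gives a quick per-architecture overview similar to the file output.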

Above is the output from the strings tool. I censored out IPs and file names.
The shell command starting at cd and ending at rm -rf * is what was executed on my honeypot. With strings, you can also see (:-D) an IP and a PORT. That’s probably the C2 server but we’ll confirm after more analysis. You can also see a username and password list in the strings output. This is probably the list it uses to brute force machines before running the command in the screenshot above. The command basically downloads the shell scripts that need to be run, via various methods such as HTTP, FTP, and TFTP, then executes them.
After looking at more output, you can see the full list of usernames and passwords.

Next I started using decompilers and disassemblers. The tools I used are Hopper, IDA, and RetDec.

Above is the screenshot of the code that’s executed first. There are multiple interesting functions being called. The first one is printf with getBuild. The getBuild function returns “DONGS”, as seen below:
prctl is also called. The prctl function performs operations on the calling process, and here the sample uses it to change its process name to “/usr/sbin/dropbear”. The getOurIP function gets the IP address of the machine it’s running on. After that’s done, fork is called to start a new process.
At this point, I switched to RetDec because the output from it was easier for me to understand.
Above, you can see the sample attempting to connect to the C2 server. If the connection is successful, it sends BUILD DONGS to the C2 server.
PING/PONG messages are exchanged between C2 and the client.

After a successful connection, the bot/client is ready to receive commands. We were running the sample in a container, so we had some traffic to analyze. When we looked at the traffic, we saw that the commands being sent were prefixed with !*. The first command sent was !* SCANNER ON, as soon as the bot connected.
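When going through captured traffic, it helps to split each command line into its parts. Here is a minimal Python sketch for that. The function name parse_c2_line and the assumption that fields are separated by whitespace are mine, based only on the !* SCANNER ON example above.

```python
def parse_c2_line(line: str, prefix: str = "!*"):
    """Split a captured C2 line into (command, arguments).

    Returns None when the line does not start with the expected prefix
    or carries no command after it.
    """
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != prefix:
        return None
    return parts[1], parts[2:]
```

For the line observed above, parse_c2_line("!* SCANNER ON") gives the command SCANNER with the single argument ON, while lines without the prefix, such as PING, come back as None.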
In IDA, you can also analyze strings. In the screenshot below, you can see some of the commands and the replies to some of them.

Back in Hopper, you can see decompiled functions for some of the commands.

When analyzing strings, we saw the Linux command that downloaded and executed the malware, an IP:PORT, and potential usernames and passwords. In IDA, we can confirm that.
Above you can see that commServer (IP:PORT) is being used in InitConnection.

And here’s the list of usernames and passwords the malware tries.

These lists are used in the StartTheLelz function. This function uses them to brute force machines and then executes the infectline string (the download and exec commands).

I had to stop here and start using dynamic analysis. Dynamic analysis for me is a lot easier.

Dynamic Analysis:
Tracing tools - Tracing tools such as strace and ltrace can be used to trace the system or library calls being made.
Instrumentation - Instrumentation tools can be used to trace function calls made by the sample you’re analyzing. You can also use an instrumentation tool to modify the arguments in these calls. Frida is one such tool.
LD_PRELOAD - The LD_PRELOAD trick allows an analyst to load their own library, which can trace function calls as well. You’re essentially intercepting calls, and you can also modify return values. It doesn’t work when the sample you’re analyzing is statically linked.
Debugger - A debugger shows the disassembled code while it runs, so you can track which instructions are executed and monitor the execution flow. It also allows you to modify registers or data in memory to change the behavior of the execution. You can also edit the instructions if you need to.

We can use dynamic analysis to validate some of our findings from static analysis. We can either let the sample connect to the real C2 and watch what commands it sends and what they do, or we can set up our own C2 and interact with the malware that way. The sample being analyzed here has a very simple C2 protocol. In this case, I just had to change my IP address to the IP address of the C2 server and use netcat to listen on the same port as the C2 server. In VirtualBox, you can do this by keeping a NIC attached to the VM but setting it to “Not attached” in the network settings.
I used the command ‘ncat -lkvp 513’ because the sample was using port 513 for C2.
-l is used to listen
-k is used to keep listening. After a client disconnects, ncat keeps running and listening.
-v is verbose mode. Ncat notifies us every time the client connects.
-p 513 is used to specify port 513.
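If you would rather script the fake C2 than type into ncat, something like the following Python sketch could work. It is not what was used in this analysis. The function name run_fake_c2, the newline terminator, the !* prefix on outgoing commands, and the 2 second timeout are all assumptions for illustration.

```python
import socket


def run_fake_c2(server: socket.socket, command: str, timeout: float = 2.0) -> list[str]:
    """Accept one client, send it one command, and return the lines it sent back.

    Assumes commands are newline-terminated and carry the '!* ' prefix
    seen in the captured traffic.
    """
    conn, _addr = server.accept()
    buf = b""
    with conn:
        conn.settimeout(timeout)
        conn.sendall(f"!* {command}\n".encode())
        try:
            # Read until the client closes the connection or goes quiet.
            while chunk := conn.recv(4096):
                buf += chunk
        except socket.timeout:
            pass
    return buf.decode(errors="replace").splitlines()
```

To mirror the ncat setup you would create the listening socket with socket.create_server(("0.0.0.0", 513)) and call run_fake_c2(server, "SCANNER ON") inside an isolated VM. Binding to a port below 1024 normally requires root.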

First, I used tracing to examine the behavior. Since the binary is statically linked, ltrace did not work, but strace does. strace was run with the -f option to follow forks. The command used was “strace -f ./bash” (note: bash is just what the malicious file is named). Below you can see the execution being traced and the connection being made to the C2 server. BUILD DONGS and PING are also sent.
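strace output gets long quickly once the sample forks, so it can help to filter it. Here is a small Python sketch that pulls the IPv4 targets of connect() calls out of a saved trace. The function name connect_targets is an assumption, and the pattern only covers the usual way strace prints AF_INET addresses, so other address families are ignored.

```python
import re

# Matches the way strace usually prints an IPv4 connect() call, e.g.
# connect(3, {sa_family=AF_INET, sin_port=htons(PORT), sin_addr=inet_addr("IP")}, 16)
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((\d+)\), '
    r'sin_addr=inet_addr\("([^"]+)"\)\}'
)


def connect_targets(log_text: str) -> list[tuple[str, int]]:
    """Return (ip, port) for every IPv4 connect() call found in the trace text."""
    return [(ip, int(port)) for port, ip in CONNECT_RE.findall(log_text)]
```

You could save the trace with strace -f -o trace.log ./bash and then pass the contents of trace.log to connect_targets to list every address the sample tried to reach.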

In the ncat window, I tried different commands.
When sending SCANNER, the client responds with SCANNER ON | OFF.
Sending SCANNER ON results in the client responding with PROBING.
Sending SCANNER OFF results in the client responding with REMOVING PROBE.

While analyzing the traffic from running the actual malware in a container and letting it connect to the C2, we noticed a lot of traffic on port 23. When SCANNER ON is sent to the bot, it starts trying to brute force logins on port 23. I quickly opened another ncat window and this time listened on port 23.
Here’s what I saw:
It does connect to port 23.

The next thing I tried was the HTTP command, which performs a GET request flood. Below you can see the results of the HTTP connection.
Notice that it does GET /. Also notice that it attaches a Referer header to the GET request. This makes the request appear as if the user is coming from a Google search.

While doing more static analysis on the SCANNER part, I worked out the algorithm it uses for brute forcing.
Notice that it does readUntil “ogin:”.
I also looked at other strings around “ogin:”. Above you can see ssword:, ncorrect and sh as well.
When connecting to servers on port 23, the malware looks for ogin:, ssword:, ncorrect, and sh. The original code is probably somewhat similar to this:
If ogin:
    Send username
If ssword:
    Send password
If sh:
    Send infectline
If ncorrect:

To test this dynamically, I found some example code for a Python socket server online and modified it a bit.
Here’s the modified code:
I’m sending Login, Password, and Incorrect. This should get the malware to loop through the username and password list we discovered during static analysis.
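For readers who want to try something similar, a minimal sketch of such a server could look like the following. It is not the exact code used in this analysis. The names FakeTelnetHandler, make_server, and attempts, the exact prompt strings, the 10 second timeout, and the assumption that the client ends each answer with a newline are all mine.

```python
import socketserver

# Every (username, password) pair that clients tried, in arrival order.
attempts: list[tuple[str, str]] = []


class FakeTelnetHandler(socketserver.StreamRequestHandler):
    """Prompt for a login, record what the client sends, then always reject it."""

    timeout = 10  # seconds; stops a silent client from holding a thread forever

    def handle(self) -> None:
        try:
            self.wfile.write(b"Login: ")
            username = self.rfile.readline().strip().decode(errors="replace")
            self.wfile.write(b"Password: ")
            password = self.rfile.readline().strip().decode(errors="replace")
            attempts.append((username, password))
            # Rejecting every attempt should push the client to its next pair.
            self.wfile.write(b"Incorrect\r\n")
        except OSError:
            return  # client timed out or dropped the connection


def make_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    """Create a threaded server; port 0 lets the OS pick a free port."""
    return socketserver.ThreadingTCPServer((host, port), FakeTelnetHandler)
```

A threaded server fits here because the malware connects from many threads at once. To use it against the sample you would bind to port 23 inside an isolated VM, call serve_forever(), and print or save attempts afterwards.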
I ran the Python socket server and sent SCANNER ON to the client. I dumped all the output from Python into a text file. Note that the Python server was receiving connections from all the threads the malware spun up, so the log file didn’t look pretty. Below is the screenshot of attempted usernames and passwords:

That confirms part of the algorithm written above. Now I needed a successful login from the malware so I could confirm that it does indeed send the infectline. I used netcat for this. I echoed all the necessary replies when the malware connected. In the screenshot below, you can see that the malware tried root/root and then sent the infectline.

After that was successful, the malware sample actually sent credentials and IP back to the C2 server.

FUCKOFF command kills the bot process.

So far we’ve learned:
  1. Bot spreads via brute force on port 23, using the built-in dictionary
  2. Brute force algorithm
  3. C2 protocol and commands
  4. Capabilities of the bot

I got all the information I need from the (half-assed) analysis for the next part of my project. In the next post, I’ll cover monitoring, limitations and improvements, automation and future research, and conclusion.


The resources for this project were provided by the Living Lab at IUPUI.