Monday, October 17, 2022

Researching golang malware and how I hate security industry naming conventions - Part 1


While doing some research on the use of golang in malware, I came across this golang sample here:
At the time of writing this, it has 3/72 detections.

It's named winnta.exe
MD5 7e17c9e4ebe61e43966e9f65e334727e
SHA-1 2172901d2a13304dd53a83e518fd5be84ed9ec08
SHA-256 020f6b3e045fa6b968226a8f2b2800dc55c65e842607d04d68b47ef4d18b0eee

Here are the results for yara scan:

It connects to 195.149.87[.]87:443 ( )

The data above doesn't really tell us much about the sample.

Running strings on the sample is kinda interesting, it shows the following:
-ldflags="-s -w -extldflags '-static' -X -X main.addr= -X main.service=WindowsNTApp"

Also there are a bunch of references to go files and code in this directory: "/home/builder18g/goroot/" for example:

Looking for 'main.' shows the following:

At this point, assumption is this is some kinda backdoor.

Researching the C2 IP, I found this:

Here's what Mandiant has to say about this:
"UNC961 deployed two previously unobserved backdoors: HOLEDOOR and DARKDOOR. HOLEDOOR is written in C, whereas DARKDOOR is written in Go."

"DARKDOOR is a backdoor written in Go that is highly modular in design. It supports communication over TLS and HTTP. It has capabilities to execute arbitrary code and list running processes."

I continued to do more research to see what else I can find out about this sample doing any reverse engineering.

Searching for winnta.exe results in a paper from SentinelOne (

They mention the backdoor under section "Mercenary APT Groups Targeting the Financial Services Industry"

Here's what they had to say about it:
File: winntaWindows EXE written in golang that calls out to
Backdoor functionality
Potential name “gsh”

File: main
Golang compiled EXE of same “gsh” family as mentioned
for winnta
Calls out to 198.199.104[.]97:443 and has backdoor

Under potential attribution section, they refer to an RSA report regarding Carbanak."Carbanak has been reported using a custom golang backdoor named GOTROJ. This backdoor
has code overlap and functionality similar to the “geodezine” backdoor discovered in the attacker’s toolkit. "

The RSA paper "The Shadows of Ghosts" can be found here:

Here's a snippet about GOTROJ from the paper:
"On D+30, the attackers installed a Windows Trojan, written in Go, as a Windows
Service on one of the two primary Active Directory Domain Controllers. They
would move to utilizing the GOTROJ as their primary method of ingress for the
duration of the engagement. The GOTROJ Trojan communicated with C2 IP
address over TCP/443 for its remote access channel."

I continued doing more research. I searched for "WindowsNTApp" service, which came up with a crowdstrike report on ProphetSpider. The report is here:

They also refer to the malware as GOTROJ. This is what they say:
"The adversary commonly creates Windows services, e.g. WindowsNTApp for GOTROJ, to establish persistence for downloaded malware."

I finally searched for GOTROJ to see if there is anything else I can find. I only found one article here (besides ones I already looked at): where they found GOTROJ along side with Ragnarlocker infection.

I downloaded some of the samples to look at. I recommend this Ghidra extension:

The samples look similar and functions kinda look like this (different samples will have slightly different names and code):

The golang build path's embedded inside the samples are different, likely because each sample was probably compiled on different hosts likely by different people. Interestingly, the sample I started with is the only sample that has the build string in it with ldflags.

Other reports have already analyzed the capabilities of this golang backdoor but you can see from the function names what it can do as well.

Researching golang malware and how I hate security industry naming conventions - Part 2

I did some string searches in Hybrid-Analysis as well to look for more files. (Thanks Hybrid-Analysis for a researcher account!) I finally ended up with this yara rule (i'll learn to write better rules one day):

rule gsh_backdoor
        $a = "startInteractive"
        $b = "main.winService"
        $c = "main.(*winService).Start"
        ($a and $b and $c) and filesize < 6MB and filesize > 2MB
and uint16(0) == 0x5A4D

Searching that on Hybrid-Analysis results in the following hashes:

You can also run strings on the file and extract C2 information by doing egrep:
strings -f * | egrep '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):(\d{0,5})'

18077efa0c23e9370eb95ca6c5ece82bcf61e63505a87aea8cb6a14d15500a8c.bin.sample: 142.93.213[.]221:21
55320dcb7e9e96d2723176c22483a81d47887c4c6ddf063dbf72b3bea5b279e3.bin.sample: 107.181.246[.]146:443
57a45d3010d74cbd089cacf23bc0f68eaa3fb8dc5479dbe8ed8e19004badfdb6.bin.sample: 198.199.104[.]97:443
95c6d0d4e619334b3d8adb5340198c420f78f937f3dc944bc12a2be7f73fb952.bin.sample: 64.227.88[.]98:443
9d42c2b6a10866842cbb6ab455ee2c3108e79fecbffb72eaf13f05215a826765.bin.sample: 107.181.246[.]146:443
b63ea16d5187c1fa52a8a20c3fd7b407033bcd4142addb1ce91923d6b2f19555.bin.sample: 45.76.236[.]136:443
winnta.bin: 195.149.87[.]87:443

One more thing I noticed while researching this is mention of "geodezine" backdoor. Some of the samples connect to the same C2 server as the golang backdoor connects to. I haven't looked too much into it but here's a rule:

rule geo_backdoor
        $a = "geodezine"
        $b = "cmd.exe"
        $c = "URLDownloadToFilDeleteUrlCacheEn"
        ($a and $b and $c) and filesize < 100KB and uint16(0) == 0x5A4D

And here are the hashes that show up on Hybrid-Analysis:

Summary of where this Golang malware shows up and timeline:
December 2017
The Shadows of Ghosts Inside the response of a unique Carbanak intrusion
Filename: ctlmon.exe
Malware name: GOTROJ
C2: 107.181.246[.]146
Attack involved exploitation of CVE-2017-5638

May 2021
Mercenary APT Groups Targeting the Financial Services Industry
Filename: winnta / main
Malware name: GOTROJ-related / gsh
C2: 45.76.236[.]136, 198.199.104[.]97
"cyber mercenary attack targeting a major US-based financial services organization"

August 2021
PROPHET SPIDER Exploits Oracle WebLogic to Facilitate Ransomware Activity
Filename: winnta
Malware name: GOTROJ
exploitation of CVE-2020-14882 and CVE-2020-14750

March 2022
Forged in Fire: A Survey of MobileIron Log4Shell Exploitation
Malware name: DARKDOOR
C2: 162.33.178[.]149, 195.149.87[.]87
Attributed to UNC961 and related to exploitation of Log4j in Horizon and MobileIron

April 2022
Ragnarlocker Ransomware IOCs
Filename: ctlmon.exe
Malware name: GOTROJ
C2: 45.63.89[.]250
Related to breach involving Ragnarlocker according to the post

September 2022
This is the sample that I started out my research with
Filename: winnta.exe
Hash: 020f6b3e045fa6b968226a8f2b2800dc55c65e842607d04d68b47ef4d18b0eee
C2: 195.149.87[.]87
I just found the sample. I'm not sure what campaign it's related to or any other details. The C2 matches the Mandiant report though.

You should be able to pivot from C2 to sample hash or sample hash to C2 using VirusTotal. Some vendors didn't supply C2s or hashes.

As far as I know, I have not seen any of these samples running and successfully connecting to C2 in any of the public sandboxes. I haven't seen results in Shodan or Censys that show the C2 port open even with historical search for the September 2022 sample.

There may be more samples on VirusTotal but I'm doing this independently and don't have access to VT.

I'm not a CTI person. To me this looks like a golang backdoor used by multiple actors. I just hope this post helps anyone Googling things because this sample has been called different things by different vendors and that's annoying. 

Saturday, October 15, 2022

Looking at process relationships from malware sandbox execution data


This blog post discusses looking at process relationships, specifically from malware sandbox execution data. One of the essential functions of malware sandbox is to gather and display process execution, for example if winword launches powershell, you'd want to know that.

One issue that I run into while doing research is that many of the public/free malware sandboxes don't allow me to search based on process relationships. For example, if I have a sample on an endpoint that executed whoami, nslookup, systeminfo, i would like to be able to search sandbox reports to see which malware families or samples do that.

The other thing I'm interested in as a researcher is trends a long for initial access execution, for specific malware families or in general. One of the twitter accounts I follow is and they post information about how malware such as qakbot is doing command/process execution on a system. 

I find the information interesting and obviously, the threat actors have made changes over time. Maybe the threat actors are using new LOLBINs more than before.

The final thing about process relationship data that can be useful is just looking for new things or rare executions. If you're collecting the data, you can do searches to look for rare executions.

All this research should be helpful with detection engineering too or with emulation, if you're trying to match a specific threat.

POC Implementation:

As a proof-of-concept, I decided to implement a searchable database that lets me collect data from malware sandbox report and lets me search for parent-child process relationships. 

I acquired my data from Hybrid-Analysis Public Feed, which gives you JSON file with around 250 recent malware analysis results. I also got data from Zero2Auto CAPE sandbox ( Thanks for letting me use the data!)

I initially looked at graph databases but asking graph database questions/doing queries seemed annoying to me so I didn't look into them too much.

The second thing I tried was to join data from process execution in Python manually, which was a horrible idea. The code turned out horrible and dataset wasn't fun to work with. (

CAPE and Hybrid-Analysis both record process execution data differently but one thing they have in common is a process list json object. Each process object has process metadata and parent process id and obviously the process id. 

I decided to use duckdb to analyze the data. (Usually I'd use sqlite but wanted to try out duckdb and it worked fine)

I created a table with:

  • report id - specific execution task/detonation in the sandbox
  • process id
  • parent process id
  • process name
  • process path
  • process command line

Then I loaded the results from CAPE or Hybrid-Analysis to the table. I'm loading the same type of data but parsing their json reports is obviously different.

Finally, I created a view with join, where I ensure that report id is the same and parent process id and process id's match.

The resulting view contains:

  • Report ID
  • Parent Process ID
  • Parent Parent Process ID
  • Parent Name
  • Parent Path
  • Parent Command Line
  • Process ID
  • Process Name
  • Process Path
  • Process Command Line

Gathering data from the sandbox reports and putting it in the database allows me to ask questions like these:

  • what process launched ping? select parent_name, proc_commandline from joined_proc_list where proc_commandline ilike '%ping.exe%';
  • what process launched powershell with command line to add Defender exclusion? select parent_name, proc_commandline from joined_proc_list where proc_commandline ilike '%add-mppreference%';
  • what processes launch wscript? select parent_name, count(*) as count from joined_proc_list where proc_commandline ilike '%wscript%' group by parent_name
  • what does cmd.exe launch from the appdata folder? select parent_name, proc_name from joined_proc_list where parent_name ilike '%cmd.exe%' AND proc_path ilike '%appdata%';

The results look kinda like this:

If you have large enough dataset, you can extract more info like malware or campaign name and etc and keep track of the trends.

Other solutions:

If you already are doing malware execution in your sandbox, you can check if you are able to search based on process relationships. 

You could also have a backend database that you can query, for example MongoDB or Elasticsearch, although I personally don't know about join capabilities of those databases.

Alternatively, if your sandbox supports either pushing data out to splunk or elasticsearch or any other place, you could try to work with that data. You can also maybe intercept that data and send it to a webhook or lambda for additional processing.

If you have a system that supports pulling data, maybe through an API, that's also a solution. Maybe have a script that pulls reports, parses data, and processes it.

You can store processed data in whatever database you feel comfortable utilizing. I would personally use Clickhouse or Postgresql if I was doing this. 



Also check out Grapl -