Wednesday, July 20, 2022

Screenshotting/scanning domains from certstream with littleshot to find interesting content


Certstream is a great service that streams updates from Certificate Transparency logs, which record certs as they are issued by several providers.

Certstream data has been used in the past for detection of malicious sites or phishing sites. There are several links in the resources section about certstream usage.

Littleshot is a tool similar to urlscan and urlquery (RIP). I wrote it a while ago because I wanted to be able to screenshot a ton of sites and collect metadata about them. (It's here: ) Later on I realized that having yara scan the html body would be cool, so I added that feature as well. There is also a branch that uses tor for connections. It's not the most optimized project and the error handling isn't the best, but it's good enough for my purposes.

You can put newly registered domains through littleshot as well, but I've decided not to do that for now.


- Take certstream domains and scan them with littleshot

- Utilize yara rules to look for interesting pages

- Send some metadata to Humio (littleshot by default doesn't do this) for either alerting, dashboarding, or just searching.

- Ensure that there is caching of domains from certstream to avoid rescanning domains

Tech stack:

I'm hosting everything on vultr. (Here's a ref link if you'd like to try vultr for your projects: )

- Littleshot

-- caddy - reverse proxy

-- flask - webapp

-- redis - job queue

-- python-rq - job distribution/workers

-- mongodb - store json documents/metadata

-- minio - store screenshots

- Certstream + python - I'm getting certstream domains and doing the filtering and cache lookup with python

- Memcached - Caching. I want to avoid scanning the same domain twice within 48 hours, so I'm using memcached


The diagram below shows the setup I have going.

I get data from certstream and I'm using some filtering to ensure that I don't scan certain domains.

Once the keyword based filtering is done, I check the domain against memcached to ensure that it hasn't been scanned before in the past 48 hours.

If the domain wasn't scanned in the past 48 hours, I queue it to be scanned with littleshot.
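The filter and cache steps above can be sketched roughly as follows. This is a minimal sketch, not the actual certstream_to_littleshot script: the keyword list is made up, the wildcard handling is an assumption, and `TTLCache` is an in-memory stand-in for memcached so the example runs without a server. Its `add` method mimics memcached's `add` command, which only stores a key when it isn't already present.

```python
import time

# Hypothetical keyword list; the post does not say which domains get filtered out.
SKIP_KEYWORDS = ("cloudflare", "amazonaws", "azure")
CACHE_TTL_SECONDS = 48 * 60 * 60  # the 48-hour rescan window


class TTLCache:
    """In-memory stand-in for memcached: keys expire after a fixed TTL."""

    def __init__(self, ttl, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._expires = {}

    def add(self, key):
        """Store key if it is absent or expired. Returns True when stored."""
        now = self.clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False  # still cached, so it was scanned recently
        self._expires[key] = now + self.ttl
        return True


def normalize(domain):
    """Lowercase and strip a leading wildcard label such as '*.example.org'."""
    domain = domain.strip().lower()
    return domain[2:] if domain.startswith("*.") else domain


def should_scan(domain, cache):
    """Apply the keyword filter first, then the 48-hour cache check."""
    domain = normalize(domain)
    if any(keyword in domain for keyword in SKIP_KEYWORDS):
        return False
    return cache.add(domain)


def domains_to_queue(domains, cache):
    """Return the normalized domains that should be handed to the scanner."""
    return [normalize(d) for d in domains if should_scan(d, cache)]


if __name__ == "__main__":
    cache = TTLCache(CACHE_TTL_SECONDS)
    batch = ["*.example.org", "example.org", "cdn.cloudflare.example", "shop.example.net"]
    print(domains_to_queue(batch, cache))  # ['example.org', 'shop.example.net']
```

Doing the check and the write in a single `add` call avoids a separate get-then-set, which matters if more than one process is reading from certstream.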

When the littleshot worker does the scan, it sends the taskid, domain, title, and yara matches to Humio (on top of doing the normal littleshot things).
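As a rough idea of what sending those four fields could look like, here is a sketch that builds a request for Humio's structured ingest endpoint (`/api/v1/ingest/humio-structured`, authenticated with an ingest token as a Bearer header, as far as I understand the API; check the Humio docs before relying on it). The attribute names, the `source` tag, and the base URL are hypothetical, and this is not the worker code itself.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_humio_payload(task_id, domain, title, yara_matches, when=None):
    """Wrap one scan result in the structured-ingest shape: tags plus events."""
    when = when or datetime.now(timezone.utc)
    event = {
        "timestamp": when.isoformat(),
        "attributes": {
            # Hypothetical attribute names for the four fields mentioned above.
            "taskid": task_id,
            "domain": domain,
            "title": title,
            "yara_matches": list(yara_matches),
        },
    }
    return [{"tags": {"source": "littleshot"}, "events": [event]}]


def build_request(base_url, ingest_token, payload):
    """Prepare (but do not send) the POST request for the ingest endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ingest/humio-structured",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + ingest_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_humio_payload("task-123", "example.org", "Example Domain", ["wordpress"])
    request = build_request("https://humio.example", "INGEST_TOKEN", payload)
    print(request.full_url)
    # Sending would be: urllib.request.urlopen(request, timeout=10)
```

Keeping the payload builder separate from the HTTP call makes it easy to wrap the send in a try/except, so a Humio outage doesn't fail the scan job.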

Certstream_to_littleshot script -

Yara rules (these aren't the best; you should probably write your own based on your needs) -

Worker code to support sending data to Humio -

Interesting stuff I came across:

- Lots of wordpress and nextcloud/owncloud sites and general stuff people self-host
- Carding forum?

- Argo CI/CD without auth?
- Piracy site

No phishing sites or C2, at least not with my yara rules.

Here are the yara hits in Humio (ignore abc and xyz; that was me testing the Humio API):

What I would do differently with more time and resources (with this project and with littleshot):

- Better error handling - Current error handling is meh
- Get rid of mongodb and replace it with opensearch or graylog maybe? - Opensearch and graylog are great when it comes to searching.
- Potentially having an indicator list built into littleshot?
-- Currently tagging is based on yara rules but there are many ways to detect maliciousness, such as hashes or URLs.
- Enrichment of data like urlscan does
- Better webui - the webui is pretty shit. idk enough html/css/javascript
- Better logging. There is logging of results but no logging of anything else (queries, crashes, etc...)
- Redirect detection & tagging. Some domains do redirect to legitimate login pages.

Resources & similar projects:

- ninoseki github has really cool projects. This one is very similar to littleshot actually.
- littleshot fork that someone hooked up with certstream. It has a refreshing page of screenshots too like urlscan.

(if the blog post formatting looks odd, it's because Blogger editor interface hates me)

Wednesday, July 13, 2022

Building a honeypot network with inetsim, suricata, vector, and appsmith

I wanted to learn a bit more about data engineering, databases, app building, managing systems, and so on, so I decided to work on a small honeypot network as a project. I was partially inspired by Greynoise and AbuseIPDB, both of which I use a lot. I wanted to get this project done in about a week, so it's a small project that isn't too scalable. I ended up learning things, so it's fine.

My goals:

- Use Suricata to see what type of signatures are triggered based on the incoming traffic from the internet
- Save all the Suricata logs to disk in a central place so I can go back and search all the data or reingest the data.
- Send logs to Humio for searching, dashboarding, and potentially alerting purposes
- Have a webapp for searching for an IP
-- Webapp should show the signatures the IP has triggered, first time the IP was seen, last time the IP was seen, and number of times it was seen triggering signatures.
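The lookup in that last goal boils down to one aggregate query per IP. Here's a minimal sketch of it, using an in-memory SQLite database as a stand-in for Postgres so it runs anywhere. The table and column names are hypothetical, and with psycopg2 the `?` placeholder would be `%s`.

```python
import sqlite3

# Hypothetical schema: one row per alert.
SCHEMA = """
CREATE TABLE IF NOT EXISTS alerts (
    src_ip    TEXT NOT NULL,
    signature TEXT NOT NULL,
    seen_at   TEXT NOT NULL  -- ISO 8601 timestamp, all in the same timezone
)
"""

# Per signature: first seen, last seen, and number of hits for one IP.
LOOKUP = """
SELECT signature,
       MIN(seen_at) AS first_seen,
       MAX(seen_at) AS last_seen,
       COUNT(*)     AS hits
FROM alerts
WHERE src_ip = ?
GROUP BY signature
ORDER BY hits DESC, signature
"""


def lookup_ip(conn, ip):
    """Return (signature, first_seen, last_seen, hits) rows for one IP."""
    return conn.execute(LOOKUP, (ip,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO alerts (src_ip, signature, seen_at) VALUES (?, ?, ?)",
        [
            ("192.0.2.10", "example signature A", "2022-07-10T01:00:00Z"),
            ("192.0.2.10", "example signature A", "2022-07-12T05:00:00Z"),
            ("192.0.2.10", "example signature B", "2022-07-11T00:00:00Z"),
        ],
    )
    for row in lookup_ip(conn, "192.0.2.10"):
        print(row)
```

An index on `src_ip` would probably be the first thing to add once the table grows.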

My tech stack:

- Sensors & databases are hosted on Vultr w/ Ubuntu
- Suricata, obviously, for detecting the type of attack attempt
- Inetsim - this is not the best (I'm letting the attackers know I'm not running any real services, it's just inetsim, assuming attackers manually go look at the scan results) but it'll do for this project
- Zerotier - all sensors are connected to a zerotier network, it just makes networking, moving data around, and management easier
- Vector - I'm using vector to move data around
- Humio - it's for log storage and search, just like ELK or Splunk
- rinetd - I'm actually not running inetsim on all the sensors, I'm just forwarding all the traffic from sensors to one host running inetsim (it's good enough for this project)
- Redis - pubsub. I'm putting alerts into redis and letting python grab them and put the data in postgresql
- Postgresql - to store malicious IP, signature, and timestamp
- Appsmith - to make the webui app (usually I'd use flask...)


Network kinda looks like this w/ Zerotier:

Sensors are exposed to the internet, servers aren't. rinetd takes in sensor traffic from the internet and forwards it to inetsim. inetsim is bound to zerotier IP address.


The flow for logs kinda looks like this:

Vector on all the sensors reads eve.json and sends the data to vector on the ingest server.
Vector on the ingest server does multiple things: it saves the data to disk and sends it to Humio. Alerts also get geoip info added and then go to redis, where python picks them up and puts them into postgres.

Postgres stores the malicious IP, the suricata signature, and the timestamp.
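The last hop of that flow (message in, row out) can be sketched like this. It is not the script used in this project: SQLite stands in for Postgres, the table name is made up, and the messages are passed in as a plain iterable where a real consumer would read them from a redis pubsub subscription. The `event_type`, `src_ip`, `timestamp`, and `alert.signature` fields are the ones Suricata writes to eve.json.

```python
import json
import sqlite3

# Hypothetical table; the post only says IP, signature, and timestamp are stored.
ALERTS_SCHEMA = """
CREATE TABLE IF NOT EXISTS alerts (
    src_ip    TEXT NOT NULL,
    signature TEXT NOT NULL,
    seen_at   TEXT NOT NULL
)
"""


def alert_to_row(message):
    """Turn one eve.json line into (src_ip, signature, timestamp), else None."""
    try:
        event = json.loads(message)
    except (TypeError, ValueError):
        return None  # not JSON at all
    if not isinstance(event, dict) or event.get("event_type") != "alert":
        return None  # flow, dns, http, etc. are ignored here
    alert = event.get("alert")
    signature = alert.get("signature") if isinstance(alert, dict) else None
    src_ip = event.get("src_ip")
    timestamp = event.get("timestamp")
    if not (src_ip and signature and timestamp):
        return None
    return (src_ip, signature, timestamp)


def store_alerts(conn, messages):
    """Insert every valid alert from messages; return how many were stored."""
    rows = [row for row in map(alert_to_row, messages) if row is not None]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO alerts (src_ip, signature, seen_at) VALUES (?, ?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(ALERTS_SCHEMA)
    sample = json.dumps({
        "event_type": "alert",
        "src_ip": "192.0.2.10",
        "timestamp": "2022-07-10T01:00:00.000000+0000",
        "alert": {"signature": "example signature"},
    })
    print(store_alerts(conn, [sample, "not json"]))  # 1
```

Dropping malformed or non-alert messages quietly keeps the consumer from crashing on one bad line, though logging them would make pipeline problems easier to spot.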

Python script being used to process redis data and add data to postgres:


I used AppSmith for the webapp. AppSmith allows you to build a webapp and connect it to integrations it supports with little to no coding. 

For the webapp, I just have an input field and some queries running based on the input. It looks like this:

What I would do differently if I had more time and resources:
- I'd probably set up more realistic honeypots or have honeypot profiles
- Put honeypot software on the sensor itself instead of doing rinetd
- Ship logs through the internet (not zerotier)
- Do geoip enrichment on the sensor itself
- Store alert data in opensearch or some cloud hosted database that I don't have to maintain?
- Add health monitoring for sensors, the pipeline, etc.
- Better deployment and update (of software and suricata signatures) potentially through ansible?

There are probably many other things that can be done differently or more efficiently.