Saturday, April 15, 2017

Distributing binary processing using Python-rq

I highly recommend looking at this first:

In this post, I’ll cover distributed binary analysis with python-rq, Redis, and an NFS share. I’ll keep the analysis really simple: we’ll just be pulling some PE info out of each file. The Resources/Similar Projects section lists similar and more stable projects as well.

I have a bunch of files I need to analyze, and I want to distribute the analysis among different servers.

As mentioned in the previous blog post linked above, I’ll be using Python-rq, which works with Redis. I’ll also be using NFS to share files across multiple worker nodes and pefile to get PE information.

I have three machines with Ubuntu 10.04 on them. Two machines are workers, and one machine (DIST) is the Redis server and job creator. Additionally, I have an NFS server that holds my binary files and Python scripts.

NFS Server - (Using Freenas)

Install screen or tmux on everything!

We’ll need to install required software on DIST.
I ran:
apt-get update && apt-get install -y python-pip redis-server && pip install rq rq-dashboard

And to confirm everything was installed:
redis-server -v
rq info

Edit Redis-server configuration and make it bind to
Config file: /etc/redis/redis.conf
Locate bind and change it

Run the following to restart redis-server:
service redis-server restart
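After the restart, it can be handy to confirm from a worker that Redis is actually reachable over the network. Here is a minimal sketch using only the standard library. It assumes Python 3 and Redis on its default port 6379; the function name port_open is my own, and it only checks that the TCP port accepts connections, not that Redis answers commands.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; the with-block closes the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

From a worker, calling port_open with the DIST address and 6379 should return True once the bind setting is correct. The same check works for rq-dashboard on port 9181.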

We will also start rq-dashboard on Dist.
Run the following command in a screen session:
rq-dashboard -H

It should start a webserver on port 9181.

On Worker1 and Worker2 we’ll need python-pip and rq. Tip: If you’re using VMs or cloud instances, just configure one worker node and then deploy copies.
I ran:
apt-get update && apt-get install -y python-pip && pip install rq

And to confirm rq was installed correctly:
rqworker -u redis://
Now we’ll get the NFS share set up.
On the worker nodes and dist, I created a /share directory where I’ll mount the NFS share.
I forgot this earlier: we also need an NFS client.
Run the following on the worker nodes and dist:
apt-get install -y nfs-common

To mount the NFS file system run (if you’re going to be setting up something stable, edit your fstab file):
mount -t nfs /share

Run the following to make sure NFS share permissions are set correctly:
echo test > /share/test
cat /share/test
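If you’d rather run the same write/read check from Python on each node, something like the sketch below should do. It assumes Python 3; the function name check_share and the marker file naming are my own. It writes a small file, reads it back, and cleans up after itself.

```python
import os
import socket
import uuid


def check_share(mount_point="/share"):
    """Write a marker file under mount_point, read it back, delete it; True if all worked."""
    host = socket.gethostname()
    # A unique name so several nodes can run the check at the same time.
    marker = os.path.join(mount_point, ".nfs-check-{}-{}".format(host, uuid.uuid4().hex))
    payload = "test from {}\n".format(host)
    try:
        with open(marker, "w") as fh:
            fh.write(payload)
        with open(marker) as fh:
            ok = fh.read() == payload
        os.remove(marker)
        return ok
    except OSError:
        # Missing mount point, read-only share, or permission problems.
        return False
```

A False result on any node usually means the mount or the share permissions need another look before starting workers there.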

On my worker nodes, I need the pefile library. I ran:
pip install pefile

I created a /share/malware folder. I’ll put my samples in that folder. If you’re looking for samples to play with check out: :-D

Now we can start writing code to do the analysis.

I recommend installing ipython on one of your worker nodes and dist node. It makes debugging and testing much easier.

I will be putting my code in /share. We will be running rqworker in that directory as well. Putting the code on the NFS server makes updating easier.

We’ll need code for the processing part and the job distribution part.

For processing, I have a function that takes the full path to a PE file, looks at some properties, and writes the results to a text file. The code I used is from the pefile example document (linked below).
My file name is and function for processing the files is procpe(FILENAME).
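To give an idea of what such a function can look like, here is a rough sketch, not the exact code from the repo. It assumes Python 3. The out_dir default of /share/results and the choice of fields are my own assumptions; only the name procpe and the True/False return come from this post.

```python
import os


def procpe(path, out_dir="/share/results"):
    """Write basic PE header info for `path` to a text file; return True on success."""
    try:
        # Lazy import: only the nodes that actually run jobs need pefile installed.
        import pefile

        pe = pefile.PE(path)
    except Exception as exc:  # missing file, not a valid PE, or pefile not installed
        print("skipping {}: {}".format(path, exc))
        return False

    lines = [
        "file: {}".format(path),
        "machine: {:#x}".format(pe.FILE_HEADER.Machine),
        "timestamp: {}".format(pe.FILE_HEADER.TimeDateStamp),
        "entry point: {:#x}".format(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
        "image base: {:#x}".format(pe.OPTIONAL_HEADER.ImageBase),
        "sections:",
    ]
    for section in pe.sections:
        # Section names are null-padded bytes.
        name = section.Name.rstrip(b"\x00").decode("ascii", "replace")
        lines.append("  {} va={:#x} vsize={:#x} raw={}".format(
            name, section.VirtualAddress, section.Misc_VirtualSize, section.SizeOfRawData))

    # DIRECTORY_ENTRY_IMPORT is absent when the file has no import table.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        lines.append("imports from {}:".format(entry.dll.decode("ascii", "replace")))
        for imp in entry.imports:
            if imp.name:  # imports by ordinal have no name
                lines.append("  {}".format(imp.name.decode("ascii", "replace")))

    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, os.path.basename(path) + ".txt")
    with open(out_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return True
```

Returning False instead of raising keeps one bad sample from looking like a crashed job; the trade-off is that failures only show up in the worker’s output.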

Now we can work on the job distribution code. My code is really simple for now: it takes a full file path as an argument, adds it to the queue, and does not wait for results. (procpe returns True or False.)
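The enqueue step can be sketched roughly like this. It assumes Python 3 with the redis and rq packages installed on dist. The module name peworker is a placeholder for whatever file holds procpe, and the helper names connect and enqueue_sample are my own. Passing the function as a dotted string means dist does not have to import the processing module (or pefile) itself.

```python
def connect(redis_url):
    """Return an rq Queue (the default queue) on the Redis server at redis_url."""
    # Imported here so enqueue_sample below can be tried out without redis/rq installed.
    from redis import Redis
    from rq import Queue

    return Queue(connection=Redis.from_url(redis_url))


def enqueue_sample(queue, path, func="peworker.procpe"):
    """Queue one file for analysis and return the job id without waiting for the result."""
    # func is a dotted "module.function" string (placeholder name); the worker resolves it.
    job = queue.enqueue(func, path)
    return job.id
```

A small script on dist would call connect with the Redis URL of the DIST machine, then enqueue_sample with the file path taken from the command line. The job id is handy if you later want to look the job up in rq-dashboard.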

We need to cd into /share and run rqworker command (you can run rqworker multiple times on the same machine):
rqworker -u redis://

On dist, we can cd into /share as well and run the following:
for file in /share/malware/*; do python $file; done

We can see our jobs in rq-dashboard.
After everything is done being processed, we can see results:

This was a good way for me to learn to distribute things using Python. I’m aware of some of the Apache projects that do stuff like this as well. I feel more comfortable with Python right now. Also, setup for this is really simple.

As you can probably tell, I’m not a professional programmer and I’m lazy. There are several problems with this design too. I think I can add multithreading or multiprocessing so that workers process more data. The code is on GitHub; feel free to fork and improve it. If you do, please leave a comment.

I do plan on running this at my university and adding more features. I’ll keep updating the code on GitHub.

If there are alternative methods or I made some mistakes in the post, leave a comment.
(As you can see from the timestamps in screenshots, I wrote this at night. Expect typos)


Resources/Similar Projects:

