Saturday, January 7, 2017

Python-rq example

Introduction:
This post is about Python-RQ and job distribution. Python-RQ is a Python library that uses Redis queues to queue jobs. These jobs can then be processed by one or more workers running on your nodes. The library is very simple to use.

Using this library involves three parts: Redis, job creators/distributors, and workers.
Redis is used to hold the job queues. Both job creators and workers access Redis.
Your job creation/distribution script puts jobs/tasks on a Redis queue.
Your worker nodes take the jobs/tasks from Redis and run them.

Below you can see one way to set this up:

You can run Redis, the job creator, and the workers on the same machine, or separate them all out, depending on what your needs are.
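To make the three roles concrete, here is a toy, single-process sketch that uses only the standard library. It is not RQ: a queue.Queue stands in for the Redis queue, threads stand in for worker nodes, and a plain dict stands in for the results Redis would store. The names example_task, worker and run_demo are made up for this illustration.

```python
import queue
import threading


def example_task(x, y):
    # Hypothetical job body: any plain function can be a job.
    return (x + y) * x


def worker(jobs, results):
    # Worker role: take jobs off the queue until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, func, args = job
        results[job_id] = func(*args)


def run_demo(num_workers=2):
    jobs = queue.Queue()   # stands in for the Redis queue
    results = {}           # stands in for results kept in Redis

    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Job creator role: put three jobs on the queue.
    for job_id, args in enumerate([(1, 2), (2, 3), (3, 4)]):
        jobs.put((job_id, example_task, args))

    # One sentinel per worker so every thread shuts down.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results
```

Calling run_demo() returns a dict mapping each job id to its return value. With RQ the same shape holds, except that the queue lives in Redis and the workers are separate processes, possibly on other machines.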

Problem:
I needed to analyze a bunch of binary files quickly. Besides that, I just wanted to have infrastructure in place that lets me run my Python code distributed across machines.

Installation:
I am using Proxmox to do all of this; in real life, you would probably use multiple physical machines. I am using Ubuntu 14.04 containers.

We’ll start by installing redis-server and pip.
Run:
apt-get update
apt-get install python-pip redis-server
pip install rq rq-dashboard
To test installation, run:
redis-server -v
rq info
rq-dashboard --help
If everything installed correctly, you shouldn’t have gotten any errors.

Now you need to change the Redis configuration to make it bind to 0.0.0.0 (or whatever IP address you want Redis to bind to; FYI, Shodan does scan for Redis, so keep it off the public internet). Edit /etc/redis/redis.conf, look for 'bind 127.0.0.1', and change it to 'bind 0.0.0.0'.
After changing the settings, run:
service redis-server restart
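
After the restart, it can be handy to check from a worker or job-creator machine that the Redis port is reachable at all. The sketch below is my own helper (the name redis_port_reachable is made up), it assumes Redis is on its default port 6379, and it only tests that a TCP connection can be opened; it does not prove that the thing listening is Redis.

```python
import socket


def redis_port_reachable(host, port=6379, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, using the Redis server address from later in this post:
# redis_port_reachable("10.0.0.32")
```

If this returns False from another machine but True on the Redis host itself, the bind setting or a firewall is the first thing to look at.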

For your worker nodes and job creator, you only need to install pip and rq, plus any other dependencies your jobs or tasks need. In the above example, I’m putting the job distributor and Redis on the same machine.

Simple example:
You will need to know the IP of your Redis server; in my case it’s 10.0.0.32.

On my worker node(s), I have the python script that I want to run.
File name: mod1.py
Code:
def func1(x, y):
    # Toy job: add the two arguments, then multiply by the first one
    z = (x + y) * x
    return z

On my worker node, in the same directory as my mod1.py, I run the following command to start rqworker:
rqworker -u "redis://10.0.0.32"

On my job distribution container, I can run the following code to create and put jobs on Redis:
# Set up the connection to Redis
from redis import Redis
from rq import Queue
q = Queue(connection=Redis('10.0.0.32'))
# Create a task: a worker will import mod1 and call func1(1, 2)
results = q.enqueue('mod1.func1', 1, 2)

Remember that mod1.py is our Python file, with func1 being the function we want to use from that file. We’re also running rqworker in the same directory as that file, so that the worker can import mod1.
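
The reason the directory matters is that the job only carries the name 'mod1.func1', and the worker has to import that name itself. Roughly, turning a dotted name into a callable looks like the sketch below. This is an illustration of the idea using importlib, not RQ's actual code, and resolve_dotted_name is a name I made up.

```python
import importlib


def resolve_dotted_name(name):
    """Split 'module.attr' on the last dot, import the module, return the attribute."""
    module_name, _, attr_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# With mod1.py importable, resolve_dotted_name("mod1.func1") would return func1.
# A standard-library name works from any directory:
sqrt = resolve_dotted_name("math.sqrt")
print(sqrt(9.0))
```

If the module cannot be imported from where the worker was started, the import fails and the job ends up as failed.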
Once a worker has finished the job, results.result will contain the returned value (3 for the call above). Until then it is None.
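
Since the result is not available right away, the job creator usually has to wait for it. Here is a minimal polling sketch. wait_for_result is my own helper, not part of RQ; it assumes the object it is given exposes is_finished, is_failed and result the way an RQ job does, and the timeout and poll interval values are arbitrary.

```python
import time


def wait_for_result(job, timeout=30.0, poll_interval=0.5):
    """Poll a job until it finishes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.is_finished:
            return job.result
        if job.is_failed:
            raise RuntimeError("job failed; check the worker output or rq-dashboard")
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish within %.1f seconds" % timeout)
```

With the example above, wait_for_result(results) should return 3 once a worker has picked up the job. For many jobs, you would keep the job objects in a list and collect their results in a loop.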


Another thing you installed is rq-dashboard. Rq-dashboard is a pretty awesome-looking web UI that lets you see your queues, jobs, and connected workers. If any jobs fail, you can requeue or cancel them from the web UI.

You can run it by typing rq-dashboard, then visit http://IP_OF_SERVER:9181/ to view the dashboard.
Here are some screenshots:

Next time (assuming I have more time), I’ll try to cover using python-rq for PE file analysis.
