The previous article mentioned at several points that collection should be automated. After a few emails and Slack conversations about it, sharing some approaches to automation seemed in order. This article covers several useful ways to use Python to collect from the starting credential sources mentioned in the previous article.
Python runs on any major operating system, but the suggestions and examples below assume Linux, so please keep that in mind. Virtualenv is a useful tool for creating isolated Python environments. Each environment has its own installation directories and doesn’t share libraries with the others. That makes it easy to install Python 2.7 and Python 3.x on one system without conflicts, and to maintain different versions of modules for different projects without collisions. Using it with the projects below is highly recommended.
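As a minimal sketch of the isolation described above, the snippet below creates an environment from Python itself. It uses the standard-library `venv` module, which is a close relative of Virtualenv rather than Virtualenv itself, and the directory name `collector-env` is only an illustrative choice.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create an isolated environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=False keeps the sketch fast; set it to True if you want pip
    # installed inside the environment.
    venv.create(env_path, with_pip=False)
    return env_path


if __name__ == "__main__":
    # "collector-env" is a hypothetical name; a temp dir keeps the demo tidy.
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(Path(tmp) / "collector-env")
        print("created:", env, (env / "pyvenv.cfg").exists())
```

In day-to-day use you would create the environment once in a permanent location, then install each project's requirements into it.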
The crontab command, found in Unix and Unix-like operating systems, schedules commands to run periodically. Because the examples use Ubuntu, refer to Ubuntu’s official CronHowto for a breakdown of Cron. It explains what Cron is and how it works, so that is skipped here. Use Cron to put the scripts below on a schedule so they can run unattended.
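To make the scheduling step concrete, here is a small sketch that builds a crontab entry for a collection script. The function name `cron_line` and every path shown are hypothetical; the point is only that the entry should call the interpreter inside your isolated environment and send output to a log so unattended runs can be reviewed.

```python
import shlex


def cron_line(schedule, python_path, script_path, log_path):
    """Return one crontab entry that runs script_path on the given schedule."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("schedule needs five fields: minute hour day month weekday")
    # Quote each path for the shell and append stdout and stderr to the log.
    # Note: cron treats "%" specially, so avoid it in these paths.
    command = "{} {} >> {} 2>&1".format(
        shlex.quote(python_path), shlex.quote(script_path), shlex.quote(log_path)
    )
    return "{} {}".format(" ".join(fields), command)


if __name__ == "__main__":
    # Hypothetical paths: run an hourly search using the environment's Python.
    print(cron_line(
        "0 * * * *",
        "/home/analyst/collector-env/bin/python",
        "/home/analyst/collect/otx_search.py",
        "/home/analyst/collect/otx_search.log",
    ))
```

The printed line can be pasted into the file opened by `crontab -e`.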
AlienVault publishes its SDK in a GitHub repository at https://github.com/AlienVault-OTX/OTX-Python-SDK. To use it, sign up for an account and get an API key; the code in the repository will not work without one. Install and set it up according to the directions in the README. The repository includes an example Python file, https://github.com/AlienVault-OTX/OTX-Python-SDK/blob/master/examples/cli_example.py, which demonstrates how to search pulses for a string. It can also be leveraged to search by domain, hash value and other threat data.
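A pulse search with the SDK might look like the sketch below. It assumes the SDK's `OTXv2` client and its `search_pulses` method, and it assumes the response is a dictionary with a `results` list of pulses that each carry a `name`. The environment variable `OTX_API_KEY`, the helper `pulse_names` and the search string are illustrative choices, not part of the SDK.

```python
import os


def pulse_names(client, query, limit=10):
    """Search pulses for query and return the names of the matches."""
    response = client.search_pulses(query, max_results=limit)
    # Assumption: "results" holds pulse dicts that each have a "name" field.
    return [pulse.get("name", "") for pulse in response.get("results", [])]


# The live query only runs when a key is configured in the environment.
if __name__ == "__main__" and os.environ.get("OTX_API_KEY"):
    from OTXv2 import OTXv2  # provided by the OTX-Python-SDK

    otx = OTXv2(os.environ["OTX_API_KEY"])
    for name in pulse_names(otx, "credential dump"):
        print(name)
```

Keeping the parsing in its own function means it can be checked against a saved response without calling the API every time.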
IBM X-Force Exchange
X-Force is slightly trickier to leverage. Tags can be searched via the GUI, but API searching isn’t well supported. The GitHub project at https://github.com/johestephan/CTI-Toolbox/blob/master/xfexchange.py is one of the more versatile Python options for drawing information from X-Force Exchange. Even so, it falls short of what is needed. Queries can be automated, and should be, but truly automated searching of collections and X-Force data remains quite limited for credential research.
Numerous well-written Twitter projects exist on GitHub. If you don’t already have one you like, consider the project at https://github.com/shantnu/TwitterAnalyser. The full background of the project is at http://pythonforengineers.com/build-a-twitter-analytics-app-using-python/. The tutorial is well written, and the code is easy to adapt and automate to find the credential information you want on Twitter.
Given the volume of data that flows through Pastebin, two different approaches are recommended. The first is a simple crawler, pwnbin: https://github.com/kahunalu/pwnbin/blob/master/pwnbin.py. Its setup and documentation are very straightforward. The second, PasteHunter, is more powerful but correspondingly more complicated: https://github.com/kevthehermit/PasteHunter. PasteHunter lets you leverage YARA rules to search not only Pastebin but also dumps and GitHub Gists. You will need a Pro account on Pastebin and an OAuth token for GitHub to avoid the free rate limits.
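Whichever crawler you choose, the core of the job is matching paste text against patterns you care about. The sketch below shows that idea with a plain regular expression standing in for YARA rules; it is not how either project is implemented. It assumes credentials appear one per line as `email:password` or `email|password`, and `find_credentials` and the sample domains are illustrative.

```python
import re

# Assumption: one "email:password" or "email|password" pair per line.
# Real dumps vary widely, so treat this as a starting pattern only.
CREDENTIAL_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\s*[:|]\s*(\S+)\s*$",
    re.MULTILINE,
)


def find_credentials(paste_text, watched_domains):
    """Return (email, password) pairs whose email is in a watched domain."""
    watched = {domain.lower() for domain in watched_domains}
    hits = []
    for email, password in CREDENTIAL_PATTERN.findall(paste_text):
        if email.rsplit("@", 1)[1].lower() in watched:
            hits.append((email, password))
    return hits


if __name__ == "__main__":
    sample = (
        "random header\n"
        "alice@example.com:hunter2\n"
        "bob@other.org|letmein\n"
        "not a credential\n"
    )
    for email, _password in find_credentials(sample, ["example.com"]):
        # Avoid printing the password itself into logs.
        print("possible exposed credential for", email)
```

Filtering by your own domains keeps the output focused on exposures you can act on.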
Reading the comments on Reddit can drain hours of your life in a blink. Avoid the trap by automating what you need to find. The GitHub project at https://github.com/shantnu/RedditBot has code to monitor posts and comments on Reddit, and http://pythonforengineers.com/build-a-reddit-bot-part-1/ contains the walkthrough and setup of the Reddit bot.
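As a rough sketch of what monitoring posts can look like, the snippet below filters new submissions by keyword using PRAW, a commonly used Reddit API wrapper. The subreddit name, keywords, environment variable names and the helper `matching_posts` are all placeholders; you will need your own Reddit API credentials for the live part to run.

```python
import os


def matching_posts(submissions, keywords):
    """Yield (title, url) for submissions that mention any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    for post in submissions:
        text = "{} {}".format(post.title, post.selftext).lower()
        if any(keyword in text for keyword in lowered):
            yield post.title, post.url


# The live query only runs when credentials are configured.
if __name__ == "__main__" and os.environ.get("REDDIT_CLIENT_ID"):
    import praw

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="credential-collection sketch",
    )
    new_posts = reddit.subreddit("netsec").new(limit=50)
    for title, url in matching_posts(new_posts, ["breach", "dump"]):
        print(title, url)
```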
To help derive a list of data breaches, you can leverage the GitHub repo at https://github.com/0xiNach/Web-Scraping-Machine-Learning. It outlines a way to scrape Hackmageddon for that information. It also has some very useful visualizations, which can be reused when presenting the data.
In the previous article, a link to https://blog.shodan.io/tracking-hacked-websites-2/ was provided as an example of how to find defaced websites. The GitHub project at https://github.com/HatBashBR/ShodanHat/blob/master/shodanhat.py can be leveraged to automate a similar search if you don’t want to create a cron job using the Shodan code shown in the video from the Shodan blog.
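If you would rather script the search yourself, a minimal sketch with the official `shodan` Python library could look like this. The query string is an assumption that only approximates the kind of search described in the blog post, the environment variable `SHODAN_API_KEY` and the helper `summarize_matches` are illustrative, and filtered searches may require a paid Shodan plan.

```python
import os


def summarize_matches(results):
    """Return (ip, port, hostnames) tuples from a Shodan search result dict."""
    rows = []
    for match in results.get("matches", []):
        rows.append(
            (match.get("ip_str"), match.get("port"), match.get("hostnames", []))
        )
    return rows


# The live query only runs when a key is configured in the environment.
if __name__ == "__main__" and os.environ.get("SHODAN_API_KEY"):
    import shodan

    api = shodan.Shodan(os.environ["SHODAN_API_KEY"])
    # Assumption: a page-title filter approximates a search for defacements.
    results = api.search('http.title:"hacked by"')
    for ip, port, hostnames in summarize_matches(results):
        print(ip, port, ", ".join(hostnames))
```

Saved as a script, this can be put on a schedule with Cron like the other collectors.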
That concludes a brief overview of ways to get started using Python to help automate collection from these resources. Remember, this is only a starting point. Refine, add and remove resources to fit your needs.