Project Monitor – Listening Internet Whispers

Monitor is a bot programmed to spider web pages and collect desired content. It was originally developed to find logins, email addresses, password hashes, shadow files, SQL dumps and, more generally, data leaked by users and developers or disclosed by hacktivists on the Internet. Monitor implements a series of filters and a mechanism to classify and categorise the parsed content, which is subsequently stored in a database for later access. This tool has been used to parse the Pastebin, Anonpaste and Pastie web sites, and the following is a demonstration of this utility.

Monitor is written in Python and the following picture illustrates its usage information:


Using the -u switch, Monitor will connect to its database and retrieve the content of the file specified by the URL parameter. Although the usage text asks for a URL, the parameter is not a URL at all: it is a simple 8-character alphanumeric code used to identify the resource. I need to modify the Monitor source to replace URL with code or ID here.

The -a switch allows you to add a new filter rule to the existing ones. A filter (this is what they are called in Monitor) is composed of a rule, a rule name, a rule type (or category) and points. The rule is essentially a Regex string that is used during the analysis of the data. If the Regex rule matches while parsing the content, the content is saved into the Monitor database. The rule type is used to categorise the data for both storage and retrieval purposes. The rule points are used to compute a score at the end of the analysis of a single resource. The score is mainly based on the number of occurrences of the same rule in the data file. An example of this mechanism is shown later on in this blog post.
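To make the filter idea concrete, here is a minimal sketch of how a rule, its type and its points could combine into a score. This is not Monitor's actual source: the `Filter` class, the `analyse` function, the two example patterns and the "occurrences times points" formula are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical filter record: rule name, Regex rule, type and points."""
    name: str
    pattern: str
    category: str
    points: int


def analyse(text, filters):
    """Return (score, {category: occurrences}) for one resource.

    Assumption: each occurrence of a rule adds that rule's points.
    """
    score = 0
    hits = {}
    for f in filters:
        count = len(re.findall(f.pattern, text))
        if count:
            hits[f.category] = hits.get(f.category, 0) + count
            score += count * f.points
    return score, hits


# Illustrative rules only; real filters would be stricter.
FILTERS = [
    Filter("ipv4", r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "ip", 1),
    Filter("email", r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "email", 2),
]

sample = "root@example.com logged in from 10.0.0.1 and 10.0.0.2"
score, hits = analyse(sample, FILTERS)
print(score, hits)
```

In a sketch like this, a resource would be stored only when `hits` is not empty, that is, when at least one rule matched.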

You think you can write Regex strings well until you implement them in a program. If the program slows down drastically, then it is time to re-study the Regex implementation. With Monitor, I learnt a few tricks about Regex, but I am not going to write about Regex here, right now.

To avoid downloading and processing the same content more than once, each piece of data is signed using the SHA-1 hashing algorithm. This means that a hash string is first created from the new data content and compared with the existing ones. If no such signature exists in the database, then the new file is processed.
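The duplicate check can be sketched in a few lines with Python's standard `hashlib` module. The function names and the in-memory `seen` set are assumptions for illustration; Monitor itself compares against the signatures stored in its database.

```python
import hashlib


def signature(content: bytes) -> str:
    """Return the SHA-1 hex digest used as the content signature."""
    return hashlib.sha1(content).hexdigest()


def is_new(content: bytes, seen: set) -> bool:
    """Return True and record the signature if the content was not seen before."""
    sig = signature(content)
    if sig in seen:
        return False
    seen.add(sig)
    return True


seen = set()  # stand-in for the signatures held in the database
first = is_new(b"user:pass", seen)     # new content, should be processed
second = is_new(b"user:pass", seen)    # same content again, should be skipped
third = is_new(b"other paste", seen)   # different content, should be processed
print(first, second, third)
```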

The -s switch allows you to search the Monitor database for content of a particular type and with a minimum desired score. Available types include IP addresses, hashes, memory dumps, email addresses, credit cards and so on. Common names and notorious company names are also in the list. However, anything can be added to the filters.
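A search by type and minimum score maps naturally onto a single database query. The sketch below uses an in-memory SQLite database; the table name, the column names and the sample rows are invented for the example and say nothing about Monitor's real schema or database engine.

```python
import sqlite3

# Hypothetical schema: one row per stored resource and matched type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (code TEXT, type TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("aB3dE5fG", "email", 12),  # made-up 8-character codes
        ("hI7jK9lM", "email", 3),
        ("nO1pQ2rS", "ip", 40),
    ],
)


def search(conn, rule_type, min_score):
    """Return (code, score) rows of the given type with score >= min_score."""
    rows = conn.execute(
        "SELECT code, score FROM results "
        "WHERE type = ? AND score >= ? ORDER BY score DESC",
        (rule_type, min_score),
    )
    return rows.fetchall()


matches = search(conn, "email", 10)
print(matches)
```

The codes returned by such a query are what you would then pass to the -u switch to retrieve the stored content.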

The -r option will instead execute Monitor, which will perform both the search for new content and its analysis. This is usually executed once, and Monitor then runs in the background. However, it is also useful for starting and stopping the program.

The following is an example of Monitor while analysing the data:


In the above screen-shot, you can read the full date and time at which the data was downloaded from the Internet, the code (or ID), the score, the number of occurrences and the type of filter for that specific instance. As you can see, information such as IP addresses, memory dumps and e-mails was identified.

Below is another screen-shot of Monitor while running. This time we have 101 hashes, 101 URLs and 2 IP addresses. From this information we can, for example, understand that the hashes are not related to passwords but rather used in URLs. What else? Credit card numbers and IP addresses:


One last screen-shot of Monitor shows the -u switch, specifically for the resource containing the credit card found above:


The file contains the full credit card number along with the full name of the valuable card holders.

In conclusion, Monitor is cool and it can be used to scan for desired content on any website on the Internet. It was originally designed to parse the aforementioned websites, but it can be run against any web page.

If you like data mining, stay tuned, as I am working on an interesting project right now which I will be publishing soon. In the meantime, feel free to contact me if you wish to know more.


