Sunday, 9 July 2017

The Three-Eyed Raven: Threat Intelligence With CIF

One of the big buzzwords in InfoSec right now (and it has been for a few years) is "threat intelligence" (TI). It goes right along with "indicators of compromise", or IOC, and many use the terms interchangeably. I admit, I'm guilty of it from time to time. Ultimately, though, they have very different meanings -  so for the context of this post, I want to go ahead and clarify *my interpretation* of what those things mean.

An indicator of compromise/IOC is a discrete, observable unit. It could be an IP address, an ASN, a file hash, a registry key, a file name or any of several other types that are observed as part of monitoring or performing forensics on a compromised (or suspected compromised) system.

Threat intelligence/TI should be comprised of an IOC *and additional contextual information*, such as when it was observed, under what conditions, etc. If someone tries to sell you threat intelligence that is just a feed of IPs and domain names, they aren't selling you threat intelligence - they're selling you indicators.

With that said, let's consider a scenario. You work for a company with small sites all over the country. Each site has their own SecOps person who also happens to be THE do-it-all IT person - they run the handful of servers, networking equipment and desktops/tablets/other endpoints at that site. There is no centralised logging infrastructure or SIEM yet. You work at site alpha and see a port scan against your external IP addresses. Then you see SSH brute-forcing attempts against your servers. Do you share this information with your colleagues at the other sites? If so, how? Do you send an email or drop it into a channel on your private IRC server?

Enter the Collective Intelligence Framework, or CIF.

A Quick Overview -- And the Installers!

First, you need to know that there are two popular versions of CIF, version 2 and version 3.

CIFv2 is *the* way to get started. It has moderate hardware requirements for a business application:

8+ cores
250GB+ of disk, depending on how long you want to keep data - I've installed on systems with just 50GB

When you install it, you have a CIF instance backed by ElasticSearch and a command-line query tool. It will update nightly with Open Source Intelligence (OSINT) from multiple sources. The installer can be found here:

CIFv3 is "in development" and seeing updates regularly. Like I said, CIFv2 is the way to get started - it's more mature, it has a larger user base, it's easier to get going and it has more forgiving requirements for those looking to get started:

2+ cores
10GB+ disk - against, depending on how long you want to store data

If you choose to go the CIFv3 route, you should have a CIF instance backed by sqlite3 and a command-line query tool. It will also update regularly with OSINT from multiple sources. Its installer can be found here:

So, why am I even writing this post? Well...I don't really like sqlite3 for how I want to use CIF and you aren't forced to use sqlite3, but getting it backed by ElasticSearch isn't really documented - now that I have it working, why not share that information with the world?

First, ElasticSearch

As with everything I do that's Linux-related, I'm going to start with a "plain" Ubuntu Server 16.04 LTS install. As of the time of writing, that's 16.04.2 LTS.

I'm going to give it four cores, eight gigabytes of RAM, fifty gigabytes of disk and I'm not adding any additional packages or tools during installation. Please note that for this purpose, fifty gigs of disk is WAY more than I'll use. Twenty would be way more than I'll use. I'm only giving it fifty because that's how I'm setting up my new templates.

Because I want to back this with ElasticSearch, I'll need to install it plus its dependency: Java. You can follow the same steps I used in my previous posts on installing ElasticSearch,, or you can follow the steps below. Below has one huge difference - I use the OpenJDK headless JRE that is included with apt instead of using the Java 8 installer from the webupd8team PPA.

First, some prep steps so apt knows about, and trusts, the Elastic repository:

wget -qO - | sudo apt-key add -
echo "deb stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

Now update the apt application list and install ElasticSearch:

sudo apt update
sudo apt install elasticsearch

Now tell systemd to start ElasticSearch at boot and start ElasticSearch:

systemctl enable elasticsearch
systemctl restart elasticsearch

At this point you should be able to run curl against ElasticSearch and ask for an index list (there should be none):


If ElasticSearch is running, you should immediately get another command prompt with no output from the above command.

ElasticSearch Defaults

By default, ElasticSearch will create all indices with five shards and one replica. This works great if you have multiple ElasticSearch nodes but I only have one node. To do a little ElasticSearch housekeeping, I'm going to apply a basic template that changes the number of replicas for my indices to 0.

This will apply to both the "tokens" index used for authorisation in CIF and the "indicators-<month>" index used for actual threat data. I am ONLY setting the new default to zero replicas because I have no intention of using any other nodes with ElasticSearch. If I were going to possibly add more nodes I would skip straight to "Now, CIF".

To make this change, I'll create a file that has the settings I want for all of my indices; in this case, I'm going to name it "basic-template" and I want every index to have 0 replicas.

  "template" : "*",
  "settings" : {
    "index" : {
      "number_of_replicas" : "0"

Then I'll use curl to save that template in ElasticSearch. Because I've used "template" : "*", this template will get applied to every index that's created. For a single-node setup this is fine; if I had a multi-node cluster backing my CIF instance, I might change the number of replicas so that I could speed up searching or to help with fault-tolerance. The curl command to import this template as "zero_replicas" would be:

curl -XPUT -d "$(cat basic-template)"

Now, CIF

First, grab the latest release of the deployment kit. As of writing, that is 3.0.0a5. The releases are listed here:

You can download 3.0.0a5 directly via wget:


Unzip the tarball with tar:

tar zxf 3.0.0a5.tar.gz

That should give you a directory called bearded-avenger-deploymentkit-3.0.0a4.  If you do a directory listing with ls, you'll see several subdirectories and configuration files. If you wanted to do a "generic" install of CIFv3, you could run "sudo bash" and it would do its thing, leaving you with an sqlite3-backed CIF. That's not what we want, so let's make some changes!

The important file to edit is global_vars.yml. I've tried several combinations of the options to add and, while it doesn't matter what order the following are added, they do all need to be there to use ElasticSearch with CIFv3:

For copy and paste, the added lines are:

cif_store_store: elasticsearch
ANSIBLE_CIF_ES_NODES: 'localhost:9200'
CIF_ANSIBLE_ES_NODES: 'localhost:9200'

Now I can use the installer to get things rolling (note: on my Internet connection at home this took almost thirty minutes due to installing dependencies):

sudo bash

Remember, this is CIFv3 and it's *in development*. The install may break but it's unlikely. With the tiniest bit of luck, you should end up with a command prompt and zero failures. You'll know if it failed, all of the text will be red and the last line will tell you how many tests it failed!

On Tokens and Hunters

The very last thing that the installer currently does is add "tokens" for specific accounts. "Tokens" are a hash used for authentication. By default, CIF will create a new user, "cif", with a home directory of "/home/cif", and in that directory is a token for the cif user, stored in "/home/cif/.cif.yml". Be careful with that token, it allows for *full administrative access* to your CIF installation. If it's just you and you are going to do everything with that account, great, but if you think you're going to have other users, feed in a lot of data, share data with other entities, etc., please take some time to read up on token creation:

cif-tokens --help

If you just run "cif-tokens" without any options, it will print out all of the tokens it knows about:

"hunters" are the real workhorses that get data into CIF and they are disabled by default because they can bring a system to a crawl if there are too many of them. With no value set, I have four on a fresh boot. To add more (and you want to), edit /etc/cif.env or /etc/default/cif and set CIF_HUNTER_THREADS = <x>. I set it to two and I have six cif-router threads running.

If I had a server with eight cores, and a separate ElasticSearch cluster, I may set that to four or higher. Just know the more you have, the more resources they're going to use!!

At this point I like to do a restart, just to make sure ElasticSearch and all of my cif processes will start up with a reboot.

A Simple Query

When the CIF server processes kick off, they set a pseudo-random timer for up to five minutes. At the end of that timer everything will start to work together to get data into your indicators-<month> index. I say this because if you query cif in the first few minutes after installing it, you're not going to get anything. At all. Zilch. I don't want you to think you have a broken install just because the hunters haven't had a chance to populate your back-end!!

Give it a few minutes. I'd give it fifteen or twenty. Seriously, just reboot the box and step away for a while. Go fix a cup of tea, have a glass of water, play with the dog, just make sure you let it do its thing for a bit. Sometimes I'll use curl to check on the elasticsearch index just to see how quickly it's growing:

When you come back to it you can start trying things out. Remember that by default the only user who can query CIF is your cif user - I usually just change user to cif with:

sudo su - cif

Then you can try a simple query with:

cif -q

Notice you don't get anything back - just an empty table. That's because doesn't show up in CIF by default. However, if you run the command again, you'll get something very different! For example, if I search now (I've done it a few times), I get:

Why is that? Well, by querying CIF, you have told it that you have something interesting. Notice how when I use "cif -q", it adds an entry -- that's because it's now a potential indicator. It also sets the tag to "search", meaning it was something I searched for manually, not that I received as part of a feed.

Notice all of the fields in my result for a simple query for "":

"tlp": Traffic Light Protocol. This is the sharing restriction - just by querying, it gets set to "amber", typically meaning "share only if there's a need to know".
"lasttime": the last time, *in UTC*, that this indicator was seen
"reporttime": the time, *in UTC*, when this indicator was introduced to CIF
"count": the number of times CIF has seen this indicator
"itype": the indicator type; notice it is "fqdn", or "fully qualified domain name"
"indicator": the thing that was seen; in this scenario, ""
"cc": the country code for the indicator; you'll see this a lot with IP addresses
"asn": the autonomous system number, again common with IPs
"asn_desc": usually the company responsible for that ASN
"confidence": your confidence level that this is something bad (or good, depending on how you use CIF)
"description": this is a great field for putting web links that have additional information
"tags": any additional "one word" tags you want to apply to the indicator
"rdata": this is "related" data; for a domain it may be an IP address - you can specify this value
"provider": the entity who provided this indicator; in this scenario it's the cif user using the "admin" token

See what I mean by the difference between having just an indicator (just an IP or domain) and turning it into intelligence by adding context and additional information? You may have "" as an indicator but the description may have a link to an internal portal page that details everything you know about that domain. If "lasttime" is three months ago, and count is "2", why is this thing showing up again after three months of nothing?

Wrapping It All Up

Companies pay tens of thousands, or hundreds of thousands, of pounds (or dollars, or whichever currency you prefer) for "threat intel" but what does that mean? What are they getting - are they getting indicators or are they getting intelligence? How do they get it - is it via a STIX/TAXII download, a custom client?

More importantly, _can they use it effectively_? It does me no good to pay £10,000 per year for a "threat intelligence" feed if I don't have processes and procedures in place to do large imports of denied IPs/domains/etc, or to remove them if something gets in that I don't actually want to block. Moreover, I can't show that the intel I'm receiving has value if I don't have metrics around how useful the intel is - for example, how many incoming connection attempts were blocked by known port-scanners or brute-force endpoints?

Yes, "threat intelligence" can be useful, but make sure your program is of sufficient maturity to benefit from it and that you're actually getting your money's worth.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.

So...I thought I'd Jump Into Puppet (Puppet Part One)

A Quick Note A lot of folks like to write about things they know very well and about which they can answer questions. Generally speaking ...