Thursday, 26 January 2017

BIND and ELK: Or, How I Learnt to Stop Worrying and Trust in Grok

As I'm pretty sure I've said every time I discuss DNS, I like BIND. What I don't like, though, is BIND logging, and that caused a problem for me at work today when I wanted to import BIND query logs into ELK. Let's take a look at why...and then how to solve it!

Don't worry, I'm going to come back to doing basic BIND and ELK installs in other posts but right now I have an itch that needs to get scratched.

Example BIND Logs

First, let's take a look at what BIND's query log looks like. To do that, I've installed Ubuntu 16.04 Server in a VM and then installed BIND via apt. I made sure to enable query logging and did a few lookups with 'dig' and 'host', just to get a few different types of queries. Be warned that there are some gotchas around logging if you're going to try this at home - I'll address these when I do an actual post about BIND on Ubuntu later.

So what do they look like?

If you're familiar with syslog, that is definitely NOT a syslog message! They list no facility nor program name. Even if you decide to send the messages to syslog, you're left with mingling them with other messages (not that there is anything wrong with that, I just don't think it's very clean - I like have a single file that is JUST query logs). That's okay, though, right? After all, you can have syslog-ng or rsyslog pick the contents of the file up and ship it over to an actual syslog server and it's perfectly searchable with grep, perl, python, whatever. But if you want to ingest it in ELK, and be able to do interesting things with it, it's not that simple...

What Logstash Sees

To give you an idea of how logstash sees things, I've setup a VERY SIMPLE ELK installation. By "very simple", I mean I've installed everything on one VM running Ubuntu 16.04 Desktop and copied the query log to that system so I can feed it directly into logstash and have it display how it parses each line to standard out (to screen). Right now it's not trying to store anything in elasticsearch and I'm not trying to search for anything in Kibana, I just want to see how each line is parsed:

Notice how everything meaningful from our log entry is encapsulated in the "message" field - and that's not very useful if you want to do something interesting like determine how many unique IPs on your network have searched for a specific domain or which domains <this IP> looked up over <this time period>. To do that, we have to have the logs in a format elasticsearch can understand - and for that, I'm going to use grok.

Constructing a Grok Expression

Grok expressions can be daunting, especially in the beginning (I am very intimidated by them). While searching for something else I came across a great tool for building grok expressions:

I decided to use it to try to match a few lines from my query log:

If you want to try it yourself, here is what I pasted in:

25-Jan-2017 21:45:35.932 client ( query: IN A +E (
25-Jan-2017 21:46:05.665 client ( query: IN A +E (
25-Jan-2017 21:46:11.422 client ( query: IN A +E (
25-Jan-2017 21:46:35.125 client ( query: IN A +E (
25-Jan-2017 22:43:05.510 client ( query: IN A +E (

After a few iterations of choosing MONTHDAY, MONTH and YEAR, and then manually adding my on dashes in between, I had the timestamp for each line matched with the following expression:


And in the tool, it looked like this:

With a bit more work, I was able to work out an expression that matched all of the lookups I'd performed:

%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} client %{IP}#%{NUMBER} \(%{HOSTNAME}\): query: %{HOSTNAME} IN %{WORD} \+%{WORD} \(%{IP}\)

That's a great first step! However, if I just use that in my logstash configuration, I still can't search by query or client IP because elasticsearch wont know about those. For that to work, I need to assign some names to the elements in my log entries (notice: this is done inside the grok filter):

%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} client %{IP:clientIP}#%{NUMBER:port} \(%{HOSTNAME}\): query: %{HOSTNAME:query} IN %{WORD:query_type} \+%{WORD} \(%{IP}\)

Test Again

First I updated my very basic logstash configuration to have a "filter" section with a grok expression to match my DNS log examples:

And then restart logstash (note I'm calling the logstash binary directly so I can use stdin/stdout rather than having it read a file or write to elasticsearch):

Now logstash knows about things like "clientIP", "port" and "query"!


Okay, I know, it needs a little more work. For example, I need to work on a mutate filter that replaces the @timestamp field (notice that's the time the log entry was received by logstash, not when it was created in query.log) with an ISO8601 version of the time the log entry was created. I also need to add a custom field, something like "received_at", to make sure I capture the time the item was received by logstash. Both of those are exercises for another time and for anyone who may be following at home.

I also know the current filter is incomplete because I only tested it with A record lookups. Would the same filter work for PTR lookups? DNSSEC validation? All of my tested queries were recursive where EDNS was in use (the +E flag), will it fail on recursive queries with no flags? (Spoiler: yes, it will, that should be an optional flag field). Please do not copy/paste the above filters and expect them to be "production-ready"!!

I am certain there are other ways to solve this problem and there are almost certainly cleaner filters out there. I'm just starting to scratch the surface of what can be done with filters and with ELK in general, but knowing there are tools and methods available to ingest custom log formats (and normalise those formats into something elasticsearch knows how to index and search!) goes a long way towards having the confidence to build out an ELK deployment that your organisation can use.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.

Puppet Part Three: In Which I Write My First Module

In my previous post I wrote a manifest for stark that removed specific groups/users and added specific groups/users. This allowed me to do s...