
26 January 2017

BIND and ELK: Or, How I Learnt to Stop Worrying and Trust in Grok


As I'm pretty sure I've said every time I discuss DNS, I like BIND. What I don't like, though, is BIND logging, and that caused a problem for me at work today when I wanted to import BIND query logs into ELK. Let's take a look at why...and then how to solve it!

Don't worry, I'm going to come back to doing basic BIND and ELK installs in other posts but right now I have an itch that needs to get scratched.

Example BIND Logs


First, let's take a look at what BIND's query log looks like. To do that, I've installed Ubuntu 16.04 Server in a VM and then installed BIND via apt. I made sure to enable query logging and did a few lookups with 'dig' and 'host', just to get a few different types of queries. Be warned that there are some gotchas around logging if you're going to try this at home - I'll address these when I do an actual post about BIND on Ubuntu later.
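In the meantime, the short version is that query logging boils down to a logging stanza in named.conf along these lines (the channel name and file path here are just examples of mine, and on Ubuntu the AppArmor profile has to allow named to write wherever you point the file - that's one of the gotchas I mean):

logging {
    channel query_log {
        file "/var/log/named/query.log" versions 3 size 10m;
        severity info;
        print-time yes;
    };
    category queries { query_log; };
};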

So what do they look like?


If you're familiar with syslog, that is definitely NOT a syslog message! There's no facility or program name. Even if you decide to send the messages to syslog, you're left with mingling them with other messages (not that there is anything wrong with that, I just don't think it's very clean - I like having a single file that is JUST query logs). That's okay, though, right? After all, you can have syslog-ng or rsyslog pick up the contents of the file and ship it over to an actual syslog server, and it's perfectly searchable with grep, perl, python, whatever. But if you want to ingest it in ELK, and be able to do interesting things with it, it's not that simple...
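As an aside, if you did want rsyslog to pick the file up and forward it, the imfile module does the job and looks roughly like this (the path, tag and facility are just example values of mine - check them against your rsyslog version):

# Load the file-input module and follow the BIND query log
module(load="imfile")
input(type="imfile"
      File="/var/log/named/query.log"
      Tag="bind-query:"
      Severity="info"
      Facility="local7")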

What Logstash Sees


To give you an idea of how logstash sees things, I've set up a VERY SIMPLE ELK installation. By "very simple", I mean I've installed everything on one VM running Ubuntu 16.04 Desktop and copied the query log to that system so I can feed it directly into logstash and have it display how it parses each line to standard out (to screen). Right now it's not trying to store anything in elasticsearch and I'm not trying to search for anything in Kibana; I just want to see how each line is parsed:
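A configuration along these lines is all that takes (this is my own minimal sketch - events in on stdin, rubydebug-formatted events out to the screen):

input {
  stdin { }
}
output {
  stdout { codec => rubydebug }
}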


Notice how everything meaningful from our log entry is encapsulated in the "message" field - and that's not very useful if you want to do something interesting like determine how many unique IPs on your network have searched for a specific domain or which domains <this IP> looked up over <this time period>. To do that, we have to have the logs in a format elasticsearch can understand - and for that, I'm going to use grok.

Constructing a Grok Expression


Grok expressions can be daunting, especially in the beginning (I am very intimidated by them). While searching for something else I came across a great tool for building grok expressions:


I decided to use it to try to match a few lines from my query log:


If you want to try it yourself, here is what I pasted in:

25-Jan-2017 21:45:35.932 client 127.0.0.1#58483 (bbc.co.uk): query: bbc.co.uk IN A +E (127.0.0.1)
25-Jan-2017 21:46:05.665 client 127.0.0.1#51602 (amazon.co.uk): query: amazon.co.uk IN A +E (127.0.0.1)
25-Jan-2017 21:46:11.422 client 127.0.0.1#56018 (google.co.uk): query: google.co.uk IN A +E (127.0.0.1)
25-Jan-2017 21:46:35.125 client 127.0.0.1#52503 (youtube.co.uk): query: youtube.co.uk IN A +E (127.0.0.1)
25-Jan-2017 22:43:05.510 client 127.0.0.1#56259 (www.parliament.uk): query: www.parliament.uk IN A +E (127.0.0.1)

After a few iterations of choosing MONTHDAY, MONTH and YEAR, and then manually adding my own dashes in between, I had the timestamp for each line matched with the following expression:

%{MONTHDAY}-%{MONTH}-%{YEAR}

And in the tool, it looked like this:


With a bit more work, I was able to work out an expression that matched all of the lookups I'd performed:

%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} client %{IP}#%{NUMBER} \(%{HOSTNAME}\): query: %{HOSTNAME} IN %{WORD} \+%{WORD} \(%{IP}\)

That's a great first step! However, if I just use that in my logstash configuration, I still can't search by query or client IP because elasticsearch won't know about those. For that to work, I need to assign names to the elements in my log entries (notice: this is done inside the grok filter):

%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} client %{IP:clientIP}#%{NUMBER:port} \(%{HOSTNAME}\): query: %{HOSTNAME:query} IN %{WORD:query_type} \+%{WORD} \(%{IP}\)

Test Again


First I updated my very basic logstash configuration to have a "filter" section with a grok expression to match my DNS log examples:
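The filter section looks roughly like this, using the grok expression built above (the rest of the configuration is unchanged from the minimal version earlier):

filter {
  grok {
    match => {
      "message" => "%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} client %{IP:clientIP}#%{NUMBER:port} \(%{HOSTNAME}\): query: %{HOSTNAME:query} IN %{WORD:query_type} \+%{WORD} \(%{IP}\)"
    }
  }
}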


And then restarted logstash (note I'm calling the logstash binary directly so I can use stdin/stdout rather than having it read a file or write to elasticsearch):
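For reference, the invocation is along these lines - the exact path depends on how logstash was installed, and dns-query.conf is just what I've called my configuration file here:

/usr/share/logstash/bin/logstash -f dns-query.conf < /path/to/query.log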


Now logstash knows about things like "clientIP", "port" and "query"!

Summary


Okay, I know, it needs a little more work. For example, I need to work on a filter that replaces the @timestamp field (note that @timestamp is currently the time the log entry was received by logstash, not when it was written to query.log) with an ISO8601 version of the time the entry was created. I also need to add a custom field, something like "received_at", so I still capture the time the item was received by logstash. Both of those are exercises for another time and for anyone who may be following at home.
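For what it's worth, the usual way to do the timestamp replacement is the date filter rather than mutate. Untested, but the shape I have in mind is something like this (it assumes the grok pattern is extended to capture the date and time into a single "timestamp" field):

filter {
  # keep the time logstash received the event; "received_at" is just the name I'd use
  mutate {
    add_field => { "received_at" => "%{@timestamp}" }
  }
  # rewrite @timestamp from the log's own timestamp; this assumes the grok pattern
  # captures the date and time into one field, e.g. with
  # (?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME})
  date {
    match => [ "timestamp", "dd-MMM-yyyy HH:mm:ss.SSS" ]
  }
}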

I also know the current filter is incomplete because I only tested it with A record lookups. Would the same filter work for PTR lookups? DNSSEC validation? All of my test queries were recursive with EDNS in use (the +E flag) -- will it fail on recursive queries with no flags? (Spoiler: yes, it will; that should be an optional flag field.) Please do not copy/paste the above filters and expect them to be "production-ready"!

I am certain there are other ways to solve this problem and there are almost certainly cleaner filters out there. I'm just starting to scratch the surface of what can be done with filters and with ELK in general, but knowing there are tools and methods available to ingest custom log formats (and normalise those formats into something elasticsearch knows how to index and search!) goes a long way towards having the confidence to build out an ELK deployment that your organisation can use.

09 January 2013

BIND part 3: Full DLZ-backed domain


Not too long ago I gave a presentation at a tech conference regarding using DNS blacklisting/blackholes/sinkholes to identify and mitigate the effects of malware. Specifically, it was that presentation that sparked my last blog post on configuring BIND to use DLZs. After that presentation I received a phone call from an entity wanting to use DLZs for their entire domain, not just for their DNS blackhole. Since my DNS servers already have to do at least one database lookup for every DNS query, and since I already have instructions on how to build a DNS blackhole using DLZs, I thought it fitting to go ahead and extend that to the logical ending -- using DLZs for an entire domain.

Database Changes


Since I already have a working PostgreSQL, BIND and DLZ deployment in a virtual environment, I'm going to use that as my workspace. Using either "\d" from the psql CLI or looking at the output of a "pg_dump -s dnsbh" from the system CLI shows the following existing columns for the dns_records table:
  • id serial
  • zone text
  • host text
  • ttl integer
  • type text
  • data text
Where zone is the zone being queried, host indicates whether it's a single host or a wildcard for all subdomains, type indicates A/AAAA/MX/NS/etc and data holds the answer for that record (the destination IP address, in the blackhole case).

To return "proper" responses for all queries related to a domain, there are some columns that need to be added:
  • mx_priority, the priority for MX records
  • contact, the responsible contact for the zone
  • update_serial, the serial number for the zone update
  • refresh, the amount of time slaves or secondary servers wait between update requests
  • retry, the amount of time slaves or secondary servers wait between update requests after a failed request
  • expiration, the amount of time a secondary server will keep answering for the zone if it cannot refresh from the primary
  • minimum, the negative-caching TTL for the zone (how long a "name not found" answer may be cached)
  • primary_ns, the DNS servers for the zone
To add these columns to the dns_records table, I'll issue the following at the psql CLI on the PostgreSQL master:
ALTER TABLE dns_records
ADD COLUMN mx_priority INTEGER,
ADD COLUMN contact CHARACTER VARYING (255),
ADD COLUMN update_serial INTEGER,
ADD COLUMN refresh INTEGER,
ADD COLUMN retry INTEGER,
ADD COLUMN expiration INTEGER,
ADD COLUMN minimum INTEGER,
ADD COLUMN primary_ns CHARACTER VARYING (255); 

Supporting the DLZ driver functions


The PostgreSQL DLZ driver expects some value (even if that value is "NULL") to be returned for the MX priority in its lookup() function as the third value returned, between 'type' and 'data'. This function performs the query defined by the second "SELECT" statement in the BIND configuration file. Those two things combined mean that I need to add the mx_priority column to that SELECT query. I will post the entire BIND configuration later so don't worry just yet about editing the file, just be cognisant that it will have to happen.

The PostgreSQL DLZ driver needs to be able to query for SOA and NS records and it does this via the authority() function. The query for this function should return values for ttl, type, data, primary_ns, contact, update_serial, refresh, retry, expiration and minimum. This will get added after the existing final "SELECT" statement in the BIND configuration. There is a caveat to this - the query used by the lookup() function *CAN* be written to return SOA and NS records; if this is the case then the query for the authority() function can be an "empty query" and written as "{}" (note there are no spaces between the braces). I am specifically NOT letting my lookup() function pull NS and SOA records so a query is necessary for the authority() function.

The next item added to the BIND configuration is a query that returns values for the DLZ driver's allnodes() function. It will return values identical to the authority() function but the query doesn't differentiate between NS/SOA and other records so it will be nearly identical to the query written for the authority() function.

Finally, it is possible to add a query that supports the allowzonexfr() function and returns the zone and IP addresses of clients allowed to perform zone transfers for that zone. Note that if you want to allow zone transfers then you *must* support both the authority() and allowzonexfr() functions. Since part of the point of database replication and backing BIND with a database is to allow instant access to updates, I will NOT add a query to support this functionality. It is recommended against by the original bind-dlz team, for what I believe to be very good reasons, and I believe the proper way to handle those updates is via database replication to all secondary DNS servers. Using this model effectively removes the "master/slave" relationship and allows any DNS server in the organisation to act as a primary DNS server without having to change anything in the BIND configuration.

Editing the BIND configuration


1) Adding mx_priority for the lookup() function

{
  select
    ttl,
    type,
    mx_priority,
    case when lower(type)='txt' then '\"' || data || '\"' else data end
  from
    dns_records
  where
    zone = '$zone$' and
    host = '$record$' and
    not (type = 'SOA' or type = 'NS')
}

2) Adding a query for the authority() function

{
  select
    ttl,
    type,
    data,
    primary_ns,
    contact,
    update_serial,
    refresh,
    retry,
    expiration,
    minimum
  from
    dns_records
  where
    zone = '$zone$' and
    (type = 'SOA' or type='NS')
}

3) Adding a query for the allnodes() function

{
  select
    ttl,
    type,
    host,
    mx_priority,
    data,
    contact,
    update_serial,
    refresh,
    retry,
    expiration,
    minimum
  from
    dns_records
  where
    zone = '$zone$'
}

Put all the queries together


Taking all of the above modifications into consideration, the new dlz section of my BIND configuration now looks like this (on my production machines the select statements aren't split across so many lines; I do it here for clarity):

dlz "postgres" {
  database "postgres 4
  {host=localhost port=5432 dbname=dnsbh user=dnsbh password='password_you_used'}
  {
    select
      zone
    from
      dns_records
    where
      zone = '$zone$'
  }
  {
    select
      ttl,
      type,
      mx_priority,
      case when lower(type)='txt' then '\"' || data || '\"' else data end
    from
      dns_records
    where
      zone = '$zone$' and
      host = '$record$' and
      not (type = 'SOA' or type = 'NS')
  }
  {
    select
      ttl,
      type,
      data,
      primary_ns,
      contact,
      update_serial,
      refresh,
      retry,
      expiration,
      minimum
    from
      dns_records
    where
      zone = '$zone$' and
      (type = 'SOA' or type='NS')
  }
  {
    select
      ttl,
      type,
      host,
      mx_priority,
      data,
      contact,
      update_serial,
      refresh,
      retry,
      expiration,
      minimum
    from
      dns_records
    where
      zone = '$zone$'
  }";
};

Adding domain data to the database


I decided to test with a very simple domain. At the minimum I wanted a primary nameserver, a mail server, a random machine and something to justify a CNAME record so I could perform the following lookups:
  • SOA
  • NS
  • A
  • CNAME
  • MX
  • PTR
To do that, I used the following IP:name combinations:
  • 10.10.10.103 -- ns1.demo.local
  • 10.10.10.112 -- mail.demo.local
I then decided to use imap.demo.local as a CNAME to mail.demo.local.

First, I'll add the SOA records for the demo.local zone and for reverse lookups for the 10.10.10.0/24 network:
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', '@', '86400', 'SOA', NULL, NULL, 'hostmaster.demo.local.', '2013010801', '3600', '1800', '604800', '86400', 'ns1.demo.local.');

INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('10.10.10.in-addr.arpa', '@', '86400', 'SOA', NULL, NULL, 'hostmaster.demo.local.', '2013010801', '3600', '1800', '604800', '86400', 'ns1.demo.local.');
The NS records for my zone and IP range:
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', '@', '86400', 'NS', 'ns1.demo.local.', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('10.10.10.in-addr.arpa', '@', '86400', 'NS', 'ns1.demo.local.', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
The MX record with a priority of 10 for mail.demo.local:
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', '@', '300', 'MX', 'mail.demo.local.', '10', NULL, NULL, '3600', '1800', '604800', '86400', NULL);
The A records for ns1.demo.local (10.10.10.103) and mail.demo.local (10.10.10.112):
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', 'ns1', '86400', 'A', '10.10.10.103', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', 'mail', '86400', 'A', '10.10.10.112', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
The PTR records so that reverse lookups work:
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('10.10.10.in-addr.arpa', '103', '86400', 'PTR', 'ns1.demo.local.', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('10.10.10.in-addr.arpa', '112', '86400', 'PTR', 'mail.demo.local.', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
Finally, the CNAME to point imap.demo.local to mail.demo.local:
INSERT INTO dns_records (zone, host, ttl, type, data, mx_priority, contact, update_serial, refresh, retry, expiration, minimum, primary_ns) VALUES ('demo.local', 'imap', '86400', 'CNAME', 'mail.demo.local.', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
At this point the following lookups should work when performed against the DNS server:
  • dig soa demo.local
  • dig ns demo.local
  • dig mx demo.local
  • dig mail.demo.local
  • dig imap.demo.local
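The PTR records can be checked with reverse lookups as well (add @<server IP> if you're not querying the test server directly):
dig -x 10.10.10.103
dig -x 10.10.10.112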
From here it's a trivial matter to add as many A records or CNAMEs (and corresponding PTR records) as necessary. My authoritative DNS servers handle over 60k IPs and I make manual changes every day, so if I were backing them with DLZ I would develop a web-based management interface. For a small network (a /24 or smaller) with a single DNS administrator, though, manually interacting with the database really isn't prohibitive.

10 December 2012

BIND part 2: DNS blackhole via DLZs


In my last post I detailed installing BIND with DLZ support -- in this post I'll actually USE that option.

Step One: Database setup

PostgreSQL, by default, now creates databases in UTF-8 encoding. The DLZ driver in BIND uses ASCII/LATIN1, which introduces some interesting peculiarities. Specifically, querying for 'zone="foo.com"' may not work. That's okay; you can still create LATIN1 databases using the template0 template. Since PostgreSQL replication is already configured, everything but the BIND configuration is done on the database master. First, add the new database user:
createuser -e -E -P dnsbh
Since this user won't have any extra permissions, just answer 'n' to the permissions questions.

Now add the new database, as latin1, with dnsbh as the owner:
createdb dnsbh -E 'latin1' -T template0 -O dnsbh
The database schema is pretty flexible; there are no required column names or types, as long as queries return the correct type of data. I like to use a schema that reflects the type of data the column holds so I'll use the following create statement:
create table dns_records (
    id serial,
    zone text,
    host text,
    ttl integer,
    type text,
    data text
);
create index zone_idx on dns_records(zone);
create index host_idx on dns_records(host);
create index type_idx on dns_records(type);
alter table dns_records owner to dnsbh; 
This can be modified, of course. The data, type, host and ttl fields can have size restrictions put in place and you can drop the id field altogether. The bind-dlz sourceforge page lists several more columns but those are only necessary for a full DLZ deployment, where the DNS server is authoritative for a domain, not for a purely recursive caching server.
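If you do want size restrictions, a trimmed-down version of the table might look something like this (the sizes are arbitrary choices of mine, not requirements of the driver; the index and ownership statements stay the same):

create table dns_records (
    zone varchar(255),
    host varchar(255),
    ttl integer,
    type varchar(16),
    data varchar(255)
);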

Step Two: BIND setup

If you read the bind-dlz configuration page you'll find a set of database queries that get inserted into named.conf. You can modify these if you like but it's much easier to use what's provided. There are, at a minimum, four lines that need to be included. The first indicates the database type and the number of concurrent connections to keep open (4). The next is the database connection string (host, username, password). The third returns whether the database knows about a given zone, and the fourth returns the actual record data for any lookup that isn't for type SOA or NS. Add the following to /etc/namedb/named.conf and restart BIND (I split the fourth "line" into multiple lines for readability - you can put it all on one line or leave it as below):
dlz "postgres" {
database "postgres 4
{host=localhost port=5432 dbname=dnsbh user=dnsbh password='password_you_used'}
{select zone from dns_records where zone = '$zone$'}
{select ttl, type,
  case when lower(type)='txt' then '\"' || data || '\"' else data end
  from dns_records
  where zone = '$zone$' and
  host = '$record$' and
  not (type = 'SOA' or type = 'NS')}";
};
At this point BIND should restart and it should continue to serve DNS as normal -- meaning it's time to test the blackhole. A quick and easy test is to dig for sf.net:
dig @127.0.0.1 sf.net
Then add sf.net to the dns_records table on the PostgreSQL master:
insert into dns_records(zone, type, host, data, ttl) values ('sf.net', 'A', '*', '127.0.0.1', '3600');
Dig for sf.net again:
dig @127.0.0.1 sf.net
Note the 3600 second TTL. If a downstream DNS server were querying our blackhole then the value of 127.0.0.1 would get cached for an hour. To remove the sf.net entry from the blackhole, go back to the PostgreSQL master:
delete from dns_records where zone = 'sf.net';
www.malwaredomains.com and projects like ZeusTracker maintain lists of domains seen in the wild that are used to distribute malware. The malwaredomains.com site has a great tutorial on how to do DNS blackholes via BIND configuration files. Since they provide a list of domains, it's pretty trivial to script something to pull those domains out and jam them into the blackhole table.
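As a rough sketch of what I mean -- this assumes the domains have already been extracted into a file called domains.txt (one per line), and it does no validation or de-duplication at all:

# generate an INSERT per domain and feed them straight to the blackhole database
while read domain; do
  echo "insert into dns_records(zone, type, host, data, ttl) values ('${domain}', 'A', '*', '127.0.0.1', '3600');"
done < domains.txt | psql -h localhost -U dnsbh dnsbh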

BIND part 1: Recursive resolver


One of the most commonly used services on the Internet is DNS. We use it every time we type something into Google, check our email or stream a video from YouTube. It is THE address book for nearly everything we do on the Internet.

It is also used nefariously. Malware authors will use DNS instead of static IPs so that they can constantly switch malware servers, making IP-based firewall rules useless. Groups like malwaredomains.com track the domains used by malware and provide a readily-accessible list so that DNS operators can blackhole those lookups. This works great, but any change means a BIND restart/reload, and adding domains manually can be a real issue if you have multiple DNS servers and you're not managing them via something like fabric/chef/puppet/etc.

I have taken it one step further. I use readily-available information from groups like ZeusTracker and Malware Domains, then I use the DLZ functionality in BIND, so that I only have to enter the domain into a database table and it immediately becomes active in BIND -- since BIND checks the database before checking its local zones and its cache, no restart or reload is necessary.

In THIS post, I'm detailing the first step: just setting up a recursive resolver for general use with some groundwork laid for the second step, using DLZs for blackholing lookups.

I'm going to use FBSD_8_3_i386_102 as the database master (see the previous post), FBSD_8_3_i386_103 as a recursive resolver with DLZ support so that I can blackhole domains and FBSD_8_3_i386_104 as a test client. Because of the way networking works in VirtualBox, some changes need to be made to the virtual machine that is acting as the BIND server so that it can access the Internet but still serve DNS to the internal network.  Under Settings -> Network, I'm going to click "Adapter 2" and add a NAT interface. This lets me use Adapter 1 to communicate with the other virtual machines in the psql_test internal network and Adapter 2 to communicate with the world.

Now fire up the three FreeBSD virtual machines. On FBSD_8_3_i386_103, edit /etc/rc.conf. Even though VirtualBox says "Adapter 2" is the NAT interface, that adapter is really the new em0 and the old em0 (with a static IP) is now em1. The two lines in /etc/rc.conf should look like:
ifconfig_em0="dhcp"
ifconfig_em1="10.10.10.103 netmask 255.255.255.0"
The netif script doesn't restart dhclient so reboot and make sure networking works.

BIND is included by FreeBSD in the default install but it doesn't have DLZ support. That means installing BIND from ports:
sudo portmaster -t -g dns/bind99
There are a couple of options to toggle during configuration. I generally disable IPv6, enable DLZ, enable DLZ_POSTGRESQL and then have the ports BIND replace the system BIND:

There are some cons to replacing the system BIND with BIND from ports, namely that a freebsd-update affecting BIND will require a re-installation of the custom-built BIND.

The default BIND configuration is almost perfect for a general caching server so let's start with that. In /etc/namedb/named.conf, find the line that looks like:
listen-on        { 127.0.0.1; };
Comment out that line (lines starting with // are treated as comments). Removing that restriction tells BIND to respond to DNS queries on any IP. I wouldn't use that configuration on a production resolver but it works fine for this scenario.

If your ISP requires you to use their DNS servers then you can set your BIND instance to "forward only". In that configuration BIND will still serve from its cache but it doesn't try to go out to the root and TLD servers - it just queries your ISP's DNS servers. To enable this configuration in /etc/namedb/named.conf, locate the following section:
/*
    forwarders {
        127.0.0.1;
    }
*/
Remove the /* and */ (those are block commenting delimiters) and change the IP to that of your ISP's DNS server. Then find and uncomment the line that looks like:
// forward only;
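Once those edits are in place, the relevant chunk of the options block ends up looking something like this (203.0.113.53 is just a placeholder for your ISP's resolver IP):

options {
    // ...the rest of the options block is unchanged...
    forwarders {
        203.0.113.53;
    };
    forward only;
};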
To enable and start BIND, edit /etc/rc.conf and add the line:
named_enable="YES"
Then run
sudo /etc/rc.d/named start
An easy test of whether it's answering requests is to run
dig @10.10.10.103 google.com 
Notice that it took a second or so to return anything. That was BIND loading the data into its cache. If you run that command again it should answer almost immediately. An authoritative server requires considerably more work, but for a basic recursive resolver that just serves DNS from its cache or makes lookups on behalf of clients, the setup is fairly trivial.
