Tuesday, 11 April 2017

Generating Sample Log Files Part Two: Do It In Python


I didn't really like the run time of my bash script, and I really want to dig some more into python (meaning expect me to do it wrong *a lot*), so I re-wrote it in python.

I also changed a few things...

Some Additions


The script I started with only writes something that looks like a proxy log - but sometimes you want more than just one type of log. For the python version, I've added a "dns" type that makes something that looks a bit like passive DNS logs, and I've added a stub for DHCP-style logs.

I've also added an option to let me decide, at invocation, how many days I want logs for. There is no error checking - creating logs for 3000 days may fill your hard drive with log files. Don't do that. Use small numbers like 1 and 2 until you see how much disk they use.
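
For anyone curious what that kind of option looks like, here is a minimal sketch using argparse. It is not the code from my script: the "--days" and "--proxy" flags match the invocation shown later in this post, but the "--dns" and "--dhcp" flag names, the MAX_DAYS cap and the positive_days helper are assumptions made up for illustration. Unlike the real script, this sketch does a little error checking so 3000 days gets refused.

```python
import argparse

MAX_DAYS = 90  # hypothetical cap, not something the real script enforces


def positive_days(value):
    """Convert the --days argument to an int and reject silly values."""
    days = int(value)
    if days < 1 or days > MAX_DAYS:
        raise argparse.ArgumentTypeError(
            "days must be between 1 and {}".format(MAX_DAYS))
    return days


def build_parser():
    parser = argparse.ArgumentParser(description="Generate sample log files")
    parser.add_argument("--days", type=positive_days, default=1,
                        help="how many days of logs to create")
    # One switch per log type; the --dns and --dhcp names are assumed.
    parser.add_argument("--proxy", action="store_true")
    parser.add_argument("--dns", action="store_true")
    parser.add_argument("--dhcp", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With something like this in place, asking for 3000 days exits with a usage error instead of quietly filling the disk.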

The one big thing I want to work on is the "consistent" option. Ideally I'd like to have a DHCP entry for <foo> MAC address that creates <foo_two> DNS log by visiting a given site that generates <foo_three> proxy log. Right now everything is pseudo-random - that's great for analysing individual log types but rubbish for creating a large volume of cohesive logs of multiple types.
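
To make the idea a bit more concrete, here is a rough sketch of what "consistent" could mean: pick one client, then derive the DHCP, DNS and proxy lines from that same client and site. None of this exists in the script yet. The make_client and correlated_events names, the 10.0.x.x addresses and all three line formats are invented for illustration and do not match what the real script writes.

```python
import random
from datetime import datetime, timedelta


def make_client(rng):
    """Invent one client with a random MAC and a private IP address."""
    mac = ":".join("{:02x}".format(rng.randint(0, 255)) for _ in range(6))
    ip = "10.0.{}.{}".format(rng.randint(0, 255), rng.randint(1, 254))
    return {"mac": mac, "ip": ip}


def correlated_events(client, site, start):
    """Build one DHCP, one DNS and one proxy line for the same client."""
    def stamp(offset_seconds):
        moment = start + timedelta(seconds=offset_seconds)
        return moment.strftime("%Y-%m-%d %H:%M:%S")

    # Hypothetical line formats: the lease comes first, then the lookup,
    # then the web request that the lookup was for.
    return {
        "dhcp": "{} DHCPACK {} {}".format(stamp(0), client["mac"], client["ip"]),
        "dns": "{} {} query A {}".format(stamp(5), client["ip"], site),
        "proxy": "{} {} GET http://{}/ 200".format(stamp(6), client["ip"], site),
    }


if __name__ == "__main__":
    rng = random.Random(2017)  # seeded so repeated runs give the same client
    events = correlated_events(make_client(rng), "example.com",
                               datetime(2017, 4, 11, 12, 0, 0))
    for log_type in ("dhcp", "dns", "proxy"):
        print(log_type, events[log_type])
```

The point is only that the MAC, IP and site are chosen once and reused, so the three log types can be joined on those values later.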

As with everything, it's a work in progress...

Change in I/O


The existing script just redirected output to a file. If the file already existed, the script deleted it and started a new one, so you were always guaranteed "fresh" data. While I have a need for a file that changes every few seconds, I also want to be able to create a big log file in a short amount of time. To that end, I've changed how the python script writes.

Instead of opening the file, appending one line and then closing it, the python script is a bit more efficient. It batches the writes in groups of 10,000 - so it opens the file once, collects ten thousand lines of log in RAM, writes out those ten thousand, clears the variable, collects another ten thousand, etc., then closes the log when it finishes.
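
The batching described above can be sketched roughly like this. It is a simplified illustration rather than a copy of the script: write_batched and fake_log_lines are hypothetical names, the generator is a stand-in for the real log line builder, and opening the file in "w" mode (overwriting any old file) is an assumption.

```python
import os
import tempfile

BATCH_SIZE = 10000  # the batch size mentioned above


def fake_log_lines(count):
    """Hypothetical stand-in for the real log line generator."""
    for number in range(count):
        yield "line {}".format(number)


def write_batched(path, lines, batch_size=BATCH_SIZE):
    """Open the file once, write in batches, return the lines written."""
    written = 0
    batch = []
    with open(path, "w") as log_file:  # assumption: overwrite any old file
        for line in lines:
            batch.append(line + "\n")
            if len(batch) >= batch_size:
                log_file.writelines(batch)
                written += len(batch)
                batch = []  # clear the variable before collecting more
        if batch:  # flush whatever is left over at the end
            log_file.writelines(batch)
            written += len(batch)
    return written


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "proxy.log")
    print(write_batched(demo_path, fake_log_lines(25000)))
```

The final "if batch" check matters: without it, any lines left over after the last full batch of ten thousand would never reach the file.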

Did It Help?


The bash script took approximately three hours to run. That's not an inconsequential amount of time to have to wait just to have a month's worth of logs to pump into Splunk, ELK, Graylog2 or whatever.

The python script, by comparison, can write 30 days of logs in under 45 seconds (actual time measured using "time ./createLogs.py --days 30 --proxy" is 42.9 seconds!). That's a HUGE difference in wait-time! I had time to cook *and eat* supper while the first script ran. I can't put the kettle on and be back before the python script finishes!

Show Me The Code!


Like I said, I am a complete newbie in python. I've hacked a few scripts together to do some basic things I've needed and this certainly falls in the category of "hacked together"! This is my first foray into time and arrays in python so some things are done in weird ways to help me grok how things work. All of that said, you're more than welcome to it. You can find it here:

https://github.com/kevinwilcox/samplelogs

Note that the repository has both scripts, in bash and python, and while they'll both *run* at the time of code commit, I make no promises that they'll do anything they're supposed to do. User beware, I'm not responsible if they bork something on your system, they're freely offered without support under the BSD 3-clause licence, etc.
