AWK RAWKS

This isn’t going to be an AWK how-to, but an AWK ‘why’. If you want a quick AWK tutorial, check out the Grymoire site.

System administrators, DevOps engineers, or whatever you want to call them these days, often need to parse large log files to extract the relevant data. In fact, this is a frequently asked interview question for many ’nix sysadmin jobs.

“Here’s a log file and a ’nix shell. Write a script that tells me x, y and z.”

Often, smart sysadmins opt for Perl, and there is nothing wrong with that. However, did you know AWK was designed specifically for generating reports of this kind? The principles and techniques are the same as in Perl, but AWK gives you a neat framework for generating them.
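The framework in question is the three-part shape of an AWK program: a BEGIN block that runs before any input, pattern-action rules that run once per matching line, and an END block that runs after the last line. Here is a minimal sketch of that shape, assuming made-up input with a name and a number on each line (the `summarize` helper and the sample data are hypothetical, purely for illustration):

```shell
# summarize: a hypothetical helper for this sketch, not part of any real tool.
summarize() {
    awk '
        BEGIN    { print "Name", "Value" }       # runs once, before any input is read
        /^[a-z]/ { sum += $2; print $1, $2 }     # runs for every line matching the pattern
        END      { print "TOTAL", sum }          # runs once, after the last line
    '
}

# Made-up input: the line starting with "#" matches no pattern and is skipped.
printf 'alpha 3\nbeta 5\n# note\ngamma 2\n' | summarize
```

The header comes from BEGIN, the three data lines from the pattern rule, and the final line reports a total of 10 from END.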

Here is a sample AWK script that parses an Apache log file and spits out every IP address that has generated 1,000 or more hits, along with its hit count, sorted by hits in descending order.

#!/usr/bin/awk -f
BEGIN {
    OFS="\t\t"               #Set the output field separator (two tabs)
    print "IP Address", "Hits"    
}
/^[1-9]/ {                     #Parse lines that start with a number (IPv4).
    iphash[$1]++               #Increment IP in our associative array.
}
END {
    sort="sort -k2 -nr"        #The sort command we'll use. Parameters may
                               #vary depending on your flavor of 'nix.
                               #Very old sorts may need +1 instead of -k2.
 
    for (i in iphash) {
        if (iphash[i] >= 1000) print i, iphash[i] | sort
                               #Every print goes to the same sort process,
                               #which holds its input until the pipe closes.
    }
    close (sort)               #Close the pipe so sort emits its output before TOTAL
    print "TOTAL", NR          #print total number of records (hits).
}

Save it as ‘something.awk’, make it executable with `chmod +x something.awk`, and run `./something.awk /var/log/your.log`.
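If you want to sanity-check the counting logic before pointing it at a real log, a cut-down sketch like the one below may help. It makes a few assumptions that are not in the script above: the log lines are made up, the `count_hits` helper and its `min` threshold are hypothetical, and the sort runs outside AWK rather than through the in-script pipe.

```shell
# count_hits: print "IP<TAB>hits" for IPs with at least `min` hits, busiest first.
# `min` is a hypothetical parameter for this sketch; the full script hard-codes 1000.
count_hits() {
    awk -v min="$1" '
        /^[1-9]/ { iphash[$1]++ }                # count lines per leading IP
        END {
            for (i in iphash)
                if (iphash[i] >= min) print i "\t" iphash[i]
        }
    ' | sort -k2,2nr -k1,1
}

# Made-up sample log lines (only the first field matters here).
printf '%s\n' \
    '10.0.0.1 - - "GET / HTTP/1.1" 200' \
    '10.0.0.2 - - "GET /a HTTP/1.1" 200' \
    '10.0.0.1 - - "GET /b HTTP/1.1" 404' \
    '10.0.0.3 - - "GET / HTTP/1.1" 200' \
    '10.0.0.1 - - "GET /c HTTP/1.1" 200' \
    '10.0.0.2 - - "GET / HTTP/1.1" 200' |
count_hits 2
```

With a threshold of 2, only 10.0.0.1 (3 hits) and 10.0.0.2 (2 hits) should come out, in that order; 10.0.0.3 falls below the cut.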
