App-Unliner
$ grep "GET /report.cgi" access.log | awk '{print $1}'
Next we add the standard "sort | uniq -c | sort -rn" tallying pipeline:
$ grep "GET /report.cgi" access.log | awk '{print $1}' | sort | uniq -c | sort -rn
Oops, the important bit scrolled off the screen. Let's add a "head"
process to limit the output:
$ grep "GET /report.cgi" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5
And we finally get our nice report:
3271039 10.3.0.29
912 10.9.2.7
897 10.9.2.1
292 10.9.2.3
101 10.9.2.4
Looks like we've found our culprit.
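To watch the tallying pipeline work end to end without a real server log, here is a small self-contained sketch. The log lines are made up. It assumes the client IP is the first whitespace-separated field, as in the common web-server log format, because that is what awk '{print $1}' relies on.

```shell
#!/bin/sh
# Made-up sample log: client IP first, request line in quotes.
log=$(mktemp)
cat > "$log" <<'EOF'
10.3.0.29 - - [01/Jan/2012:00:00:01] "GET /report.cgi HTTP/1.1" 200 512
10.9.2.7 - - [01/Jan/2012:00:00:02] "GET /report.cgi HTTP/1.1" 200 512
10.3.0.29 - - [01/Jan/2012:00:00:03] "GET /report.cgi HTTP/1.1" 200 512
10.9.2.1 - - [01/Jan/2012:00:00:04] "GET /index.html HTTP/1.1" 200 128
10.3.0.29 - - [01/Jan/2012:00:00:05] "GET /report.cgi HTTP/1.1" 200 512
EOF

# The same pipeline as above, pointed at the sample file.
report=$(grep "GET /report.cgi" "$log" | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5)
printf '%s\n' "$report"
rm -f "$log"
```

The report should list 10.3.0.29 with a count of 3, followed by 10.9.2.7 with a count of 1. The /index.html request never reaches the tally because grep filters it out. uniq -c left-pads the counts with spaces.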
Installing unliner
If you want to follow along with this tutorial, or start coding right
away, the easiest way to install unliner is with cpanminus:
curl -sL https://raw.github.com/miyagawa/cpanminus/master/cpanm | sudo perl - App::Unliner
You want it to do *what*?
Usually one-liners entered in your shell are thrown away after use
because they are so easy to re-create as needed. That's one reason why
Unix pipes are so cool.
Besides, as soon as your pipelines reach a full line or two of text,
they start to become very hard to work with (though I confess I've
gotten a lot of use out of crazy long pipelines before). At this point
the one-liner is usually rewritten as a "real" program.
The point of unliner is to provide an intermediate stage between a
one-liner and a real program. And you might even find that there is no
need to make it a real program after all.
To turn your one-liner into an unliner, just wrap a "def main { }"
around it like this:
def main {
    grep "GET /report.cgi" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5
}
If you save this in the file "log-report", your unliner program can be
invoked with this command:
$ unliner log-report < input.txt
You could also put a shebang line
<https://en.wikipedia.org/wiki/Shebang_(Unix)> at the top of your
script:
#!/usr/bin/env unliner
Now if you "chmod +x log-report" you can run it directly:
$ ./log-report < input.txt
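Putting the pieces together, the complete log-report file is just the shebang line followed by the main def. The sketch below writes that file into a scratch directory and marks it executable. The scratch-directory location is only for the demo, and actually running the result still requires unliner to be installed and on your PATH.

```shell
#!/bin/sh
# Write the whole log-report program (shebang plus main def) to a scratch dir.
dir=$(mktemp -d)
script="$dir/log-report"
cat > "$script" <<'EOF'
#!/usr/bin/env unliner
def main {
    grep "GET /report.cgi" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5
}
EOF

# Equivalent of "chmod +x log-report".
chmod +x "$script"
echo "created $script"

# With unliner on your PATH and an access.log in the current directory,
# the program could now be run directly as: "$script"
```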
Defs
The "def main { }" isn't a special type of def, except that it happens
to be the def called when your program is invoked. You can create other
defs and invoke them from your main def and from other defs, kind of
like subroutines.
For example, we could move the "awk" command into an "ip-extractor" def,
and the tallying logic into a "tally" def:
def main {
    grep "GET /report.cgi" access.log | ip-extractor | tally | head -n 5
}
def ip-extractor {
    awk '{print $1}'
}
def tally {
    sort | uniq -c | sort -rn
}
This program creates the same sequence of processes as the previous
one. However, defs make it easier to organize and re-use pipeline
components.
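If the "kind of like subroutines" comparison helps, the closest plain-shell analogue is a pair of shell functions composed in a pipeline. This is only an analogy to make the idea concrete, not a description of how unliner runs defs internally. The underscores are a change from the def names, because portable sh function names cannot contain hyphens, and the sample input lines are made up.

```shell
#!/bin/sh
# Shell-function analogue of the ip-extractor def.
ip_extractor() {
    awk '{print $1}'
}

# Shell-function analogue of the tally def.
tally() {
    sort | uniq -c | sort -rn
}

# Compose them the same way the main def does, on made-up input.
top=$(printf '%s\n' \
    '10.0.0.1 - - "GET /report.cgi"' \
    '10.0.0.2 - - "GET /report.cgi"' \
    '10.0.0.1 - - "GET /report.cgi"' |
    ip_extractor | tally | head -n 1)
echo "$top"
```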
Arguments
The unliner program shown so far is not very flexible. For instance, the
"access.log" filename is hard-coded.
To fix this, use the arguments passed to our log-report program. They
are available in the variable $@, just like in a shell script:
def main {
    grep "GET /report.cgi" $@ | ip-extractor | tally | head -n 5
}
Now we can pass in a log file argument to our program (otherwise it will
read input from standard input):
$ unliner log-report access.log
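The fallback to standard input comes from grep itself, which reads stdin when it is given no file operands. Here is a plain-shell sketch of the same behaviour. The function name log_report and the two log lines are made up for the example.

```shell
#!/bin/sh
# Hypothetical plain-shell version of the parameterised main def:
# "$@" forwards every argument to grep unchanged.
log_report() {
    grep "GET /report.cgi" "$@" | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5
}

log=$(mktemp)
printf '%s\n' '10.1.1.1 "GET /report.cgi"' '10.1.1.2 "GET /index.html"' > "$log"

from_file=$(log_report "$log")      # log file named as an argument
from_stdin=$(log_report < "$log")   # no arguments: grep reads stdin
rm -f "$log"
echo "$from_file"
```

Both calls should produce the same one-line report (10.1.1.1 with a count of 1).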
Note that $@ preserves whitespace inside arguments, like the Bourne
shell's quoted "$@". In fact, it simply passes the argument array
through to the process (grep in this case) untouched, so the arguments
can contain any characters. The Bourne equivalents of unquoted $@ and $*
are not supported because they cause far too many bugs (use templates if
you need this behaviour).
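Because unliner's $@ follows the shell's quoted "$@", a quick plain-shell experiment shows what the unsupported unquoted form would do to a filename containing a space. The function names are made up for the demo.

```shell
#!/bin/sh
# Report how many separate arguments arrived.
count_args() {
    echo "$#"
}

# Forward our own arguments in the two Bourne-shell styles.
quoted()   { count_args "$@"; }   # each argument passed through intact
unquoted() { count_args $@; }     # arguments re-split on whitespace (and globbed)

quoted_n=$(quoted "my access.log" other.log)
unquoted_n=$(unquoted "my access.log" other.log)
echo "quoted: $quoted_n, unquoted: $unquoted_n"
```

This prints "quoted: 2, unquoted: 3". The unquoted form splits "my access.log" into two bogus filenames, which is exactly the kind of bug unliner avoids by only offering the quoted behaviour.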
We can parameterise other aspects of the unliner program too. For
example, suppose you want to control the number of lines included in
the report. To do this, add a "prototype":
def main(head|h=i, junkarg=s) {
    grep "GET /report.cgi" $@ | ip-extractor | tally | head -n $head
}
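The "head|h=i" entry looks like a Perl Getopt::Long option specification: an option named "head" with the alias "h" that takes an integer ("=i"), while "junkarg=s" would take a string. That reading is an assumption based on the syntax, not something stated here. To make the idea concrete, the sketch below hand-rolls that kind of parsing in plain shell. parse_head is a hypothetical helper, and it only accepts non-negative integers and treats the option as required.

```shell
#!/bin/sh
# Hypothetical helper mimicking a "head|h=i" style option: accepts
# --head N, --head=N or -h N and insists that the value is an integer.
parse_head() {
    head=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --head=*)
                head=${1#--head=} ;;
            --head|-h)
                if [ $# -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                head=$2
                shift ;;
            *)
                break ;;   # first non-option argument, e.g. a log file
        esac
        shift
    done
    # Reject empty or non-numeric values (this sketch allows digits only).
    case "$head" in
        ''|*[!0-9]*)
            echo "--head/-h needs an integer value" >&2
            return 1 ;;
    esac
    echo "$head"
}

n=$(parse_head -h 3 access.log) && echo "would run: head -n $n"
```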
The prototype indicates that the main def requires arguments. Since the
main def is the entry-point, these arguments must come from the command