App-Unliner


lib/App/Unliner/Intro.pm

The point of unliner is to provide an intermediate stage between a one-liner and a real program. And you might even find that there is no need to make it a real program after all.

To turn your one-liner into an unliner just wrap a C<def main { }> around it like this:

    def main {
      grep "GET /report.cgi" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 5
    }

If you save this in the file C<log-report> then your unliner program can be invoked with this command:

    $ unliner log-report < input.txt

You could also put a L<shebang line|https://en.wikipedia.org/wiki/Shebang_(Unix)> at the top of your script:

    #!/usr/bin/env unliner

Now if you C<chmod +x log-report> you can run it directly:

    $ ./log-report < input.txt



=head2 Defs

The C<def main { }> isn't a special type of def except that it happens to be what is called when your program is invoked. You can create other defs and they can be invoked by your main def and other defs, kind of like subroutines.

For example, we could move the C<awk> command into a C<ip-extractor> def, and the tallying logic into a C<tally> def:

    def main {
      grep "GET /report.cgi" access.log | ip-extractor | tally | head -n 5
    }

    def ip-extractor {
      awk '{print $1}'
    }

    def tally {
      sort | uniq -c | sort -rn
    }

This program creates the same sequence of processes as the previous one. However, defs let you organize and re-use pipeline components better.
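
As a rough illustration of what the C<ip-extractor> and C<tally> stages compute, here is a Python sketch of the same report. It is not part of unliner; the function name C<top_ips> and its parameters are made up for this example. It matches the pattern as a plain substring (C<grep> treats it as a regular expression), and ties may come out in a different order than C<sort -rn> produces.

```python
from collections import Counter

def top_ips(lines, pattern="GET /report.cgi", limit=5):
    """Count requests per client IP and return the `limit` busiest clients."""
    # grep + ip-extractor: keep matching lines, take the first whitespace-separated field
    ips = (line.split()[0] for line in lines if pattern in line)
    # tally + head: count occurrences and keep the most frequent ones
    return Counter(ips).most_common(limit)

# Made-up sample log lines, for illustration only
sample = [
    '10.0.0.1 - - "GET /report.cgi HTTP/1.1" 200 512',
    '10.0.0.2 - - "GET /report.cgi HTTP/1.1" 200 512',
    '10.0.0.1 - - "GET /report.cgi HTTP/1.1" 200 512',
    '10.0.0.3 - - "GET /index.html HTTP/1.1" 200 128',
]
print(top_ips(sample))  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```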





=head2 Arguments

The unliner program shown so far is not very flexible. For instance, the C<access.log> filename is hard-coded.

To fix this, use the variable C<$@>, which holds the arguments passed in to our log-report program, just like in a shell script:

    def main {
      grep "GET /report.cgi" $@ | ip-extractor | tally | head -n 5
    }

Now we can pass in a log file argument to our program (otherwise it will read input from standard input):

    $ unliner log-report access.log

Note that $@ escapes whitespace like bourne shell's C<"$@">. Actually it just passes the argument array untouched through to the process (grep in this case) so the arguments can contain any characters. The bourne equivalent of unquoted C<$@> and C<$*...
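
The difference between passing the argument array through untouched and re-splitting it on whitespace can be sketched in Python. This is only an analogy for the behaviour described above, not unliner code; C<child_argv> and C<resplit> are hypothetical helper names, and C<resplit> ignores the filename globbing a real shell would also perform.

```python
import json
import subprocess
import sys

# Child program that reports the arguments it received as a JSON list.
ECHO_ARGV = "import json, sys; print(json.dumps(sys.argv[1:]))"

def child_argv(args):
    """Pass args to a child process as a list, with no shell in between."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGV] + list(args),
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return json.loads(result.stdout)

def resplit(args):
    """Mimic whitespace field splitting of an unquoted $@ (default IFS)."""
    return " ".join(args).split()

args = ["access log.txt", "other.log"]
print(child_argv(args))  # the child sees two arguments
print(resplit(args))     # whitespace splitting yields three
```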

We can parameterise other aspects of the unliner program too. For example, suppose you wanted to control the number of lines that are included in the report. To do this add a "prototype":

    def main(head|h=i, junkarg=s) {
      grep "GET /report.cgi" $@ | ip-extractor | tally | head -n $head
    }

The prototype indicates that the main def requires arguments. Since the main def is the entry-point, these arguments must come from the command line:

    $ unliner log-report access.log --head 5

C<head|h=i> is a L<Getopt::Long> argument definition. It means that the official name of this argument is C<head>, that there is a single-dash alias C<h>, and that the argument's "type" is required to be an integer. Because C<h> is an alias we could ...

    $ unliner log-report access.log -h 5

However, if you forget to add one of these arguments, the head process will die with an error like C<head: : invalid number of lines>.

Other common L<Getopt::Long> argument types are string (e.g. C<hostname|h=s>) and boolean on/off switches that require no argument (e.g. C<flag|f>).

In order to have a default value for a parameter, you put parentheses around the argument definition followed by the default value (just like Lisp):

    def main((head|h=i 5)) {
      grep "GET /report.cgi" $@ | ip-extractor | tally | head -n $head
    }
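
For readers more familiar with Python, the prototype C<(head|h=i 5)> behaves roughly like the following C<argparse> setup. This is an analogy only: unliner parses arguments with L<Getopt::Long>, and the C<parse> helper and the C<files> name are invented for this sketch.

```python
import argparse

def parse(argv):
    """Parse a command line that accepts --head N (alias -h N), default 5."""
    # add_help=False frees up -h, which argparse normally reserves for --help
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--head", "-h", type=int, default=5)
    # Everything else is collected as file names, like unliner's $@
    parser.add_argument("files", nargs="*")
    return parser.parse_args(argv)

opts = parse(["access.log", "--head", "7"])
print(opts.head, opts.files)  # 7 ['access.log']
print(parse([]).head)         # 5
```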

None of these variables need to be quoted. They are always passed verbatim to the underlying command. If you do quote them, be aware that string interpolation is not implemented (use templates for that).

Defs internal to your program accept arguments in exactly the same way. You can think of internal defs as being their own mini command-line programs:

    def main {
      grep "GET /report.cgi" $@ | ip-extractor | tally | my-head -n 5
    }

    def my-head((n=i 10)) {
      head -n $n
    }


=head2 Argument pass-through and environment variables

Normally if you pass an argument into a def (from the command line or from another def) that isn't listed in the prototype, an "Unknown option" error will be thrown. This is the default L<Getopt::Long> behaviour. If you wish to suppress this error an...

    def main : pass-through {
      my-head $@
    }

    def my-head(count=i) {
      head -n $count
    }

C<pass-through> simply sets the C<pass_through> option of L<Getopt::Long>.

Environment variables that were given to the unliner process are present in your scripts as variables too. For example, this does what you'd expect:

    def main {
      echo $PATH
    }

But note that interpolating variables isn't (yet?) supported so C<echo "$PATH:/opt/bin"> won't work (use templates for that -- see below).

There is a def modifier called C<env> that allows you to install arguments into environment variables while invoking the def. This is useful for languages like C<perl> where access to environment variables is easier than parsing an argument list:

    def main((name=s 'Anonymous')) : perl : env {
      print "Hello, $ENV{name}\n";
    }
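
The mechanism behind C<env> can be sketched in Python: the named arguments are added to the environment of the child process, which then reads them instead of parsing its argument list. This is an illustrative assumption about the general technique, not unliner's implementation; C<run_with_args_in_env> is a hypothetical helper.

```python
import os
import subprocess
import sys

def run_with_args_in_env(args, code):
    """Run `code` in a child Python process with `args` added to its environment."""
    env = dict(os.environ)
    env.update(args)  # e.g. {"name": "Anonymous"}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout

# The child reads its "argument" from the environment
greeting = 'import os; print("Hello, " + os.environ["name"])'
print(run_with_args_in_env({"name": "Anonymous"}, greeting), end="")
```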





=head2 Def Modifiers

The contents of all the defs we've seen so far are in a custom unliner language called B<Shell>. You can add the C<: sh> def modifier if you want, but it is redundant because Shell is the default language.

Shell is mostly like bourne shell/bash but a little bit different. The differences are described in the distribution's TODO file. Some differences are deliberate and some are just features that haven't been implemented yet. One difference is that unl...

Def modifiers can be used to change how the def body is interpreted by changing the language to something other than Shell. Modifiers go in between the def name/prototype and the body. One language modifier that can be used is C<perl>. It causes the ...

    def body-size-extractor : perl {
      while (<STDIN>) {
        ## body size is the last field in the log
        print "$1\n" if /(\d+)$/;
      }
    }

This def could also have been written in sh, but dealing with shell escapes is sometimes annoying:

    def body-size-extractor {
      perl -e 'while(<STDIN>) { ... }'
    }

Def modifiers themselves sometimes take arguments. For example, perl defs can take the C<-n> switch which implicitly adds a loop (just like the perl binary):

    def body-size-extractor : perl -n {
      print "$1\n" if /(\d+)$/;
    }

Another supported language is python:

    def wrap-in-square-brackets : python {
      import sys

      for line in sys.stdin:
        line = line.rstrip("\n")  # strip the trailing newline, if any
        print("[" + line + "]")   # parenthesized form works in Python 2 and 3
    }

Note that Python is very noisy when it receives a SIGPIPE, so polite pipeline components should manually catch it and then exit silently.
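
One way to do this in Python 3 is to catch C<BrokenPipeError>, which is what a write to a closed pipe raises. The sketch below is an assumption about how you might structure such a def body, not code from unliner; C<wrap_lines> and C<run_quietly> are illustrative names.

```python
import os
import sys

def wrap_lines(lines, out):
    """Write each input line to `out` wrapped in square brackets."""
    for line in lines:
        out.write("[" + line.rstrip("\n") + "]\n")
    out.flush()

def run_quietly(lines, out):
    """Run the filter; return 0 on success, 1 if the reader went away."""
    try:
        wrap_lines(lines, out)
        return 0
    except BrokenPipeError:
        # Downstream (e.g. head) closed the pipe: stop without a traceback.
        if out is sys.stdout:
            # Point stdout at /dev/null so the flush at interpreter exit
            # does not raise a second BrokenPipeError.
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, sys.stdout.fileno())
        return 1
```

Inside a C<: python> def, the body would then presumably end with C<sys.exit(run_quietly(sys.stdin, sys.stdout))>.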

A general-purpose "language" is exec. It is useful for running any command on your system, even when unliner has no custom language for it. As an example of exec usage, the following defs are equivalent:

    def second-column {
      awk -F, '{ print $2 }'
    }

    def second-column : exec awk -F, -f {
      { print $2 }
    }

Note that the C<-f> is required because awk doesn't follow the common scripting language convention where a program path is the first argument.

GitHub pull requests for new languages are appreciated.




=head2 Templates

Another def modifier is C<template>. This modifier processes your def body with L<Template Toolkit|http://template-toolkit.org/> before it passes it on to whatever language type is specified. Because the template has access to the def's arguments, th...

Let's say we wanted to add a C<filter-localhost> switch to our log-report unliner that will exclude requests from localhost (127.0.0.1) from the tally. This can be accomplished with templates:

    def main((head|h=i 5), filter-localhost) : template {
      grep "GET /report.cgi" $@ |
      ip-extractor |

      [% IF filter_localhost %]  ## Note: - changes to _
        grep -v '^127\.0\.0\.1$' |
      [% END %]

      tally |
      head -n $head
    }

    def ip-extractor {
      awk '{print $1}'


