Net-Amazon-EMR
# Flexible Booleans - 1, 0, undef, 'true', 'false'
$emr->set_visible_to_all_users(JobFlowIds => $id, VisibleToAllUsers => 1);
$emr->set_termination_protection(JobFlowIds => [ $id->JobFlowId ], TerminationProtected => 'false');
# Add map-reduce steps and execute
$emr->add_job_flow_steps(
    JobFlowId => $job_id,
    Steps     => [ {
        ActionOnFailure => 'CANCEL_AND_WAIT',
        Name            => "Example",
        HadoopJarStep   => {
            Jar  => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
            Args => [ '-input',   's3://my-bucket/my-input',
                      '-output',  's3://my-bucket/my-output',
                      '-mapper',  '/path/to/mapper-script',
                      '-reducer', '/path/to/reducer-script',
            ],
            Properties => [ { Key => 'reduce_tasks_speculative_execution', Value => 'false' } ],
        },
    }, ... ] );
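Once steps have been added, progress can be monitored with describe_job_flows. The sketch below assumes the returned result object mirrors the AWS DescribeJobFlows response shape (a JobFlows list whose entries carry an ExecutionStatusDetail with a State field); check the result classes shipped with the module for the exact accessors.

# Poll until the job flow reaches a terminal state (sketch; accessor
# names follow the AWS DescribeJobFlows response shape).
my $state;
do {
    sleep 30;
    my $res = $emr->describe_job_flows(JobFlowIds => [ $job_id ]);
    $state  = $res->JobFlows->[0]->ExecutionStatusDetail->State;
    print "Job flow $job_id: $state\n";
} until $state =~ /^(?:COMPLETED|FAILED|TERMINATED)$/;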
=head1 DESCRIPTION

Net::Amazon::EMR provides a Perl interface to the Amazon Elastic MapReduce (EMR) web service. A typical bootstrap script fetches a pre-built bundle of Perl module dependencies from S3 and unpacks it on each node:
cd $HOME
hadoop fs -get s3://$bucket/$arch/$tarfile .
tar xfz $tarfile
=item * Put the bootstrap script on S3 and use it when creating a new job flow (a sketch follows this list).
=back
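As a sketch of that last step, the bootstrap script can be attached when the job flow is created. The BootstrapActions parameter shape below mirrors the AWS RunJobFlow API that this module wraps; the bucket name, script name, and instance settings are placeholders.

# Sketch: run the bootstrap script on every node before Hadoop starts.
# Bucket/script names are placeholders; parameter shapes mirror the
# AWS RunJobFlow API.
my $id = $emr->run_job_flow(
    Name             => 'Bootstrapped job flow',
    BootstrapActions => [ {
        Name                  => 'Install Perl bundle',
        ScriptBootstrapAction => { Path => 's3://my-bucket/bootstrap-perl.sh' },
    } ],
    Instances => {
        MasterInstanceType => 'm1.small',
        SlaveInstanceType  => 'm1.small',
        InstanceCount      => 2,
    },
);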
=head2 Mappers and Reducers
Assuming the reader is familiar with the basic principles of map-reduce: in terms of a Perl implementation with hadoop-streaming.jar, a mapper or reducer is simply a script that reads from STDIN and writes to STDOUT, typically line by line using a tab-separated key/value format:
while (my $line = <>) {
    chomp $line;
    my ($key, $value) = split /\t/, $line;
    # ... derive $newkey and $newvalue from $key and $value
    print "$newkey\t$newvalue\n";
}
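A reducer follows the same pattern, with the additional guarantee from Hadoop streaming that input lines arrive sorted by key, so the values for each key can be aggregated as they stream past. A minimal count-style reducer, as an illustrative sketch rather than anything shipped with this module:

# Streaming reducer sketch: lines arrive sorted by key, so each key's
# values are contiguous and can be summed on the fly.
my $current;
my $sum = 0;
while (my $line = <>) {
    chomp $line;
    my ($key, $count) = split /\t/, $line;
    if (defined $current && $key ne $current) {
        print "$current\t$sum\n";
        $sum = 0;
    }
    $current = $key;
    $sum    += $count;
}
print "$current\t$sum\n" if defined $current;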
Scripts can be uploaded to S3 using the web interface, placed in the bootstrap bundle described above, or uploaded to the master instance using scp and distributed with the hadoop-streaming.jar -file option, or no doubt by many other mechanisms. A mapper or reducer implemented as a class method can also be invoked via a perl one-liner, e.g.
Args => [ '-mapper', '"perl -MMyClass -e MyClass->new->mapper"', ... ]
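For the one-liner to work, MyClass must be loadable on each task node (for example, installed via the bootstrap bundle and found through PERL5LIB). A hypothetical minimal shape of such a class:

# Hypothetical MyClass for the one-liner above; mapper() performs the
# usual STDIN-to-STDOUT streaming loop (identity mapping shown, for
# illustration only).
package MyClass;

sub new { return bless {}, shift }

sub mapper {
    my ($self) = @_;
    while (my $line = <STDIN>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        print "$key\t$value\n";
    }
}

1;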
=head1 AUTHOR
Jon Schutz
L<http://notes.jschutz.net>
=head1 BUGS