Net-Amazon-EMR
# Flexible Booleans - 1, 0, undef, 'true', 'false'
$emr->set_visible_to_all_users(JobFlowIds => $id, VisibleToAllUsers => 1);
$emr->set_termination_protection(JobFlowIds => [ $id->JobFlowId ], TerminationProtected => 'false');
# Add map-reduce steps and execute
$emr->add_job_flow_steps(
    JobFlowId => $job_id,
    Steps     => [ {
        ActionOnFailure => 'CANCEL_AND_WAIT',
        Name            => "Example",
        HadoopJarStep   => {
            Jar  => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
            Args => [ '-input',   's3://my-bucket/my-input',
                      '-output',  's3://my-bucket/my-output',
                      '-mapper',  '/path/to/mapper-script',
                      '-reducer', '/path/to/reducer-script',
            ],
            Properties => [ { Key => 'reduce_tasks_speculative_execution', Value => 'false' } ],
        },
    }, ... ] );
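Once steps have been added, progress can be monitored with describe_job_flows. The sketch below assumes the returned result object mirrors the AWS DescribeJobFlows response shape (a JobFlows list whose entries carry an ExecutionStatusDetail with a State field); check the result classes shipped with the module for the exact accessors.

# Poll until the job flow reaches a terminal state (sketch; accessor
# names follow the AWS DescribeJobFlows response shape).
my $state;
do {
    sleep 30;
    my $res = $emr->describe_job_flows(JobFlowIds => [ $job_id ]);
    $state  = $res->JobFlows->[0]->ExecutionStatusDetail->State;
    print "Job flow $job_id: $state\n";
} until $state =~ /^(?:COMPLETED|FAILED|TERMINATED)$/;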
=head1 DESCRIPTION

Net::Amazon::EMR provides a Perl interface to the Amazon Elastic MapReduce (EMR) web service. A typical bootstrap script fetches a pre-built bundle of Perl module dependencies from S3 and unpacks it on each node:
cd $HOME
hadoop fs -get s3://$bucket/$arch/$tarfile .
tar xfz $tarfile
=item * Put the bootstrap script on S3 and use it when creating a new job flow (a sketch follows this list).
=back
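As a sketch of that last step, the bootstrap script can be attached when the job flow is created. The BootstrapActions parameter shape below mirrors the AWS RunJobFlow API that this module wraps; the bucket name, script name, and instance settings are placeholders.

# Sketch: run the bootstrap script on every node before Hadoop starts.
# Bucket/script names are placeholders; parameter shapes mirror the
# AWS RunJobFlow API.
my $id = $emr->run_job_flow(
    Name             => 'Bootstrapped job flow',
    BootstrapActions => [ {
        Name                  => 'Install Perl bundle',
        ScriptBootstrapAction => { Path => 's3://my-bucket/bootstrap-perl.sh' },
    } ],
    Instances => {
        MasterInstanceType => 'm1.small',
        SlaveInstanceType  => 'm1.small',
        InstanceCount      => 2,
    },
);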
=head2 Mappers and Reducers
Assuming the reader is familiar with the basic principles of map-reduce: in terms of a Perl implementation with hadoop-streaming.jar, a mapper or reducer is simply a script that reads from STDIN and writes to STDOUT, typically line by line using a tab-separated key/value format:
while (my $line = <>) {
    chomp $line;
    my ($key, $value) = split /\t/, $line;
    # ... derive $newkey and $newvalue from $key and $value
    print "$newkey\t$newvalue\n";
}
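A reducer follows the same pattern, with the additional guarantee from Hadoop streaming that input lines arrive sorted by key, so the values for each key can be aggregated as they stream past. A minimal count-style reducer, as an illustrative sketch rather than anything shipped with this module:

# Streaming reducer sketch: lines arrive sorted by key, so each key's
# values are contiguous and can be summed on the fly.
my $current;
my $sum = 0;
while (my $line = <>) {
    chomp $line;
    my ($key, $count) = split /\t/, $line;
    if (defined $current && $key ne $current) {
        print "$current\t$sum\n";
        $sum = 0;
    }
    $current = $key;
    $sum    += $count;
}
print "$current\t$sum\n" if defined $current;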
Scripts can be uploaded to S3 using the web interface, placed in the bootstrap bundle described above, or uploaded to the master instance using scp and distributed with the hadoop-streaming.jar -file option, or no doubt by many other mechanisms. A mapper or reducer implemented as a class method can also be invoked via a perl one-liner, e.g.
Args => [ '-mapper', '"perl -MMyClass -e MyClass->new->mapper"', ... ]
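For the one-liner to work, MyClass must be loadable on each task node (for example, installed via the bootstrap bundle and found through PERL5LIB). A hypothetical minimal shape of such a class:

# Hypothetical MyClass for the one-liner above; mapper() performs the
# usual STDIN-to-STDOUT streaming loop (identity mapping shown, for
# illustration only).
package MyClass;

sub new { return bless {}, shift }

sub mapper {
    my ($self) = @_;
    while (my $line = <STDIN>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        print "$key\t$value\n";
    }
}

1;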
=head1 AUTHOR
Jon Schutz
L<http://notes.jschutz.net>
=head1 BUGS