Apache-Hadoop-WebHDFS


lib/Apache/Hadoop/WebHDFS.pm


    if ( $self->{'webhdfstoken'} ) {
        $url = $url . "&delegation=" . $self->{'webhdfstoken'};
    }
    $self->put( $url );
}


=pod

=head1 NAME

Apache::Hadoop::WebHDFS - interface to Hadoop's WebHDFS API that supports GSSAPI/SPNEGO (secure) access.

=head1 VERSION

Version 0.04

=head1 SYNOPSIS

Hadoop's WebHDFS API is a REST interface to HDFS.  This module provides
a Perl interface to the API, allowing one to both read and write files on
HDFS.  Because Apache::Hadoop::WebHDFS supports GSSAPI, it can be used to
interface with secure Hadoop clusters.  This module also supports WebHDFS
connections to unsecured grids.

Apache::Hadoop::WebHDFS is a subclass of WWW::Mechanize, so one can
call WWW::Mechanize methods if needed.  WWW::Mechanize is in turn a
subclass of LWP::UserAgent, meaning it is also possible to call
LWP::UserAgent methods on an Apache::Hadoop::WebHDFS object.  For example, to
debug the GSSAPI calls used during a request, enable LWP::Debug by adding
'use LWP::Debug qw(+);' to your script.

Content returned from WebHDFS is left in its native JSON format.  Including your favorite JSON module, such as JSON::Any,
will help with managing the JSON output.  To access the content stored in your Apache::Hadoop::WebHDFS object,
use the methods provided by WWW::Mechanize, such as 'success', 'status', and 'content'.  Please see 'EXAMPLE' below
for how this is used.
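
As a minimal sketch of handling the JSON content described above, the following decodes a WebHDFS-style GETFILESTATUS response with the core JSON::PP module.  The sample JSON string is illustrative data standing in for what 'content' would return after a successful call, and the helper name decode_filestatus is hypothetical and not part of this module.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; JSON::Any or another JSON module would also work

# Hypothetical helper: turn the raw JSON text of a GETFILESTATUS
# response into a Perl hash reference describing the file.
sub decode_filestatus {
    my ($json_text) = @_;
    my $data = JSON::PP->new->decode($json_text);
    return $data->{FileStatus};
}

# Illustrative sample standing in for $hdfsclient->content.
my $sample = '{"FileStatus":{"length":24930,"owner":"user1",'
           . '"permission":"644","type":"FILE"}}';

my $status = decode_filestatus($sample);
print "owner=$status->{owner} length=$status->{length} type=$status->{type}\n";
```

In a real script, one would typically check 'success' first and pass the result of 'content' to the decoder instead of the sample string.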


=head1 METHODS 

=over 3

=item * new() - creates a new WebHDFS object. Required keys are 'user', 'namenode', 'namenodeport', and 'authmethod'.  Default values for 'namenode' and 'namenodeport' are listed below. The default value for authmethod is 'gssapi', which is used on g...
         
       # authmethod takes one of 'gssapi', 'unsecure', or 'doas'
       my $hdfsclient = Apache::Hadoop::WebHDFS->new({
                               namenode     => "localhost",
                               namenodeport => "50070",
                               authmethod   => "gssapi|unsecure|doas",
                               user         => 'user1',
                               doasuser     => 'user2',
                             });
 

=item * getdelegationtoken() - gets a delegation token from the namenode.  This token is stored within the WebHDFS object and automatically appended to each WebHDFS request.   Delegation tokens are used on grids with security enabled.
    
       $hdfsclient->getdelegationtoken();

=item * renewdelegationtoken()  - renews a delegation token from the namenode. 

       $hdfsclient->renewdelegationtoken();

=item * canceldelegationtoken() - tells the namenode to invalidate the delegation token because it is no longer needed.  When calling this method, the delegation token is also removed from the Perl WebHDFS object.

       $hdfsclient->canceldelegationtoken();

=item * Open() - opens a file on HDFS and returns its content.  The only required value for Open() is 'file'; all others are optional.  The values 'offset', 'length', and 'buffersize' are sized in bytes.

        $hdfsclient->Open({ file=>'/path/to/my/hdfs/file',
                            offset=>'1024',    
                            length=>'2048',
                            buffersize=>'1024',
                           });

=item * create() - creates and writes to a file on HDFS.  Required values for create() are 'srcfile', which is a local file, and 'dstfile', which is the path for the new file on HDFS.  'blocksize' is represented in bytes and 'overwrite' has two valid values of 'tru...

         $hdfsclient->create({ srcfile=>'/my/local/file.txt',
                               dstfile=>'/my/hdfs/location/file.txt',
                               blocksize=>'524288',
                               replication=>'3',
                               buffersize=>'1024',
                               overwrite=>'true|false',
                               permission=>'644',
                              });

=item * rename()  - renames a file on HDFS.  Required values for rename are 'srcfile' and 'dstfile', both of which represent HDFS filenames.
  
         $hdfsclient->rename({ srcfile=>'/my/old/hdfs/file.txt',
                               dstfile=>'/my/new/hdfs/file.txt',
                             });

=item * getfilestatus() - returns a JSON structure containing the status of a file or directory.  Required input is an HDFS path.
   
         $hdfsclient->getfilestatus({ file=>'/path/to/my/hdfs/file.txt' });

=item * liststatus() - returns a JSON structure of the contents of a directory.  Note that the timestamps are Java timestamps (milliseconds since the epoch), so divide by 1000 to get Unix epoch seconds before attempting to format the time value.
   
         $hdfsclient->liststatus({ path=>'/path/to/my/hdfs/directory' });

=item * mkdirs() - creates a directory on HDFS.  The only required input value is 'path'.  There is an optional input value named 'permissions'; if it is not provided, it defaults to '000'.

         $hdfsclient->mkdirs({ path=>'/path/to/my/hdfs/directory',
                               permissions=>'755', 
          });

=item * getfilechecksum() - gets the HDFS checksum of a file.  Note that this is the CRC32 checksum that HDFS uses to detect file corruption.  It is not the checksum of the file itself.  The only required input value is 'file'.

         $hdfsclient->getfilechecksum({ file=>'/path/to/my/hdfs/file.txt' });

=item * Delete() - removes files or directories from HDFS.  The only required input value is 'path'.  The other, optional value is 'recursive', which takes a 'true|false' argument.

         $hdfsclient->Delete({ path=>'/path/to/my/hdfs/directory',
                               recursive=>'true|false',
         });

=item * getcontentsummary() - lists metadata information on a directory.  This includes things like file count and quota usage for that directory.  The only input value is a path to an HDFS directory.

         $hdfsclient->getcontentsummary({ directory=>'/path/to/my/hdfs/directory' });

=item * getfilestatus() - returns access times, blocksize, and permissions on an HDFS file.

         $hdfsclient->getfilestatus({ file=>'/path/to/my/hdfs/file' });
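
The timestamp note under liststatus() can be sketched as follows.  This is a minimal example that assumes the standard WebHDFS LISTSTATUS response layout (FileStatuses/FileStatus entries with 'modificationTime' in milliseconds).  The sample JSON and the helper name format_hdfs_time are illustrative and not part of this module.

```perl
use strict;
use warnings;
use JSON::PP;
use POSIX qw(strftime);

# Hypothetical helper: convert a Java timestamp (milliseconds since
# the epoch) to a formatted UTC string.
sub format_hdfs_time {
    my ($millis) = @_;
    my $seconds = int( $millis / 1000 );    # drop the millisecond part
    return strftime( '%Y-%m-%d %H:%M:%S', gmtime($seconds) );
}

# Illustrative sample standing in for the content returned by liststatus().
my $sample = '{"FileStatuses":{"FileStatus":['
           . '{"pathSuffix":"a.txt","type":"FILE","modificationTime":1320171722771},'
           . '{"pathSuffix":"logs","type":"DIRECTORY","modificationTime":1320895981256}'
           . ']}}';

my $listing = JSON::PP->new->decode($sample);
for my $entry ( @{ $listing->{FileStatuses}{FileStatus} } ) {
    printf "%-10s %-9s %s\n", $entry->{pathSuffix}, $entry->{type},
        format_hdfs_time( $entry->{modificationTime} );
}
```

The sketch formats times in UTC via gmtime; swapping in localtime would give local time instead.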


