Acme-Tools
=cut
sub urldec {
my $str=shift;
$str=~s/\+/ /gs;
$str=~s/%([a-f\d]{2})/pack("C", hex($1))/egi;
return $str;
}
=head2 ht2t
C<ht2t> is short for I<html-table to table>.
This sub extracts an HTML C<< <table> >> and returns its C<< <tr> >>s
and C<< <td> >>s as an array of arrayrefs. It also strips away any HTML
inside the C<< <td> >>s.
 my @table = ht2t($html,'some string occurring before the <table> you want');
Input: One or two arguments.
First argument: the html where a C<< <table> >> is to be found and converted.
Second argument: (optional) If the html contains more than one
C<< <table> >> and you do not want the first one, supply a second
argument to tell C<ht2t> which one to capture: the first table that
follows this word or string.
Output: An array of arrayrefs.
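How the optional second argument picks a table can be seen by running the same kind of substitution that the C<ht2t> source uses, on its own. This is only a minimal sketch: the HTML string and the names C<$marker> and C<%picked> are made up for the illustration.

```perl
use strict;
use warnings;

# Invented input: two tables with a heading between them.
my $html = '<table><tr><td>first</td></tr></table>'
         . '<h2>Population</h2>'
         . '<table><tr><td>second</td></tr></table>';

my %picked;
for my $marker ('', 'Population') {
    # Skip lazily to the marker, then keep the next <table> up to its </table
    (my $table = $html) =~ s,.*?($marker).*?(<table.*?)</table.*,$2,si;
    $picked{$marker} = $table;
    print "marker '$marker' picks: $table\n";
}
```

An empty marker matches at once, so the first table is kept. Note that the marker is interpolated into a regular expression, so regex metacharacters in it are not taken literally.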
C<ht2t()> is a quick and dirty way of scraping (or harvesting, as it is
also called) data from a web page. See also L<HTML::Parse> for a more
accurate way to do this.
Example:
 use Acme::Tools;
 use LWP::Simple;
 my $url = "http://en.wikipedia.org/wiki/List_of_countries_by_population";
 for( ht2t( get($url), "Countries" ) ) {
   my($rank, $country, $pop) = @$_;
   $pop =~ s/,//g;                      # remove thousands separators
   printf "%3d | %-32s | %9d\n", $rank, $country, $pop if $pop>0;
 }
Output:
   1 | China                            | 1367740000
   2 | India                            | 1262090000
   3 | United States                    | 319043000
   4 | Indonesia                        | 252164800
   5 | Brazil                           | 203404000
...and so on.
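The shape of the returned data can also be tried offline, without fetching a page. The sketch below does not call C<ht2t>: C<simple_ht2t> is a hypothetical, much simplified stand-in (first table only, no second or third argument) and the HTML is invented. It only illustrates the result: one arrayref per row, one cleaned-up string per cell.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for ht2t (not the module's code).
sub simple_ht2t {
    my ($html) = @_;
    # Keep only the first <table> up to its closing tag.
    my ($table) = $html =~ m{(<table.*?)</table}si or return;
    my @rows;
    # Each <tr ...> starts a row; each <td ...> or <th ...> starts a cell.
    for my $tr (split /<tr\b[^>]*>/i, $table) {
        my @cells = split /<t[dh]\b[^>]*>/i, $tr;
        shift @cells;                 # text before the first cell is not data
        next unless @cells;
        for (@cells) {
            s/<[^>]*>/ /g;            # strip any remaining tags
            s/&(?:#160|nbsp);/ /g;    # non-breaking space entities
            s/&amp;/&/g;              # ampersand entity
            s/^\s+|\s+$//g;           # trim leading and trailing whitespace
        }
        push @rows, [@cells];
    }
    return @rows;
}

my $html = <<'HTML';
<p>Countries</p>
<table>
<tr><th>Rank</th><th>Country</th></tr>
<tr><td>1</td><td><b>China</b></td></tr>
<tr><td>2</td><td>India &amp; neighbours</td></tr>
</table>
HTML

my @table = simple_ht2t($html);
print join(" | ", @$_), "\n" for @table;
```

The header row comes back as the first arrayref, which is why the example above filters on a numeric population before printing.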
=cut
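sub ht2t {
  my($f,$s,$r)=@_; 1>@_||@_>3 and croak; $s='' if @_==1;
  # Keep only the first <table> found after the first match of $s
  $f=~s,.*?($s).*?(<table.*?)</table.*,$2,si;
  # Find a character that does not occur in the html; $s now holds it as a separator
  my $e=0;$e++ while index($f,$s=chr($e))>=$[;
  # Turn <tr>, <td> and <th> tags into "r", "d" or "h" followed by the separator
  $f=~s/<t(d|r|h).*?>/\l$1$s/gsi;
  # Replace every other tag, and the whitespace around it, with a single space
  $f=~s/\s*<.*?>\s*/ /gsi;
  my @t=split("r$s",$f);shift @t;   # one element per table row
  # Default cell cleanup: decode &nbsp; and &amp;, trim, join digit groups
  $r||=sub{s/&(#160|nbsp);/ /g;s/&amp;/&/g;s/^\s*(.*?)\s*$/$1/s;
           s/(\d) (\d)/$1$2/g if /^[\d \.\,]+$/};
  # Split each row into cells and run the cleanup sub on every cell
  for(@t){my @r=split/[dh]$s/;shift@r;$_=[map{&$r;$_}@r]}
  @t;
}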
=head1 FILES, DIRECTORIES
=head2 writefile
Justification:
Perl needs three or four operations to make a file out of a string:
 open my $FILE, '>', $filename or die $!;
 print $FILE $text;
 close($FILE);
This is way simpler:
 writefile($filename,$text);
Sub writefile opens the file in binary mode (C<binmode()>) and has two usage modes:
B<Input:> Two arguments
B<First argument> is the filename. If the file exists, it is overwritten.
If the file cannot be opened for writing, C<writefile> dies (croaks, really).
B<Second input argument> is one of:
=over 4
=item * Either a scalar. That is a normal string to be written to the file.
=item * Or a reference to a scalar. The referred text is written to the file.
=item * Or a reference to an array of scalars. This array is then written to the
file element by element and C<< \n >> is automatically appended to each element.
=back
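The three forms of the second argument could be handled roughly as in the sketch below. It is an assumption based on the description above, not the module's actual code: C<write_content> and C<slurp> are hypothetical helpers, and the temporary directory comes from the core module File::Temp.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Temp qw(tempdir);

# Hypothetical helper mirroring the documented behaviour of the second argument.
sub write_content {
    my ($filename, $text) = @_;
    open my $fh, '>', $filename or croak "Cannot open $filename for writing: $!";
    binmode $fh;                                                      # binary mode, as documented
    if    (ref($text) eq 'SCALAR') { print {$fh} $$text }             # reference to a scalar
    elsif (ref($text) eq 'ARRAY')  { print {$fh} "$_\n" for @$text }  # "\n" appended per element
    else                           { print {$fh} $text }              # plain string
    close $fh or croak "Cannot close $filename: $!";
    return;
}

# Hypothetical helper: read a whole file back as one string.
sub slurp { open my $in, '<', $_[0] or croak $!; local $/; my $c = <$in>; return $c }

my $dir  = tempdir(CLEANUP => 1);
my $long = "referred text\n";
write_content("$dir/a.txt", "plain string\n");
write_content("$dir/b.txt", \$long);
write_content("$dir/c.txt", ['line 1', 'line 2']);

print slurp("$dir/c.txt");
```

Passing a reference avoids copying a large string into the argument list, which is presumably why the reference forms exist.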
Alternatively, you can write several files at once.
Example, this:
 writefile('file1.txt','The text....tjo');
 writefile('file2.txt','The text....hip');
 writefile('file3.txt','The text....and hop');
...is the same as this:
 writefile([
   ['file1.txt','The text....tjo'],
   ['file2.txt','The text....hip'],
   ['file3.txt','The text....and hop'],
 ]);
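The one-argument form can be thought of as a loop over C<[filename, text]> pairs. A minimal sketch under that assumption follows. C<write_many> and C<write_one> are hypothetical names, not functions from Acme::Tools, and only plain strings are handled here.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Temp qw(tempdir);

# Hypothetical: write one plain string to one file in binary mode.
sub write_one {
    my ($filename, $text) = @_;
    open my $fh, '>', $filename or croak "Cannot open $filename for writing: $!";
    binmode $fh;
    print {$fh} $text;
    close $fh or croak "Cannot close $filename: $!";
    return;
}

# Hypothetical: accept an arrayref of [filename, text] pairs.
sub write_many {
    my ($pairs) = @_;
    write_one(@$_) for @$pairs;
    return scalar @$pairs;          # number of files written
}

my $dir   = tempdir(CLEANUP => 1);  # scratch directory, removed at exit
my $count = write_many([
    ["$dir/file1.txt", 'The text....tjo'],
    ["$dir/file2.txt", 'The text....hip'],
    ["$dir/file3.txt", 'The text....and hop'],
]);
print "wrote $count files\n";
```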