rss2leafnode
=item C<fetch_rss ("group", "url", render =E<gt> R)>
=item C<fetch_html ("group", "url", render =E<gt> R)>
If true then render HTML as plain text in the news messages. Normally item
text, downloaded parts from C<$rss_get_links>, and C<fetch_html()> pages are
all presented as C<text/html>. If your newsreader doesn't handle HTML very
well then C<render> is a good way to see just the text. Setting C<1> uses
C<HTML::FormatText>:
    $render = 1;   # to use HTML::FormatText
    fetch_rss ('r2l.weather',
               'http://xml.weather.yahoo.com/forecastrss?p=ASXX0001&u=f');
Setting C<"WithLinks"> uses the C<HTML::FormatText::WithLinks> variant (you
must have that module) which shows HTML links as footnotes.
    fetch_rss ('r2l.stuff',
               'http://rss.sciam.com/sciam/basic-science',
               render => 'WithLinks');
=cut
# HTML::FormatText::Elinks yes
# HTML::FormatText::Html2text no output_charset
# HTML::FormatText::Lynx yes
# HTML::FormatText::Links no UTF-8 output_charset
# HTML::FormatText::Netrik no UTF-8 output as such
# HTML::FormatText::Vilistextum yes if built with UTF-8
# HTML::FormatText::W3m yes
# HTML::FormatText::Zen no, latin-1 only
=pod
Setting C<"elinks">, C<"lynx"> or C<"w3m"> uses the respective external
program. You must have both C<HTML::FormatExternal> and the program itself
installed.
    fetch_rss ('r2l.sport',
               'http://fr.news.yahoo.com/rss/rugby.xml',
               rss_get_links => 1,
               render => 'lynx');
C<"vilistextum"> can be used too, if it was built with C<--enable-multibyte>
for UTF-8 output. Other C<HTML::FormatExternal> programs generally can't be
used, as they can't produce UTF-8 output.
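For example, a feed could be rendered through C<vilistextum> in the same way as C<lynx> above. This is only a sketch: it assumes C<HTML::FormatExternal> is installed and that C<vilistextum> was built with C<--enable-multibyte>, and the group name C<r2l.sport.text> is made up for illustration.

```perl
# ~/.rss2leafnode.conf fragment (sketch).
# Assumes vilistextum was built with --enable-multibyte for UTF-8 output.
# 'r2l.sport.text' is a hypothetical group name.
fetch_rss ('r2l.sport.text',
           'http://fr.news.yahoo.com/rss/rugby.xml',
           render => 'vilistextum');
```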
=item $render_width (default 60)
The number of columns to use when rendering HTML to plain text or when
wrapping Atom text. You can set this to whatever you find easiest to read,
or any special width needed by a particular feed.
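For instance, a configuration which renders HTML to plain text at a wider width might look like the following. The width of 72 columns is just an illustrative choice, not a recommended value.

```perl
# ~/.rss2leafnode.conf fragment (sketch): render HTML to plain text
# and wrap it at 72 columns instead of the default 60.
$render = 1;
$render_width = 72;
fetch_rss ('r2l.weather',
           'http://xml.weather.yahoo.com/forecastrss?p=ASXX0001&u=f');
```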
=item $get_icon (default 0)
=item C<fetch_rss ("group", "url", get_icon =E<gt> G)>
=item C<fetch_html ("group", "url", get_icon =E<gt> G)>
Download an RSS/Atom icon or HTML favicon as an image for the C<Face>
header. C<Image::Magick> is required to process the image unless it is
already in PNG format and no bigger than 48x48 (according to its size
attributes).
The C<Face> header is displayed by Gnus and perhaps a few other
newsreaders. In Gnus it appears alongside "From:" in article mode on a
graphical display. It can be a good visual cue to the origin of a message,
but may not always be worth the extra download.
    $get_icon = 1;
    fetch_rss ('r2l.whatsnew',
               'http://www.archive.org/services/collection-rss.php');
Banners much wider than they are high are suppressed, since they are
probably advertising and in any case don't suit the 48x48 size limit of the
C<Face> header specification.
A 48x48 image might add around 4 kbytes or more to each message.
For plain RSS and Atom feeds the image is normally a per-channel attribute,
so it's the same for all articles from the feed. An C<itunes:image> or
C<activity:actor> image can be per-item, and is used if present.
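The per-feed C<get_icon> option can be given to C<fetch_html()> too, in which case the image is the page's favicon. A sketch, using the rss2leafnode home page as the URL and a made-up group name C<r2l.homepage>:

```perl
# ~/.rss2leafnode.conf fragment (sketch): fetch one HTML page and use
# its favicon for the Face header. 'r2l.homepage' is a hypothetical group.
fetch_html ('r2l.homepage',
            'http://user42.tuxfamily.org/rss2leafnode/index.html',
            get_icon => 1);
```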
=item $rss_newest_only (default 0 for all)
=item C<fetch_rss ("group", "url", rss_newest_only =E<gt> $count)>
=item C<fetch_rss ("group", "url", rss_newest_only =E<gt> $period)>
Take only the newest items from an RSS feed. The default 0 means take all
items from the feed. The value is either a count of the latest items to
take, for example 10,
    fetch_rss ('r2l.test',
               'http://www.cpantesters.org/author/K/KRYDE-nopass.rss',
               rss_newest_only => 10);
Or it can be a string giving a period of time, and only items newer than
that are taken. For example:
    "60 minutes"
    "1 hour"     "36 hours"
    "1 day"      "2 days"
    "1 month"    "5 months"
    "1 year"     "0.75 years"
C<rss_newest_only> can be good if you're only interested in the most recent
item from a status or weather feed, or if you only want to get a few items
as a random taste of a feed.
If a feed goes back further than the news server retains articles, then
giving a period such as "90 days" (or whatever matches the server's expiry
time) will prevent old articles from being re-added after the server
discards them. (It'd be better if the news server could be asked for its
retention time, but this option is better than nothing.)
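For example, if the news server expires articles after about 90 days, a feed with a long history could be limited to match. The 90-day figure here is an assumption about the server; substitute its actual expiry time.

```perl
# ~/.rss2leafnode.conf fragment (sketch): only take items newer than
# 90 days, assumed here to match the news server's expiry time.
fetch_rss ('r2l.test',
           'http://www.cpantesters.org/author/K/KRYDE-nopass.rss',
           rss_newest_only => '90 days');
```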
=item $html_extract_main (default 0, false)
=item C<fetch_html ("group", "url", html_extract_main =E<gt> 1)>
=item C<fetch_rss ("group", "url", html_extract_main =E<gt> 1)>
Use L<HTML::ExtractMain> on downloaded HTML to pick out the "main" text from
The way C<$rss_get_links> only gets the immediate link target could perhaps
be extended to fetch images, frame sub-parts, etc. of an HTML page and
include them in the message as RFC 2557 style "MHTML". But do any
newsreaders actually display that?
Perhaps there should be a limit on the size of links to be downloaded.
Sometimes podcast links have both an HTML page and a full audio link. If the
audio is bigger than some threshold then you might like to download the HTML
but not the audio.
The entire XML feed is read into memory, which might be a little too much
for large feeds. RSS was conceived as a "site summary" but is used for
bigger content too. C<XML::Twig> has a partial-tree parse for one item at a
time, though applying the C<rss_newest_only> option would require a first
pass to choose items. A progressive parse might help show the first few
items if there's a fatal syntax error or truncation part-way through. Some
care would be needed that small changes made by the automated charset
recoding or by C<XML::Liberal> don't cause duplicated posts.
=head1 ENVIRONMENT VARIABLES
=over 4
=item C<NNTPSERVER>
=item C<NEWSHOST>
Default news server, as per C<Net::NNTP>. If neither is set then
C<localhost> is used.
=back
=head1 FILES
=over 4
=item F<~/.rss2leafnode.conf>
Configuration file.
=item F<~/.rss2leafnode.status>
Status file, recording "last modified" dates for downloads. This can be
deleted if something bad seems to have happened to it; the next
C<rss2leafnode> run will recreate it.
=item F</etc/perl/Net/libnet.cfg>
=item F<~/.libnet.cfg>
Defaults per C<Net::NNTP> and C<Net::Config>.
=back
=head1 SEE ALSO
L<leafnode(8)>,
L<HTML::FormatText>, L<HTML::FormatText::WithLinks>, L<HTML::FormatExternal>,
L<lynx(1)>,
L<URI::Title>, L<XML::Parser>, L<XML::Liberal>, L<Image::Magick>,
L<Net::NNTP>, L<Net::Config>
L<Plagger>, L<feed2imap(1)>, L<rss2email(1)>, L<rssdrop(1)>, L<toursst(1)>,
L<http://www.gwene.org>
=head1 HOME PAGE
L<http://user42.tuxfamily.org/rss2leafnode/index.html>
=head1 LICENSE
Copyright 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2017 Kevin Ryde
RSS2Leafnode is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
RSS2Leafnode is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along with
RSS2Leafnode. If not, see L<http://www.gnu.org/licenses/>.
=cut