App-jupiter
1.08
- Strip emails from RSS 2.0 authors.
1.07
- Eliminating more stuff from the feed item extracts.
1.06
- atom:published takes precedence over atom:updated
- Ignore blog posts with dates in the future
1.05
- Fix dependencies
1.04
- Add filters to the command line so that you can update just one
feed, or a group of feeds.
## Adding just one feed
After a while, the list of feeds in your OPML starts getting unwieldy. When you
add a new feed, you might not want to fetch all of them. In this case, provide a
regular expression surrounded by slashes to the `update` command:
jupiter update feed.opml /example/
Assuming a feed with a URL or title that matches the regular expression is
listed in your OPML file, only that feed is going to get updated.
There is no need to escape slashes in the regular expression: `//rss/` works
just fine. Beware shell escaping, however. Most likely, you need to surround the
regular expression with single quotes if it contains spaces:
jupiter update feed.opml '/Halberds & Helmets/'
Notice how we assume that named entities such as `&amp;` have already been
parsed into the appropriate strings.
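As a rough illustration of the filter syntax described above, the following Perl sketch shows one way a `/regex/` argument could be unwrapped and matched against feed URLs and titles. It is not taken from Jupiter's source: the helper names `filter_to_regex` and `matching_feeds` and the sample feed list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a command-line argument such as "/example/"
# into a compiled regular expression, or return nothing if the argument
# is not wrapped in slashes.
sub filter_to_regex {
  my $arg = shift;
  # Only the first and the last slash act as delimiters, so "//rss/"
  # yields the pattern "/rss" without any escaping.
  return unless $arg =~ m{^/(.*)/$}s;
  return qr/$1/;
}

# Hypothetical helper: keep the feeds whose URL or title matches.
sub matching_feeds {
  my ($re, @feeds) = @_;
  return grep { $_->{url} =~ $re or $_->{title} =~ $re } @feeds;
}

# Made-up sample data; titles are assumed to be already entity-decoded.
my @feeds = (
  { title => 'Halberds & Helmets', url => 'https://example.org/hh/feed.xml' },
  { title => 'Other Blog',         url => 'https://example.com/rss/index.xml' },
);

my $re = filter_to_regex('//rss/');
print "$_->{title}\n" for matching_feeds($re, @feeds);
```

Because the title is compared after entity decoding in this sketch, a filter such as `'/Halberds & Helmets/'` would match the first sample feed.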
## Generate the HTML
This is how you generate the `index.html` file based on the feeds of your
`feed.opml`. It assumes that you have already updated all the feeds (see
above).
jupiter html feed.opml
See ["OPTIONS"](#options) for ways to change how the HTML is generated.
## Generate the RSS feed
This happens at the same time as when you generate the HTML. It takes all the
entries that are being added to the HTML and puts them into a feed.
jupiter html osr.html osr.opml
jupiter html indie.html indie.opml
jupiter html rpg.html osr.opml indie.opml other.opml
For an example of how it might look, check out the setup for the planets I run.
[https://alexschroeder.ch/cgit/planet/about/](https://alexschroeder.ch/cgit/planet/about/)
## What about the JSON file?
There's a JSON file that gets generated and updated as you run Planet Jupiter.
Its name depends on the OPML files used. It records metadata for every feed in
the OPML file that isn't stored in the feeds themselves.
`message` is the HTTP status message, or a similar message such as "No entry
newer than 90 days." This is set when updating the feeds in your cache.
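To give an idea of how this metadata might be used, here is a minimal Perl sketch that lists the feeds with a status other than 200. The layout of the JSON data is an assumption, not something the documentation above confirms: the sketch supposes one record per feed, keyed by feed URL, with `title`, `message` and a numeric `code` (the latter suggested by the `$feed->{code}` assignments in the script). The helper name `problem_feeds` and the sample data are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# Assumed layout (not confirmed by the documentation): one record per
# feed, keyed by feed URL, each with "title", "message" and "code".
my $json = <<'EOT';
{
  "https://example.org/feed.xml": {
    "title": "Example", "message": "OK", "code": 200
  },
  "https://example.com/old.xml": {
    "title": "Old Blog", "message": "No entry newer than 90 days", "code": 206
  }
}
EOT

my $feeds = JSON::PP->new->decode($json);

# Hypothetical helper: describe every feed whose code is not 200.
# A missing code is treated as a problem as well.
sub problem_feeds {
  my $data = shift;
  return map  { "$data->{$_}->{title}: $data->{$_}->{message}" }
         grep { ($data->{$_}->{code} // 0) != 200 }
         sort keys %$data;
}

print "$_\n" for problem_feeds($feeds);
```

With a real file, the inline string would be replaced by the contents of the JSON file that Jupiter wrote, provided its structure matches the assumption above.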
`code` is the HTTP status code; this code could be the real status code from
the server (such as 404 for a "not found" status) or one generated by Jupiter
such that it matches the status message (such as 206 for a "partial content"
status when there aren't any recent entries in the feed). This is set when
script/jupiter view on Meta::CPAN
=head2 Adding just one feed
After a while, the list of feeds in your OPML starts getting unwieldy. When you
add a new feed, you might not want to fetch all of them. In this case, provide a
regular expression surrounded by slashes to the C<update> command:
jupiter update feed.opml /example/
Assuming a feed with a URL or title that matches the regular expression is
listed in your OPML file, only that feed is going to get updated.
There is no need to escape slashes in the regular expression: C<//rss/> works
just fine. Beware shell escaping, however. Most likely, you need to surround the
regular expression with single quotes if it contains spaces:
jupiter update feed.opml '/Halberds & Helmets/'
Notice how we assume that named entities such as C<&amp;> have already been
parsed into the appropriate strings.
=head2 Generate the HTML
This is how you generate the C<index.html> file based on the feeds of your
C<feed.opml>. It assumes that you have already updated all the feeds (see
above).
jupiter html feed.opml
See L</OPTIONS> for ways to change how the HTML is generated.
=head2 Generate the RSS feed
This happens at the same time as when you generate the HTML. It takes all the
entries that are being added to the HTML and puts them into a feed.
script/jupiter view on Meta::CPAN
jupiter html osr.html osr.opml
jupiter html indie.html indie.opml
jupiter html rpg.html osr.opml indie.opml other.opml
For an example of how it might look, check out the setup for the planets I run.
L<https://alexschroeder.ch/cgit/planet/about/>
=head2 What about the JSON file?
There's a JSON file that gets generated and updated as you run Planet Jupiter.
Its name depends on the OPML files used. It records metadata for every feed in
the OPML file that isn't stored in the feeds themselves.
C<message> is the HTTP status message, or a similar message such as "No entry
newer than 90 days." This is set when updating the feeds in your cache.
C<code> is the HTTP status code; this code could be the real status code from
the server (such as 404 for a "not found" status) or one generated by Jupiter
such that it matches the status message (such as 206 for a "partial content"
status when there aren't any recent entries in the feed). This is set when
script/jupiter view on Meta::CPAN
if (not @nodes) {
$feed->{message} = "Empty feed";
$feed->{code} = 204; # no content
next;
}
# if this is an Atom feed, we need to sort the entries ourselves (older entries at the end)
my @candidates = map {
my $entry = {};
$entry->{element} = $_;
$entry->{id} = id($_);
$entry->{date} = updated($_) || $undefined_date;
$entry;
} @nodes;
@candidates = grep { DateTime->compare($_->{date}, $now) <= 0 } @candidates;
@candidates = unique(sort { DateTime->compare( $b->{date}, $a->{date} ) } @candidates);
@candidates = @candidates[0 .. min($#candidates, $limit - 1)];
# now that we have limited the candidates, let's add more metadata from the feed
for my $entry (@candidates) {
$entry->{feed} = $feed;
# these two are already escaped
$entry->{blog_title} = $feed->{title};
script/jupiter view on Meta::CPAN
}
return \@entries;
}
sub add_age_warning {
my $feed = shift;
my $entries = shift;
my $date = shift;
# feed modification date is smaller than the date given
my ($node) = $xpc->findnodes("/rss/channel | /atom:feed", $feed->{doc});
my $feed_date = updated($node);
if ($feed_date and DateTime->compare($feed_date, $date) == -1) {
$feed->{message} = "No feed updates in 90 days";
$feed->{code} = 206; # partial content
return;
} else {
# or no entry found with a modification date equal or bigger than the date given
for my $entry (@$entries) {
return if DateTime->compare($entry->{date}, $date) >= 0;
}
$feed->{message} = "No entry newer than 90 days";
$feed->{code} = 206; # partial content
}
}
sub updated {
my $node = shift;
return unless $node;
my @nodes = $xpc->findnodes('pubDate | atom:published | atom:updated', $node) or return;
my $date = $nodes[0]->textContent;
my $dt = eval { DateTime::Format::Mail->parse_datetime($date) }
|| eval { DateTime::Format::ISO8601->parse_datetime($date) }
|| eval { DateTime::Format::Mail->parse_datetime(french($date)) };
return $dt;
}
sub french {
my $date = shift;
$date =~ s/^($wday_re)/$wday{$1}/;
share/template.html view on Meta::CPAN
<a href="<%= $feed->{link} =%>"><%= $feed->{title} =%></a>
% } elsif ($feed->{message} and (not $feed->{code} or $feed->{code} ne "200")) {
<a class="message" title="<%= $feed->{message} =%>"><%= $feed->{title} =%></a>
% } else {
%= $feed->{title}
% }
% }
</ul>
<h2>Info</h2>
<dl>
<dt>Last updated:</dt>
<dd><span class="date" title="GMT"><%= $globals->{date} %></span></dd>
<dt>Powered by:</dt>
<dd><a href="https://alexschroeder.ch/cgit/planet-jupiter/about/" class="jupiter button">Jupiter</a></dd>
<dt>Export:</dt>
<dd><a href="feed.xml" class="feed button">RSS</a></dd>
% for my $file (@{$globals->{files}}) {
<dd><a href="<%= $file->{file} %>" class="opml button"><%= $file->{name} %></a></dd>
% }
</dl>
</div>
save_opml('rss2sample.opml');
my $atom = <<'EOT';
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">dive into mark</title>
<subtitle type="html">
A &lt;em&gt;lot&lt;/em&gt; of effort
went into making this effortless
</subtitle>
<updated>2005-07-31T12:29:29Z</updated>
<id>tag:example.org,2003:3</id>
<link rel="alternate" type="text/html"
hreflang="en" href="http://example.org/"/>
<link rel="self" type="application/atom+xml"
href="http://example.org/feed.atom"/>
<rights>Copyright (c) 2003, Mark Pilgrim</rights>
<generator uri="http://www.example.com/" version="1.0">
Example Toolkit
</generator>
<entry>
<title>Atom draft-07 snapshot</title>
<link rel="alternate" type="text/html"
href="http://example.org/2005/04/02/atom"/>
<link rel="enclosure" type="audio/mpeg" length="1337"
href="http://example.org/audio/ph34r_my_podcast.mp3"/>
<id>tag:example.org,2003:3.2397</id>
<updated>2005-07-31T12:29:29Z</updated>
<published>2003-12-13T08:29:29-04:00</published>
<author>
<name>Mark Pilgrim</name>
<uri>http://example.org/</uri>
<email>f8dy@example.com</email>
</author>
<contributor>
<name>Sam Ruby</name>
</contributor>
<contributor>
t/duplicate-links.t view on Meta::CPAN
<subtitle type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<ol><li><a href="/blog">/blog</a></li>
<li><a href="/blog/en">/blog/en</a></li>
<li><a href="/blog/rpg/">/blog/rpg/</a></li>
<li>/blog/rpg/en/en</li>
</ol>
</div>
</subtitle>
<link rel="self" href="https://idiomdrottning.org/blog/rpg/en/feed.xml"/>
<updated>2020-09-29T01:07:03+02:00</updated>
<id>https://idiomdrottning.org/blog/rpg/en/feed.xml</id>
<entry>
<link rel="self" href="https://idiomdrottning.org/converting-to-dnd/"/>
<id>https://idiomdrottning.org/converting-to-dnd/</id>
<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Converting to D&amp;D</div></title>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
stuff
</div>
</content>
<updated>2020-09-29T00:18:12+02:00</updated>
<link href="https://idiomdrottning.org/converting-to-dnd/"/>
<author>
<name>Idiomdrottning</name>
<email>sandra.snan@idiomdrottning.org</email>
</author>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $rss);
t/elements.t view on Meta::CPAN
my $doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//li/a[position()=2]'), "Elements", "Elements feed title matches");
is($doc->findvalue('//div[@class="content"]'), "I love the fediverse!", "Encoded content extracted");
use DateTime;
my $now = DateTime->now;
my $atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$now</updated>
<title type='text'>Textual</title>
<entry>
<updated>$now</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $atom);
t/elements.t view on Meta::CPAN
ok(-f "test-$id/rss2sample.html", "HTML was generated, again");
$doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//div[@class="content"]'), "Snail is best.", "Text content extracted");
ok(!$doc->findvalue('//li/a[@class="message"]'), "Message is empty in the info list");
my $old = '2018-12-01T04:24:13.964-06:00';
$atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$old</updated>
<title type='text'>Textual</title>
<entry>
<updated>$now</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $atom);
t/elements.t view on Meta::CPAN
$doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//li/a[@class="message"]/@title'), "No feed updates in 90 days", "No feed updates in 90 days");
$atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<!-- no update given for the feed itself -->
<title type='text'>Textual</title>
<entry>
<updated>$old</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $atom);
t/encoding-atom.t view on Meta::CPAN
do './t/test.pl';
my ($id, $port) = init();
save_opml('rss2sample.opml');
use DateTime;
my $now = DateTime->now;
my $atom = <<'EOT';
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$now</updated>
<title type='text'>Schröder’s Blog</title>
<author><name>Alex Schröder</name><email>noreply@blogger.com</email></author>
<entry>
<published>$now</published>
<updated>$now</updated>
<title type='text'>Fuß</title>
<content type='html'>Hello Schröder!</content>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $atom);
Jupiter::update_cache("test-$id/rss2sample.opml");
<entry><pubDate>lun., 01 juin 2020 10:24:02 +0200</pubDate></entry>
<entry><pubDate>dim., 24 mai 2020 15:41:37 +0200</pubDate></entry>
<entry><pubDate>mar., 19 mai 2020 11:53:34 +0200</pubDate></entry>
<entry><pubDate>lun., 11 mai 2020 11:58:10 +0200</pubDate></entry>
<entry><pubDate>lun., 11 mai 2020 11:58:06 +0200</pubDate></entry>
<entry><pubDate>lun., 04 mai 2020 19:38:33 +0200</pubDate></entry>
<entry><pubDate>mer., 29 avr. 2020 16:02:06 +0200</pubDate></entry>
</test>
EOT
for ($doc->findnodes("//entry")) {
ok(updated($_), $_->textContent);
}
t/sorting.t view on Meta::CPAN
use Test::More;
use File::Slurper qw(write_binary read_binary write_text);
do './t/test.pl';
my ($id, $port) = init();
save_opml('rss2sample.opml');
my $rss = <<'EOT';
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>2021-12-20T10:04:41.088-07:00</updated>
<title>Simulacrum</title>
<entry>
<published>3000-12-16T16:17:00.130-07:00</published>
<updated>3000-12-19T23:50:29.172-07:00</updated>
<title>From the far future</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-12-16T16:17:00.130-07:00</published>
<updated>2021-12-19T23:50:29.172-07:00</updated>
<title>Part V</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-03-15T21:08:00.118-06:00</published>
<updated>2021-12-19T10:56:43.136-07:00</updated>
<title>Part IV</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-27T01:40:00.043-07:00</published>
<updated>2021-07-09T21:51:10.001-06:00</updated>
<title>Part III</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-13T02:09:00.042-07:00</published>
<updated>2021-12-19T01:29:30.953-07:00</updated>
<title>Part II</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-05T14:09:00.104-07:00</published>
<updated>2021-06-24T19:26:08.179-06:00</updated>
<title>Part I</title>
<author><name>Keith</name></author>
</entry>
</feed>
EOT
start_daemon(encode_utf8 $rss);
Jupiter::update_cache("test-$id/rss2sample.opml");