updated results from the CPAN

App-jupiter

1.08

- Strip emails from RSS 2.0 authors.

1.07

- Eliminate more stuff from the feed item extracts.

1.06

- atom:published takes precedence over atom:updated
- Ignore blog posts with dates in the future

1.05

- Fix dependencies

1.04

- Add filters to the command line so that you can update just one
  feed, or a group of feeds.

README.md


## Adding just one feed

After a while, the list of feeds in your OPML starts getting unwieldy. When you
add a new feed, you might not want to fetch all of them. In this case, provide a
regular expression surrounded by slashes to the `update` command:

    jupiter update feed.opml /example/

Assuming a feed with a URL or title that matches the regular expression is
listed in your OPML file, only that feed is going to get updated.

There is no need to escape slashes in the regular expression: `//rss/` works
just fine. Beware of shell quoting, however: if the regular expression contains
spaces or characters the shell treats specially, such as `&`, surround it with
single quotes:

    jupiter update feed.opml '/Halberds & Helmets/'

Note that named entities such as `&amp;` are assumed to have been parsed into
the characters they stand for, so the expression above matches `&`, not `&amp;`.
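The matching described above might look like this in Perl. This is a hypothetical sketch, not Jupiter's code: `filter_feeds` and the `title`/`url` hash keys are assumptions. It strips only the outermost slashes, which is why `//rss/` needs no escaping. It also matches against titles whose entities have already been decoded.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the feeds whose title or URL matches a
# "/regex/" filter argument; without such an argument, keep them all.
sub filter_feeds {
  my ($filter, @feeds) = @_;
  # strip only the outermost slashes: "//rss/" becomes the pattern "/rss"
  my ($pattern) = $filter =~ m{^/(.*)/\z}s;
  return @feeds unless defined $pattern;
  my $re = qr/$pattern/;
  return grep { $_->{title} =~ $re or $_->{url} =~ $re } @feeds;
}

my @feeds = (
  # titles are assumed to be decoded already: "&amp;" in the OPML is "&" here
  { title => 'Halberds & Helmets', url => 'https://example.org/hh/feed' },
  { title => 'Example Blog',       url => 'https://example.com/rss/' },
);

print "$_->{title}\n" for filter_feeds('/Halberds & Helmets/', @feeds);
print "$_->{title}\n" for filter_feeds('//rss/', @feeds);
```

Because the pattern is compiled with `qr/$pattern/`, the argument is a full Perl regular expression. Characters such as `.` or `+` keep their special meaning.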

## Generate the HTML

This is how you generate the `index.html` file based on the feeds of your
`feed.opml`. It assumes that you have already updated all the feeds (see
above).

    jupiter html feed.opml

See ["OPTIONS"](#options) for ways to change how the HTML is generated.

## Generate the RSS feed

This happens at the same time as you generate the HTML: all the entries that
are added to the HTML are also put into a feed.
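To illustrate what "putting the entries into a feed" involves, here is a minimal sketch that turns a list of entries into an RSS 2.0 document. This is not Jupiter's implementation. `rss_feed`, `xml_escape`, and the entry keys (`title`, `link`, and `date` as an RFC 822 string) are hypothetical.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attributes.
sub xml_escape {
  my $s = shift // '';
  $s =~ s/&/&amp;/g;    # must come first so later entities are not re-escaped
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/"/&quot;/g;
  $s =~ s/'/&apos;/g;
  return $s;
}

# Hypothetical: build an RSS 2.0 document from a channel title, link and
# description plus a list of entry hashes.
sub rss_feed {
  my ($title, $link, $description, @entries) = @_;
  my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
    . qq{<rss version="2.0">\n<channel>\n}
    . '<title>' . xml_escape($title) . "</title>\n"
    . '<link>' . xml_escape($link) . "</link>\n"
    . '<description>' . xml_escape($description) . "</description>\n";
  for my $entry (@entries) {
    $xml .= '<item>'
      . '<title>' . xml_escape($entry->{title}) . '</title>'
      . '<link>' . xml_escape($entry->{link}) . '</link>'
      . '<pubDate>' . xml_escape($entry->{date}) . '</pubDate>'
      . "</item>\n";
  }
  return $xml . "</channel>\n</rss>\n";
}

my $xml = rss_feed('Planet Example', 'https://planet.example.org/',
  'Entries collected from several blogs',
  { title => 'Converting to D&D', link => 'https://example.org/converting-to-dnd/',
    date  => 'Tue, 29 Sep 2020 00:18:12 +0200' });
print $xml;
```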

README.md


    jupiter html osr.html osr.opml
    jupiter html indie.html indie.opml
    jupiter html rpg.html osr.opml indie.opml other.opml

For an example of how it might look, check out the setup for the planets I run.
[https://alexschroeder.ch/cgit/planet/about/](https://alexschroeder.ch/cgit/planet/about/)

## What about the JSON file?

There's a JSON file that gets generated and updated as you run Planet Jupiter.
Its name depends on the OPML files used. It records metadata for every feed in
the OPML file that isn't stored in the feeds themselves.

`message` is the HTTP status message, or a similar message such as "No entry
newer than 90 days." This is set when updating the feeds in your cache.

`code` is the HTTP status code; this could be the real status code from
the server (such as 404 for a "not found" status) or one generated by Jupiter
to match the status message (such as 206 for a "partial content" status
when there aren't any recent entries in the feed). This is set when

script/jupiter


=head2 Adding just one feed

After a while, the list of feeds in your OPML starts getting unwieldy. When you
add a new feed, you might not want to fetch all of them. In this case, provide a
regular expression surrounded by slashes to the C<update> command:

    jupiter update feed.opml /example/

Assuming a feed with a URL or title that matches the regular expression is
listed in your OPML file, only that feed is going to get updated.

There is no need to escape slashes in the regular expression: C<//rss/> works
just fine. Beware of shell quoting, however: if the regular expression contains
spaces or characters the shell treats specially, such as C<&>, surround it with
single quotes:

    jupiter update feed.opml '/Halberds & Helmets/'

Note that named entities such as C<&amp;> are assumed to have been parsed into
the characters they stand for, so the expression above matches C<&>, not C<&amp;>.

=head2 Generate the HTML

This is how you generate the C<index.html> file based on the feeds of your
C<feed.opml>. It assumes that you have already updated all the feeds (see
above).

    jupiter html feed.opml

See L</OPTIONS> for ways to change how the HTML is generated.

=head2 Generate the RSS feed

This happens at the same time as you generate the HTML: all the entries that
are added to the HTML are also put into a feed.

script/jupiter


    jupiter html osr.html osr.opml
    jupiter html indie.html indie.opml
    jupiter html rpg.html osr.opml indie.opml other.opml

For an example of how it might look, check out the setup for the planets I run.
L<https://alexschroeder.ch/cgit/planet/about/>

=head2 What about the JSON file?

There's a JSON file that gets generated and updated as you run Planet Jupiter.
Its name depends on the OPML files used. It records metadata for every feed in
the OPML file that isn't stored in the feeds themselves.

C<message> is the HTTP status message, or a similar message such as "No entry
newer than 90 days." This is set when updating the feeds in your cache.

C<code> is the HTTP status code; this could be the real status code from
the server (such as 404 for a "not found" status) or one generated by Jupiter
to match the status message (such as 206 for a "partial content" status
when there aren't any recent entries in the feed). This is set when

script/jupiter

    if (not @nodes) {
      $feed->{message} = "Empty feed";
      $feed->{code} = 204; # no content
      next;
    }
    # if this is an Atom feed, we need to sort the entries ourselves (older entries at the end)
    my @candidates = map {
      my $entry = {};
      $entry->{element} = $_;
      $entry->{id} = id($_);
      $entry->{date} = updated($_) || $undefined_date;
      $entry;
    } @nodes;
    @candidates = grep { DateTime->compare($_->{date}, $now) <= 0 } @candidates;
    @candidates = unique(sort { DateTime->compare( $b->{date}, $a->{date} ) } @candidates);
    @candidates = @candidates[0 .. min($#candidates, $limit - 1)];
    # now that we have limited the candidates, let's add more metadata from the feed
    for my $entry (@candidates) {
      $entry->{feed} = $feed;
      # these two are already escaped
      $entry->{blog_title} = $feed->{title};
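The candidate selection above can be sketched without DateTime as follows. The sketch drops future entries, sorts newest first, removes duplicates, and keeps at most `$limit` entries. It assumes dates have already been converted to epoch seconds. `select_candidates` is a hypothetical name. De-duplicating by `id` is also only an assumption about what `unique()` does, because its definition is not shown here.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sketch of the selection step: dates are epoch seconds here,
# and de-duplicating by id stands in for unique(), whose definition is not shown.
sub select_candidates {
  my ($now, $limit, @candidates) = @_;
  # ignore entries dated in the future
  @candidates = grep { $_->{date} <= $now } @candidates;
  # newest first
  @candidates = sort { $b->{date} <=> $a->{date} } @candidates;
  # keep only the first entry seen for each id
  my %seen;
  @candidates = grep { !$seen{ $_->{id} }++ } @candidates;
  # at most $limit entries; the range 0 .. -1 is empty, so no entries stay no entries
  return @candidates[0 .. min($#candidates, $limit - 1)];
}

my @picked = select_candidates(1_000, 2,
  { id => 'a', date => 900 },
  { id => 'b', date => 5_000 },    # in the future: dropped
  { id => 'c', date => 950 },
  { id => 'a', date => 900 },      # same id as an earlier entry: dropped
  { id => 'd', date => 100 },
);
print join(',', map { $_->{id} } @picked), "\n";    # prints "c,a"
```

The future-date filter runs before the limit is applied, so a far-future entry (like the one in t/sorting.t) cannot push a real entry out of the selection.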

script/jupiter

  }
  return \@entries;
}

sub add_age_warning {
  my $feed = shift;
  my $entries = shift;
  my $date = shift;
  # feed modification date is smaller than the date given
  my ($node) = $xpc->findnodes("/rss/channel | /atom:feed", $feed->{doc});
  my $feed_date = updated($node);
  if ($feed_date and DateTime->compare($feed_date, $date) == -1) {
    $feed->{message} = "No feed updates in 90 days";
    $feed->{code} = 206; # partial content
    return;
  } else {
    # or no entry found with a modification date equal or bigger than the date given
    for my $entry (@$entries) {
      return if DateTime->compare($entry->{date}, $date) >= 0;
    }
    $feed->{message} = "No entry newer than 90 days";
    $feed->{code} = 206; # partial content
  }
}
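The two checks in `add_age_warning` can be sketched like this, using epoch seconds instead of DateTime objects. The first check is whether the feed's own date is older than the cutoff. The second is whether any entry is at or after the cutoff. `age_warning` is a hypothetical name. The real sub stores the result in `$feed->{message}` and `$feed->{code}` instead of returning it.

```perl
use strict;
use warnings;

# Hypothetical sketch of add_age_warning with epoch seconds.
# Returns (message, code), or an empty list when the feed looks current.
sub age_warning {
  my ($feed_date, $cutoff, @entry_dates) = @_;
  # the feed itself says it has not changed since before the cutoff
  if (defined $feed_date and $feed_date < $cutoff) {
    return ('No feed updates in 90 days', 206);    # partial content
  }
  # otherwise, a single entry at or after the cutoff is enough
  return () if grep { $_ >= $cutoff } @entry_dates;
  return ('No entry newer than 90 days', 206);     # partial content
}

my $day    = 24 * 60 * 60;
my $now    = 1_000 * $day;
my $cutoff = $now - 90 * $day;

# old feed date but a current entry, as in t/elements.t
my ($message, $code) = age_warning($cutoff - $day, $cutoff, $now);
print "$code $message\n";    # prints "206 No feed updates in 90 days"
```

As in the original, an outdated feed date wins even when an entry is current. This matches the t/elements.t case where the feed's `updated` is old but its entry is new.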

sub updated {
  my $node = shift;
  return unless $node;
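  # XPath unions return nodes in document order, so query each element in
  # order of precedence: pubDate, then atom:published, then atom:updated
  my ($date_node) = map { $xpc->findnodes($_, $node) } qw(pubDate atom:published atom:updated);
  return unless $date_node;
  my $date = $date_node->textContent;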
  my $dt = eval { DateTime::Format::Mail->parse_datetime($date) }
  || eval { DateTime::Format::ISO8601->parse_datetime($date) }
  || eval { DateTime::Format::Mail->parse_datetime(french($date)) };
  return $dt;
}

sub french {
  my $date = shift;
  $date =~ s/^($wday_re)/$wday{$1}/;

share/template.html

<a href="<%= $feed->{link} =%>"><%= $feed->{title} =%></a>
%   } elsif ($feed->{message} and (not $feed->{code} or $feed->{code} ne "200")) {
<a class="message" title="<%= $feed->{message} =%>"><%= $feed->{title} =%></a>
%   } else {
%= $feed->{title}
%   }
% }
    </ul>
    <h2>Info</h2>
    <dl>
      <dt>Last updated:</dt>
      <dd><span class="date" title="GMT"><%= $globals->{date} %></span></dd>
      <dt>Powered by:</dt>
      <dd><a href="https://alexschroeder.ch/cgit/planet-jupiter/about/" class="jupiter button">Jupiter</a></dd>
      <dt>Export:</dt>
      <dd><a href="feed.xml" class="feed button">RSS</a></dd>
% for my $file (@{$globals->{files}}) {
      <dd><a href="<%= $file->{file} %>" class="opml button"><%= $file->{name} %></a></dd>
% }
    </dl>
  </div>

t/atom.t

save_opml('rss2sample.opml');

my $atom = <<'EOT';
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">dive into mark</title>
  <subtitle type="html">
    A &lt;em&gt;lot&lt;/em&gt; of effort
    went into making this effortless
  </subtitle>
  <updated>2005-07-31T12:29:29Z</updated>
  <id>tag:example.org,2003:3</id>
  <link rel="alternate" type="text/html"
   hreflang="en" href="http://example.org/"/>
  <link rel="self" type="application/atom+xml"
   href="http://example.org/feed.atom"/>
  <rights>Copyright (c) 2003, Mark Pilgrim</rights>
  <generator uri="http://www.example.com/" version="1.0">
    Example Toolkit
  </generator>
  <entry>
    <title>Atom draft-07 snapshot</title>
    <link rel="alternate" type="text/html"
     href="http://example.org/2005/04/02/atom"/>
    <link rel="enclosure" type="audio/mpeg" length="1337"
     href="http://example.org/audio/ph34r_my_podcast.mp3"/>
    <id>tag:example.org,2003:3.2397</id>
    <updated>2005-07-31T12:29:29Z</updated>
    <published>2003-12-13T08:29:29-04:00</published>
    <author>
      <name>Mark Pilgrim</name>
      <uri>http://example.org/</uri>
      <email>f8dy@example.com</email>
    </author>
    <contributor>
      <name>Sam Ruby</name>
    </contributor>
    <contributor>

t/duplicate-links.t

  <subtitle type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <ol><li><a href="/blog">/blog</a></li>
      <li><a href="/blog/en">/blog/en</a></li>
      <li><a href="/blog/rpg/">/blog/rpg/</a></li>
      <li>/blog/rpg/en/en</li>
</ol>
    </div>
  </subtitle>
  <link rel="self" href="https://idiomdrottning.org/blog/rpg/en/feed.xml"/>
  <updated>2020-09-29T01:07:03+02:00</updated>
  <id>https://idiomdrottning.org/blog/rpg/en/feed.xml</id>
  <entry>
    <link rel="self" href="https://idiomdrottning.org/converting-to-dnd/"/>
    <id>https://idiomdrottning.org/converting-to-dnd/</id>
    <title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Converting to D&amp;D</div></title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        stuff
      </div>
    </content>
    <updated>2020-09-29T00:18:12+02:00</updated>
    <link href="https://idiomdrottning.org/converting-to-dnd/"/>
    <author>
      <name>Idiomdrottning</name>
      <email>sandra.snan@idiomdrottning.org</email>
    </author>
    </entry>
</feed>
EOT

start_daemon(encode_utf8 $rss);

t/elements.t

my $doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//li/a[position()=2]'), "Elements", "Elements feed title matches");
is($doc->findvalue('//div[@class="content"]'), "I love the fediverse!", "Encoded content extracted");

use DateTime;
my $now = DateTime->now;

my $atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$now</updated>
<title type='text'>Textual</title>
<entry>
<updated>$now</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT

start_daemon(encode_utf8 $atom);

t/elements.t

ok(-f "test-$id/rss2sample.html", "HTML was generated, again");
$doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//div[@class="content"]'), "Snail is best.", "Text content extracted");
ok(!$doc->findvalue('//li/a[@class="message"]'), "Message is empty in the info list");

my $old = '2018-12-01T04:24:13.964-06:00';

$atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$old</updated>
<title type='text'>Textual</title>
<entry>
<updated>$now</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT

start_daemon(encode_utf8 $atom);

t/elements.t


$doc = XML::LibXML->load_html(location => "test-$id/rss2sample.html");
is($doc->findvalue('//li/a[@class="message"]/@title'), "No feed updates in 90 days", "No feed updates in 90 days");

$atom = <<"EOT";
<?xml version="1.0" encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<!-- no update given for the feed itself -->
<title type='text'>Textual</title>
<entry>
<updated>$old</updated>
<title type='text'>Current</title>
<summary type='text'>
Snail is best.
</summary>
</entry>
</feed>
EOT

start_daemon(encode_utf8 $atom);

t/encoding-atom.t

do './t/test.pl';
my ($id, $port) = init();
save_opml('rss2sample.opml');

use DateTime;
my $now = DateTime->now;

my $atom = <<"EOT";
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>$now</updated>
<title type='text'>Schröder’s Blog</title>
<author><name>Alex Schröder</name><email>noreply\@blogger.com</email></author>
<entry>
<published>$now</published>
<updated>$now</updated>
<title type='text'>Fuß</title>
<content type='html'>Hello Schröder!</content>
</entry>
</feed>
EOT

start_daemon(encode_utf8 $atom);

Jupiter::update_cache("test-$id/rss2sample.opml");

t/french.t

<entry><pubDate>lun., 01 juin 2020 10:24:02 +0200</pubDate></entry>
<entry><pubDate>dim., 24 mai 2020 15:41:37 +0200</pubDate></entry>
<entry><pubDate>mar., 19 mai 2020 11:53:34 +0200</pubDate></entry>
<entry><pubDate>lun., 11 mai 2020 11:58:10 +0200</pubDate></entry>
<entry><pubDate>lun., 11 mai 2020 11:58:06 +0200</pubDate></entry>
<entry><pubDate>lun., 04 mai 2020 19:38:33 +0200</pubDate></entry>
<entry><pubDate>mer., 29 avr. 2020 16:02:06 +0200</pubDate></entry>
</test>
EOT
for ($doc->findnodes("//entry")) {
  ok(updated($_), $_->textContent);
}
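The dates in t/french.t can presumably only be parsed by DateTime::Format::Mail after the French day and month abbreviations are replaced with English ones. That is what the `french()` fallback in `updated()` appears to be for. Its body is cut off above, so the following is a hypothetical sketch. `french_to_english` and both lookup tables are assumptions, and the month handling in particular is not taken from the script.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical tables; the real script defines its own %wday and month
# mappings, which may cover different spellings.
my %wday  = (lun => 'Mon', mar => 'Tue', mer => 'Wed', jeu => 'Thu',
             ven => 'Fri', sam => 'Sat', dim => 'Sun');
my %month = (janv => 'Jan', 'févr' => 'Feb', mars => 'Mar', avr => 'Apr',
             mai  => 'May', juin   => 'Jun', juil => 'Jul', 'août' => 'Aug',
             sept => 'Sep', oct    => 'Oct', nov  => 'Nov', 'déc'  => 'Dec');

# Longest alternatives first, so no abbreviation can shadow a longer one.
my $wday_re  = join '|', sort { length $b <=> length $a } keys %wday;
my $month_re = join '|', sort { length $b <=> length $a } keys %month;

sub french_to_english {
  my $date = shift;
  # "lun.," at the start becomes "Mon,"
  $date =~ s/^($wday_re)\.?,/$wday{$1},/;
  # "avr." or "mai" followed by whitespace becomes "Apr" or "May"
  $date =~ s/\b($month_re)\.?(?=\s)/$month{$1}/;
  return $date;
}

print french_to_english("mer., 29 avr. 2020 16:02:06 +0200"), "\n";
# prints "Wed, 29 Apr 2020 16:02:06 +0200"
```

The translated string is in the RFC 822 shape that `DateTime::Format::Mail->parse_datetime` expects, which is why the translation can be a plain string rewrite before the existing parser runs.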

t/sorting.t

use Test::More;
use File::Slurper qw(write_binary read_binary write_text);

do './t/test.pl';
my ($id, $port) = init();
save_opml('rss2sample.opml');

my $rss = <<'EOT';
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<updated>2021-12-20T10:04:41.088-07:00</updated>
<title>Simulacrum</title>
<entry>
<published>3000-12-16T16:17:00.130-07:00</published>
<updated>3000-12-19T23:50:29.172-07:00</updated>
<title>From the far future</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-12-16T16:17:00.130-07:00</published>
<updated>2021-12-19T23:50:29.172-07:00</updated>
<title>Part V</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-03-15T21:08:00.118-06:00</published>
<updated>2021-12-19T10:56:43.136-07:00</updated>
<title>Part IV</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-27T01:40:00.043-07:00</published>
<updated>2021-07-09T21:51:10.001-06:00</updated>
<title>Part III</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-13T02:09:00.042-07:00</published>
<updated>2021-12-19T01:29:30.953-07:00</updated>
<title>Part II</title>
<author><name>Keith</name></author>
</entry>
<entry>
<published>2021-02-05T14:09:00.104-07:00</published>
<updated>2021-06-24T19:26:08.179-06:00</updated>
<title>Part I</title>
<author><name>Keith</name></author>
</entry>
</feed>
EOT

start_daemon(encode_utf8 $rss);

Jupiter::update_cache("test-$id/rss2sample.opml");
