Syndication-NITF

NITF.pm  view on Meta::CPAN

 use Syndication::NITF;

 my $nitf = Syndication::NITF->new("myNITFfile.xml");
 my $head = $nitf->gethead;

 my $title = $head->gettitle->getText;

 my $tobject = $head->gettobject;
 if ($tobject->gettobjecttype eq "news") {
   my $items = $tobject->gettobjectsubjectList;
   foreach my $item (@$items) {
     # process each subject header
     ...
   }
 }
 # ... etc ...

=head1 DESCRIPTION

B<Syndication::NITF> is an object-oriented Perl interface to NITF documents, allowing
you to manage (and one day create) NITF documents without any specialised NITF
or XML knowledge.

NITF is a standard format for the markup of textual news content (eg newspaper and
magazine articles), ratified by the International Press Telecommunications
Council (http://www.iptc.org).

This module supports version 3.0 of the NITF DTD. It makes no attempt to support earlier
versions of the DTD.

The module code is based on my B<Syndication::NewsML> module, and much of the functionality
is common to the two. (Strictly speaking, the code is copied from the NewsML module rather than
shared through a separate module of common classes; this may be remedied in the future.)

=head2 Initialization

At the moment the constructor can only take a filename as an argument, as follows:

  my $nitf = Syndication::NITF->new("file-to-parse.xml");

This attaches a parser to the file (using XML::DOM) and returns a reference to the first NITF
tag. (I may decide that this is a bad idea and change it soon.)

=head2 Reading objects

There are five main types of calls:

=over 4

=item *

Get an individual element:

  my $head = $nitf->gethead;

=item *

Return a reference to an array of elements:

  my $identifiedcontentlist = $head->getdocdata->getidentifiedcontentList;

The array can be referenced as @$identifiedcontentlist, or an individual element can be
referenced as $identifiedcontentlist->[N].

=item *

Return the size of a list of elements:

  my $iclcount = $head->getdocdata->getidentifiedcontentCount;

=item *


Get an attribute of an element (as text):

  my $href = $catalog->getHref;

=item *

Get the contents of an element (ie the text between the opening and closing tags):

  my $urlnode = $catalog->getResourceList->[0]->getUrlList->[0];
  my $urltext = $urlnode->getText;

=back

Not all of these calls work for all elements. For example, if the NITF DTD defines an element
as occurring zero or one times in its parent element and you call getXXXList on it, B<Syndication::NITF>
will "croak" with an error. (The error handling will be improved in the future so that it croaks
fatally only if you want that to happen.)
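
Until then, a caller can trap the fatal error with a block eval. The following is a minimal,
self-contained sketch: B<get_list> is a hypothetical stand-in that croaks the way a getXXXList
call on a singleton element is described as doing, and is not part of B<Syndication::NITF>.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for a getXXXList call on an element that the DTD
# allows at most once: it croaks, as the module is described as doing.
sub get_list {
    my ($element) = @_;
    croak "Can't use getXXXList on singleton element $element";
}

# Block eval traps the fatal error. The trailing 1 makes success explicit,
# so a failure is detected without relying on $@ alone.
my $list;
my $ok = eval { $list = get_list("title"); 1 };
my $error = $ok ? "" : $@;

# On failure, report the problem and carry on (for example by falling
# back to the singleton accessor).
warn "trapped: $error" unless $ok;
```

With the real module, the body of the eval would be the accessor call itself.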

The NITF standard contains some "business rules" also written into the DTD: for example, a NewsItem
may contain nothing, a NewsComponent, one or more Update elements, or a TopicSet. For some of these
rules, the module is smart enough to detect errors and provide a warning. Again, these warnings will
be improved and extended in future versions of this module.

=head2 Documentation for all the classes

Each NITF element is represented as a class. This means that you can traverse documents as Perl
objects, as seen above.

Full documentation of which classes can be used in which documents is beyond me right now (with over
120 classes to document), so for now you'll have to work with the examples in the B<examples/> and
B<t/> directories to see what's going on. You should be able to get a handle on it fairly quickly.

The real problem is that it's hard to know when to use B<getXXX()> and when to use B<getXXXList()>
-- that is, when an element can have more than one entry and when it is a singleton. Quite often it
isn't obvious from looking at an NITF document. For now, two ways to work this out are to try it and see
if you get an error, or to have a copy of the DTD in front of you. Obviously neither of these is
optimal, but documenting all 127 classes just so people can tell this difference is pretty scary as
well, and so much documentation would put lots of people off using the module. So I'll probably come
up with a reference document listing all the classes and methods, rather than docs for each class, in
a future release.  If anyone has any better ideas, please let me know.
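
A third stopgap, offered only as a sketch: the B<_init> methods in NITF.pm record each class's
children in the B<_singleElements> and B<_multiElements> hashes of the object, with B<_realname>
mapping method-friendly names to real element names. Assuming that layout (these are private
fields and may change without notice), a small helper can report which accessor form applies.
The helper name B<accessor_forms> and the mock object are hypothetical, and the constants are
replaced by plain strings so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: given an object laid out like the classes in NITF.pm,
# return a hash reference mapping accessor names to "single" or "list".
sub accessor_forms {
    my ($obj) = @_;
    # _realname maps method-friendly names to real element names; invert it.
    my %method_for = reverse %{ $obj->{_realname} || {} };
    my %forms;
    for my $element (keys %{ $obj->{_singleElements} || {} }) {
        my $name = $method_for{$element} || $element;
        $forms{"get$name"} = "single";
    }
    for my $element (keys %{ $obj->{_multiElements} || {} }) {
        my $name = $method_for{$element} || $element;
        $forms{"get${name}List"} = "list";
    }
    return \%forms;
}

# Mock of part of what Syndication::NITF::head sets up (values simplified).
my $head = {
    _singleElements => { title => "OPTIONAL", docdata => "OPTIONAL" },
    _multiElements  => { meta => "ZEROORMORE", "revision-history" => "ZEROORMORE" },
    _realname       => { revisionhistory => "revision-history" },
};

my $forms = accessor_forms($head);
print "$_ ($forms->{$_})\n" for sort keys %$forms;
```

Passing a real node object instead of the mock should work only as long as the module keeps
these internal fields.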

=head1 BUGS

None that I know of, but there are probably many. The test suite isn't complete, so not every method
is tested, but the major ones (seem to) work fine. Of course, if you find bugs, I'd be very keen to
hear about them at B<brendan@clueful.com.au>. 

=head1 SEE ALSO

L<XML::DOM>, L<XML::RSS>, L<Syndication::NewsML>

=head1 AUTHOR

Brendan Quinn, Clueful Consulting Pty Ltd
(brendan@clueful.com.au)

#
# Syndication::NITF::BlockContentNode -- nodes that include marked up content
#
package Syndication::NITF::BlockContentNode;
use Carp;
@ISA = qw( Syndication::NITF::Node );

sub _init {
	my ($self, $node) = @_;
	$self->{_multiElements}->{p} = ZEROORMORE;
	$self->{_multiElements}->{hl2} = ZEROORMORE;
	$self->{_multiElements}->{table} = ZEROORMORE;
	$self->{_realname}->{nitftable} = "nitf-table";
	$self->{_multiElements}->{"nitf-table"} = ZEROORMORE;
	$self->{_multiElements}->{media} = ZEROORMORE;
	$self->{_multiElements}->{ol} = ZEROORMORE;
	$self->{_multiElements}->{ul} = ZEROORMORE;
	$self->{_multiElements}->{dl} = ZEROORMORE;
	$self->{_multiElements}->{bq} = ZEROORMORE;
	$self->{_multiElements}->{fn} = ZEROORMORE;
	$self->{_multiElements}->{note} = ZEROORMORE;
	$self->{_multiElements}->{pre} = ZEROORMORE;
	$self->{_multiElements}->{hr} = ZEROORMORE;
}

#
# Syndication::NITF::DateNode -- superclass defining an extra method for elements
#                             that contain ISO8601 formatted dates
# NEEDS TO BE CHANGED because most ISO8601 date "nodes" are actually attributes in NITF
package Syndication::NITF::DateNode;
use Carp;

# convert ISO8601 date/time into Perl internal date/time.
# always returns perl internal date, in UTC timezone.
sub getDatePerl {
	my ($self, $timezone) = @_;
	use Time::Local;
	my $dateISO8601 = $self->getText;
	my ($yyyy, $mm, $dd, $hh, $mi, $ss, $tzsign, $tzhh, $tzmi) = ($dateISO8601 =~ qr/(\d\d\d\d)(\d\d)(\d\d)T?(\d\d)?(\d\d)?(\d\d)?([+-])?(\d\d)?(\d\d)?/);
	# time fields missing from the string default to zero
	my $perltime = timegm($ss || 0, $mi || 0, $hh || 0, $dd, $mm-1, $yyyy);
	if ($tzhh) {
		# offset in seconds, computed before the sign is applied so that
		# "60 *" cannot bind to the condition of the ternary
		my $deltasecs = 60 * ($tzhh * 60 + ($tzmi || 0));
		# local time is UTC plus the offset, so subtract the offset to get UTC
		$perltime -= (($tzsign || "+") eq "-") ? -$deltasecs : $deltasecs;
	}
	return $perltime;
}
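
=pod

A worked example of this kind of conversion, as a standalone sketch rather than the module's
method: the sub name B<iso8601_to_epoch> is hypothetical, and it takes a string instead of
calling getText. It assumes the compact ISO 8601 form (YYYYMMDD, an optional THHMM or THHMMSS,
and an optional +HHMM or -HHMM offset) and subtracts the offset to arrive at UTC.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical standalone conversion: compact ISO 8601 string in,
# epoch seconds (UTC) out; returns undef if the string does not match.
sub iso8601_to_epoch {
    my ($iso) = @_;
    my ($yyyy, $mm, $dd, $hh, $mi, $ss, $sign, $tzhh, $tzmi) =
        $iso =~ /^(\d{4})(\d\d)(\d\d)(?:T(\d\d)(\d\d)(\d\d)?)?(?:([+-])(\d\d)(\d\d))?$/
        or return;
    # Missing time fields default to zero (midnight).
    my $epoch = timegm($ss || 0, $mi || 0, $hh || 0, $dd, $mm - 1, $yyyy);
    if (defined $sign) {
        my $offset = 60 * ($tzhh * 60 + $tzmi);
        # Local time is UTC plus the offset, so subtract it to get UTC.
        $epoch -= $sign eq "-" ? -$offset : $offset;
    }
    return $epoch;
}

print iso8601_to_epoch("20010101T120000+0200"), "\n";  # 978343200 (10:00 UTC)
print iso8601_to_epoch("20010101T120000"), "\n";       # 978350400 (12:00 UTC)
```

=cut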

#
# Syndication::NITF::head -- header of a document
#
package Syndication::NITF::head;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_singleElements}->{title} = OPTIONAL;
	$self->{_multiElements}->{meta} = ZEROORMORE;
	$self->{_singleElements}->{tobject} = OPTIONAL;
	$self->{_singleElements}->{iim} = OPTIONAL;
	$self->{_singleElements}->{docdata} = OPTIONAL;
	$self->{_multiElements}->{pubdata} = ZEROORMORE;
	$self->{_realname}->{revisionhistory} = "revision-history";
	$self->{_multiElements}->{"revision-history"} = ZEROORMORE;
}

#
# Syndication::NITF::title -- document title
#
package Syndication::NITF::title;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_hasText} = 1;
}

# attribute is an enumeration so we must handle separately
sub gettype { # type of title
	my ($self) = @_;
	my @possiblevalues = qw(main subtitle parttitle alternate abbrev other);
	my $attr = $self->{node}->getAttributeNode("type");
	$self->{"type"} = $attr ? $attr->getValue : "";
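	# compare against each allowed value exactly; a regex match on the joined list would accept substrings
	if ($self->{type} && !grep { $_ eq $self->{type} } @possiblevalues) {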
		croak "Illegal value ".$self->{type}." for attribute type";
	}
	return $self->{type};
}

#
# Syndication::NITF::meta -- generic metadata
#
package Syndication::NITF::meta;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_realname}->{httpequiv} = "http-equiv";
	$self->{_attributes}->{"http-equiv"} = IMPLIED;  # HTTP response header name
	$self->{_attributes}->{name} = IMPLIED;  # Name of this piece of metadata
	$self->{_attributes}->{content} = REQUIRED;  # Content (value) of this piece of metadata
}

#
# Syndication::NITF::tobject -- subject code
#
package Syndication::NITF::tobject;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_realname}->{tobjectproperty} = "tobject.property";
	$self->{_multiElements}->{"tobject.property"} = ZEROORMORE;
	$self->{_realname}->{tobjectsubject} = "tobject.subject";
	$self->{_multiElements}->{"tobject.subject"} = ZEROORMORE;
	$self->{_realname}->{tobjecttype} = "tobject.type";
	$self->{_attributes}->{"tobject.type"} = IMPLIED;
}

#
# Syndication::NITF::tobject.subject -- subject classification
#  we introduced a hack to handle this: these class names leave out the dot from the element name
#
package Syndication::NITF::tobjectsubject;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_realname}->{tobjectsubjectipr} = "tobject.subject.ipr";
	$self->{_attributes}->{"tobject.subject.ipr"} = IMPLIED;
	$self->{_realname}->{tobjectsubjectrefnum} = "tobject.subject.refnum";
	$self->{_attributes}->{"tobject.subject.refnum"} = REQUIRED;
	$self->{_realname}->{tobjectsubjectcode} = "tobject.subject.code";
	$self->{_attributes}->{"tobject.subject.code"} = IMPLIED;
	$self->{_realname}->{tobjectsubjecttype} = "tobject.subject.type";
	$self->{_attributes}->{"tobject.subject.type"} = IMPLIED;
	$self->{_realname}->{tobjectsubjectmatter} = "tobject.subject.matter";
	$self->{_attributes}->{"tobject.subject.matter"} = IMPLIED;
	$self->{_realname}->{tobjectsubjectdetail} = "tobject.subject.detail";
	$self->{_attributes}->{"tobject.subject.detail"} = IMPLIED;
}

# this attribute has a default so we have to handle it separately
sub gettobjectsubjectipr {
	my ($self) = @_;
	my $attr = $self->{node}->getAttributeNode("tobject.subject.ipr");
	$self->{"tobjectsubjectipr"} = $attr ? $attr->getValue : "IPTC";
}
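
=pod

The naming hack can be illustrated with a standalone sketch. It assumes, going by the
B<_realname> entries in this file, that the method-friendly names are simply the element or
attribute names with dots and hyphens removed. The helper name B<method_name> is hypothetical
and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the characters that cannot appear in a Perl
# identifier (dots and hyphens) from an NITF element or attribute name.
sub method_name {
    my ($realname) = @_;
    (my $name = $realname) =~ tr/.-//d;
    return $name;
}

# Build a _realname-style table (method-friendly name => real name).
my %realname = map { ( method_name($_) => $_ ) }
    qw(tobject.subject tobject.subject.ipr revision-history doc-id);

print "$_ => $realname{$_}\n" for sort keys %realname;
```

=cut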

#
# Syndication::NITF::iim -- IIM Record 2 Data Container
#
package Syndication::NITF::iim;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_multiElements}->{ds} = ZEROORMORE;
	$self->{_attributes}->{ver} = IMPLIED; # IIM version number
}

#
# Syndication::NITF::ds -- IIM Record 2 dataset information
#
package Syndication::NITF::ds;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_multiElements}->{ds} = ZEROORMORE;
	$self->{_attributes}->{num} = REQUIRED; # IIM field number
	$self->{_attributes}->{value} = IMPLIED; # IIM field value
}

#
# Syndication::NITF::docdata -- Document metadata
#
package Syndication::NITF::docdata;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_multiElements}->{correction} = ZEROORMORE;
	$self->{_multiElements}->{evloc} = ZEROORMORE;
	$self->{_realname}->{docid} = "doc-id";
	$self->{_multiElements}->{"doc-id"} = ZEROORMORE;
	$self->{_realname}->{dellist} = "del-list";
	$self->{_multiElements}->{"del-list"} = ZEROORMORE;
	$self->{_multiElements}->{urgency} = ZEROORMORE;
	$self->{_multiElements}->{fixture} = ZEROORMORE;
	$self->{_realname}->{dateissue} = "date.issue";
	$self->{_multiElements}->{"date.issue"} = ZEROORMORE;
	$self->{_realname}->{daterelease} = "date.release";
	$self->{_multiElements}->{"date.release"} = ZEROORMORE;
	$self->{_realname}->{dateexpire} = "date.expire";
	$self->{_multiElements}->{"date.expire"} = ZEROORMORE;
	$self->{_realname}->{docscope} = "doc-scope";
	$self->{_multiElements}->{"doc-scope"} = ZEROORMORE;
	$self->{_multiElements}->{series} = ZEROORMORE;
	$self->{_realname}->{edmsg} = "ed-msg";
	$self->{_multiElements}->{"ed-msg"} = ZEROORMORE;
	$self->{_realname}->{dukey} = "du-key";
	$self->{_multiElements}->{"du-key"} = ZEROORMORE;
	$self->{_realname}->{doccopyright} = "doc.copyright";
	$self->{_multiElements}->{"doc.copyright"} = ZEROORMORE;
	$self->{_realname}->{docrights} = "doc.rights";
	$self->{_multiElements}->{"doc.rights"} = ZEROORMORE;
	$self->{_realname}->{keylist} = "key-list";
	$self->{_multiElements}->{"key-list"} = ZEROORMORE;
	$self->{_realname}->{identifiedcontent} = "identified-content";
	$self->{_multiElements}->{"identified-content"} = ZEROORMORE;
}

#
# Syndication::NITF::correction -- Correction information
#
package Syndication::NITF::correction;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;
	$self->{_attributes}->{info} = IMPLIED; # Message or instructions
	$self->{_realname}->{idstring} = "id-string";
	$self->{_attributes}->{"id-string"} = IMPLIED; # Document ID string
	$self->{_attributes}->{regsrc} = IMPLIED; # Identifies source of correction
}

#
# Syndication::NITF::evloc -- Event location (where an event took place, not where story was written)
#
package Syndication::NITF::evloc;
use Carp;
@ISA = qw( Syndication::NITF::GlobalAttributesNode );

sub _init {
	my ($self, $node) = @_;


