Alvis-NLPPlatform

lib/Alvis/NLPPlatform.pm

    token_id_is_in_list_refid_token($list_refid_token, $token_to_search);

The method returns 1 if the token C<$token_to_search> is in the list
C<$list_refid_token>, and 0 otherwise.


=head2 token_id_follows_list_refid_token()

    token_id_follows_list_refid_token($list_refid_token, $token_to_search);

The method returns 1 if the token C<$token_to_search> immediately
follows the last token of the list C<$list_refid_token>, and 0 otherwise.
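
The semantics of this test and of the membership test described
above can be illustrated with a minimal, standalone sketch. It is not
the module's implementation: the helper names C<sketch_is_in_list> and
C<sketch_follows_list> are hypothetical, and the sketch assumes that the
list is an array reference of consecutive integer token ids, which the
documentation does not state.

    use strict;
    use warnings;

    # Hypothetical stand-ins, not part of Alvis::NLPPlatform.
    # Assumption: the list is an array reference of integer token ids.

    # Returns 1 if $token_to_search occurs in the list, 0 otherwise.
    sub sketch_is_in_list {
        my ($list_refid_token, $token_to_search) = @_;
        for my $id (@$list_refid_token) {
            return 1 if $id == $token_to_search;
        }
        return 0;
    }

    # Returns 1 if $token_to_search comes right after the last id
    # of the list, 0 otherwise (including for an empty list).
    sub sketch_follows_list {
        my ($list_refid_token, $token_to_search) = @_;
        return 0 unless @$list_refid_token;
        return $list_refid_token->[-1] + 1 == $token_to_search ? 1 : 0;
    }

    my $tokens = [4, 5, 6];
    print sketch_is_in_list($tokens, 5), "\n";      # membership test
    print sketch_follows_list($tokens, 7), "\n";    # 7 comes right after 6

If the platform stores token ids in another form (for instance as
prefixed strings), the comparisons would have to be adapted accordingly.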


=head2 token_id_just_before_last_of_list_refid_token()

    token_id_just_before_last_of_list_refid_token($list_refid_token, $token_to_search);

The method returns 1 if the token C<$token_to_search> is just before
the first token of the list C<$list_refid_token>, and 0 otherwise.

=head2 unparseable_id()

   unparseable_id($id)

The method checks whether the id has been parsed. If it has not, it
prints a warning.

=head2 platform_reset()

   platform_reset()

The method empties or resets the structures and variables attached to
a processed document.

=head1 PLATFORM CONFIGURATION

The configuration file of the NLP Platform is composed of global
variables and divided into several sections:

=over 

=item * Global variables.

The two mandatory variables are C<ALVISTMP> and C<PRESERVEWHITESPACE>
 (in the XML_INPUT section). 


=over 8

=item * 

C<ALVISTMP>: defines the temporary directory used during the
annotation process. The files produced during the annotation step (XML
files and the input/output of the NLP tools) are written there. It must
be writable by the user the process is running as.

=item *

C<DEBUG>: indicates whether the NLP platform runs in debug mode. The
possible values are 1 (debug mode) and 0 (no debug mode). The default
value is 0. The main consequence of the debug mode is that the
temporary files are kept.

=back


Additional variables and environment variables can be used if they
are interpolated in the configuration file. For instance, the
default configuration file defines:

=over

=item *

C<PLATFORM_ROOT>: directory where the NLP tools and resources are installed.



=item * 

C<NLP_tools_root>: root directory where the NLP tools are installed.



=item *

C<AWK>: path to C<awk>.

=item *

C<SEMTAG_EN_DIR>: directory where the semantic tagger is installed.

=item *

C<ONTOLOGY>: path to the ontology used by the semanticTypeTagger (trish2
format -- see documentation of the semanticTypeTagger)

=item *

C<CANONICAL_DICT>: path to the dictionary with the canonical forms of
the semantic units (trish2 format -- see documentation of the
semanticTypeTagger)

=item *

C<PARENT_DICT>: path to the dictionary with the parent nodes of the
semantic units (trish2 format -- see documentation of the
semanticTypeTagger)

=back
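
As a rough illustration only, the global variables listed above might
be laid out as follows. The C<key = value> syntax, the C<$NAME>
interpolation form and every path shown are assumptions made for this
sketch; only the variable names come from this documentation. The
default configuration file shipped with the platform remains the
reference for the exact form.

    # Hypothetical values; adapt them to the local installation.
    ALVISTMP       = /tmp/alvis
    DEBUG          = 0
    PLATFORM_ROOT  = /opt/alvis
    NLP_tools_root = $PLATFORM_ROOT/NLPtools
    AWK            = /usr/bin/awk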


=item * Section C<alvis_connection>

=over 8

=item * 

C<HARVESTER_PORT>: the port of the harvester/crawler (C<combine>) that the platform reads from to get the documents to annotate.

=item * 


