Alvis-NLPPlatform


lib/Alvis/NLPPlatform/NLPWrappers.pm

UMLS (http://umlsinfo.nlm.nih.gov/ ). They can also be acquired through
corpus analysis.


Term matching in the document takes typographical and inflectional
variation into account.
Typographical variation requires light preprocessing of the
terms.

We first assume a less strict use of the dash character. For instance,
the term I<UDP-glucose> can appear in the documents as I<UDP glucose>
and vice versa.  Inflectional variation requires a lemmatization
of the input documents. It makes it possible to identify
I<transcription factors> from I<transcription factor>.  Both variation
types can be taken into account together or separately during term
matching.  Previous annotation levels are required: word segmentation
and lemmatization, as well as named entities.
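
The two variation types can be illustrated with a minimal sketch. It
is not the platform's implementation: the helper names
C<normalize_typography> and C<match_key>, the C<form>/C<lemma> word
records and the in-memory term list are all assumptions made for the
example.

```perl
use strict;
use warnings;

# Hypothetical helper: fold dash/space variation and case so that
# "UDP-glucose" and "UDP glucose" yield the same key.
sub normalize_typography {
    my ($term) = @_;
    my $key = lc $term;
    $key =~ s/[-\s]+/ /g;    # treat dashes and whitespace runs alike
    $key =~ s/^ | $//g;      # trim a leading or trailing space
    return $key;
}

# Hypothetical helper: build a match key from word records, each a hash
# with 'form' and 'lemma' entries (an assumed layout, not the platform's).
sub match_key {
    my ($words, %opt) = @_;
    my $field = $opt{inflectional} ? 'lemma' : 'form';
    my $key = join ' ', map { $_->{$field} } @$words;
    return $opt{typographical} ? normalize_typography($key) : $key;
}

my @words = (
    { form => 'transcription', lemma => 'transcription' },
    { form => 'factors',       lemma => 'factor' },
);

# Term list indexed by its normalized form.
my %term_list = map { normalize_typography($_) => $_ }
    ('transcription factor', 'UDP-glucose');

my $key = match_key(\@words, inflectional => 1, typographical => 1);
print "matched: $term_list{$key}\n" if exists $term_list{$key};
```

Switching off C<inflectional> in this sketch makes the plural form miss
the singular entry, which is why lemmatization has to be available
before term matching.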

C<$hash_config> is the reference to the hashtable containing the
variables defined in the configuration file.

Canonical forms and semantic tags provided to the term tagger
alongside the terms are taken into account. Canonical
forms are associated with the terms. Semantic tags are added at the
semantic features level. A semantic tag can be considered as a path in
an ontology: each dot or slash character is treated as a separator
between node identifiers.
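
As a small illustration of the path convention, the sketch below
splits a semantic tag on dots and slashes. The helper name
C<semantic_path> and the sample tag are invented for the example; the
platform may handle empty segments differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split a semantic tag into ontology node
# identifiers, using dot and slash as separators.
sub semantic_path {
    my ($tag) = @_;
    # grep drops empty segments, e.g. those caused by a leading slash
    return grep { length } split m{[./]}, $tag;
}

# Invented sample tag mixing both separators.
my @nodes = semantic_path('entity/organism.bacterium');
print join(' > ', @nodes), "\n";
```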

=head2 syntactic_parsing()

    syntactic_parsing($h_config, $doc_hash);

This method wraps the default sentence parsing. It aims at exhibiting
the graph of the syntactic dependency relations between the words of
the sentence. C<$doc_hash> is the hashtable containing all
the annotations of the input document.

C<$h_config> is the reference to the hashtable containing the
variables defined in the configuration file.

The Link Grammar Parser (Daniel D. Sleator and Davy Temperley.
I<Parsing English with a link grammar>. Third International Workshop
on Parsing Technologies. 1993. http://www.link.cs.cmu.edu/link/ ) is
the parser currently integrated.


Processing time is a critical point for syntactic parsing, but we
expect that good recognition of the terms can significantly reduce
the number of possible parses and consequently the parsing
time.  Term identification is therefore performed prior to parsing.
The word level of annotation is required. Depending on the choice of
the parser, the morphosyntactic level may also be needed.




=head2 semantic_feature_tagging()

    semantic_feature_tagging($h_config, $doc_hash)

The semantic typing function attaches a semantic type to the words,
terms and named entities (referred to as lexical items in the
following) in documents according to the conceptual hierarchies of the
ontology of the domain. C<$doc_hash> is the hashtable
containing all the annotations of the input document.

C<$h_config> is the
reference to the hashtable containing the variables defined in the
configuration file.

Currently, this step is not integrated in the platform.


=head2 semantic_relation_tagging()

    semantic_relation_tagging($h_config, $doc_hash)


This method wraps the semantic relation identification
step. C<$doc_hash> is the hashtable containing all the
annotations of the input document. In the Alvis project, the default
behaviour is the identification of domain-specific semantic relations,
i.e. relations occurring between instances of the ontological concepts
in the document. These instances are identified and tagged accordingly
by the semantic typing. The resulting semantic relation
annotations give another level of semantic representation of the
document, one that makes explicit the role that these semantic units
(usually named entities and/or terms) play with respect to each other,
according to the ontology of the domain.  However, this annotation
depends on previous document annotations. The two processing lines
(annotation of web documents, and acquisition of the resources used
during web document annotation) lead to two different tagging
strategies, which affect the implementation of semantic relation
tagging:

=over 

=item * If the document is syntactically parsed, the method can
exploit this information to tag relations mentioned explicitly. This
is achieved through the pattern matching of information extraction
rules, which are applied by a rule matcher.
The semantic relation tagger is therefore a mere
wrapper for the inference method.

=item * In the case where the document is not syntactically parsed,
the method will base its tagging on relations given by the ontology,
that is to say all known relations holding between semantic units
described in the document will be added, whether those relations be
explicitly mentioned in the document or not.

=back
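
Since this step is not integrated in the platform, the following is
only a sketch of how the two strategies could be dispatched. The
function name C<tag_relations>, the C<parsed> and C<semantic_units>
keys, the relation records and the rule matcher callback are all
hypothetical and do not reflect the platform's data structures.

```perl
use strict;
use warnings;

# Hypothetical dispatch between the two strategies described above.
sub tag_relations {
    my ($doc, $ontology_relations, $rule_matcher) = @_;

    # Strategy 1: the document is parsed, so delegate to the rule
    # matcher, which tags only explicitly mentioned relations.
    return $rule_matcher->($doc) if $doc->{parsed};

    # Strategy 2: no parse available, so keep every ontology relation
    # whose two arguments both occur among the document's semantic units.
    my %present = map { $_ => 1 } @{ $doc->{semantic_units} };
    return [ grep { $present{ $_->{from} } && $present{ $_->{to} } }
             @$ontology_relations ];
}

# Invented placeholder data.
my $ontology = [
    { from => 'unit_a', type => 'related_to', to => 'unit_b' },
    { from => 'unit_a', type => 'related_to', to => 'unit_c' },
];
my $doc = { parsed => 0, semantic_units => [ 'unit_a', 'unit_b' ] };

# A stub rule matcher that finds nothing; unused here because the
# sample document is not parsed.
my $relations = tag_relations($doc, $ontology, sub { [] });
print "$_->{from} $_->{type} $_->{to}\n" for @$relations;
```

In this sketch the unparsed document receives only the relation
between C<unit_a> and C<unit_b>, because C<unit_c> does not occur in
it.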

C<$h_config> is the
reference to the hashtable containing the variables defined in the
configuration file.

Currently, this step is not integrated in the platform.


=head2 anaphora_resolution()


