Alvis-NLPPlatform
examples/InputDocument.xml view on Meta::CPAN
<section>We integrated TagEn as the default named entity tagger.</section>
<section>Form:
<section>sources, binaries and Perl scripts</section></section>
<section>Obtain:
<section><ulink url="http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/TagEN.tar.gz">http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/TagEN.tar.gz</ulink></section></section>
<section>Install: untar TagEN.tar.gz in a directory, go to the src directory, and run the compile script</section>
<section>Licence:
<section>GPL</section></section>
<section>Version number required:
<section>any</section></section>
<section>Additional information:
<section>This named entity tagger can be run in various modes. Each mode is defined by Unitex (http://www-igm.univ-mlv.fr/~unitex/) graphs. The tagger can be used for English and French texts.</section></section></section>
<section title="Word and sentence segmenter">
<section>Word and sentence segmenter</section>
<section>The word and sentence segmenter we use by default is an AWK script posted by Gregory Grefenstette on the Corpora mailing list. We modified it to segment French texts.</section>
<list>
<item>Form: AWK script</item>
<item>Obtain: <ulink url="http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/WordSeg.tar.gz">http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/WordSeg.tar.gz</ulink></item>
<item>Install: untar WordSeg.tar.gz in a directory</item>
<item>Licence: GPL</item>
<item>Version number required: any (modifications for French by Paris 13)</item></list></section>
<section title="Part-of-Speech Tagger">
<section>Part-of-Speech Tagger</section>
<section>The default wrapper calls TreeTagger, a part-of-speech tagger and lemmatizer.</section>
<section>Form: binary+resources</section>
<section>Obtain: links and instructions at <ulink url="http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html">http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html</ulink></sec...
<section>Install: Instructions are given on the web site. To summarize, you need to:
<section>
<section>Make a directory named, for instance, TreeTagger</section></section>
<section>
<section>Download the archives into tools/TreeTagger</section></section>
<section>
<section>Go to the directory tools/TreeTagger</section></section>
<section>
<section>Run install-tagger.sh</section></section></section>
<section>Licence: free for research only</section>
<section>Version number required: (by date) &gt;= 09.04.1996</section></section>
<section title="Term Tagger">
<section>Term Tagger</section>
<section>We have integrated a tool developed specifically for the Alvis project. It is required when installing the platform.</section>
<list>
<item>Form: Perl module</item>
<item>Obtain: On CPAN, <ulink url="http://search.cpan.org/~thhamon/Alvis-TermTagger-0.3/">http://search.cpan.org/~thhamon/Alvis-TermTagger-0.3/</ulink></item>
<item>Install: run perl Makefile.PL, then make, then make install</item>
<item>Licence: GPL</item>
<item>Version number required: any</item></list></section>
<section title="Part-of-Speech specialized for Biological texts">
<section>Part-of-Speech specialized for Biological texts</section>
<section>GeniaTagger (POS and lemma tagger):</section>
<list>
<item>Form: source+resources</item>
<item>Obtain: links and instructions at <ulink url="http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/postagger/geniatagger-2.0.1.tar.gz">http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/postagger/geniatagger-2.0.1.tar.gz</ulink></item>
<item>Install: untar geniatagger-2.0.1.tar.gz in a directory, cd tools/geniatagger-2.0.1, then run make</item>
<item>Licence: free for research only (and Wordnet licence for the dictionary)</item>
<item>Version number required: 2.0.1</item></list></section>
<section title="Parser">
<section>Parser</section>
<section>Link Grammar Parser:</section>
<list>
<item>Form: sources + resources</item>
<item>Obtain: <ulink url="http://www.link.cs.cmu.edu/link/ftp-site/link-grammar/link-4.1b/unix/link-4.1b.tar.gz">http://www.link.cs.cmu.edu/link/ftp-site/link-grammar/link-4.1b/unix/link-4.1b.tar.gz</ulink></item>
<item>Install: untar link-4.1b.tar.gz, see the Makefile for configuration, and run make. Apply the additional patch for the Link Grammar parser (lib/Alvis/NLPPlatform/patches): cd link-4.1b, then patch -p0 &lt; lib/Alvis/NLPPlatform/patches/link-4.1...
<item>Licence: Compatible with GPL</item>
<item>Version number required: 4.1a or 4.1b</item></list></section>
<section title="Parser specialized for biological texts">
<section>Parser specialized for biological texts</section>
<section>BioLG:</section>
<section>Form: sources + resources</section>
<section>Obtain: <ulink url="http://www.it.utu.fi/biolg/">http://www.it.utu.fi/biolg/</ulink></section>
<section>Install: untar the archive, see the Makefile for configuration, then run make</section>
<section>Licence: Compatible with GPL</section>
<section>Version number required: 1.1.11</section>
<section>additional programs</section>
<section># =head2 Semantic Tagger</section>
<section># SemanticTypeTagger:</section>
<section># =over</section>
<section># =item * Form:</section>
<section># sources + resources</section>
<section># =item * Obtain:</section>
<section># =item * Install:</section>
<section># (see the README in the archive)</section> # untar # run make check
<section># run make</section>
<section># =item * Licence:</section>
<section># Compatible with GPL</section>
<section># =item * Version number required:</section>
<section># 0.4</section>
<section># =back</section></section></section>
<section title="TUNING THE NLP PLATFORM">
<section>TUNING THE NLP PLATFORM</section>
<section>The main characteristic of the NLP platform is its tunability according to the domain (language specificity of the documents to be annotated) and the user requirements. The tuning can be done at two levels:</section>
<section>Either resources adapted to the domain, or describing it more precisely, can be exploited.
<section>In that respect, tuning concerns the integration of these resources into the NLP tools used in the platform. The command line in the configuration file can be modified.</section>
<section>An example of resource switching can be found at the named entity recognition step. The default named entity tagger can use either biomedical resources or more general ones, depending on the value of the -t parameter.</section></sec...
<section>
<section>Or other NLP tools can be integrated in the NLP platform.</section>
<section>In that case, new wrappers should be written. To ease the integration of new NLP tools, we use polymorphism to override the default wrappers. The NLP platform package is defined as a three-level hierarchy. The top is the...
<section>NB: If the package Alvis::NLPPlatform::UserNLPWrappers is not writable to the user, the tuning can be done by copying the Alvis::NLPPlatform::UserNLPWrappers in a local directory, and by adding this local directory to the PERL5...
<section>NB: A template for the package Alvis::NLPPlatform::UserNLPWrappers can be found in Alvis::NLPPlatform::UserNLPWrappers-template .</section>
<section>An example of such tuning can be found at the parsing level. We integrated a parser designed for biological documents in Alvis::NLPPlatform::UserNLPWrappers .</section></section></section>
<section title="PROTOCOL">
<section>PROTOCOL</section>
<list>
<item>Requesting a document:</item>
<item>from the client, to the server :</item>
<item>REQUEST</item></list> from the server, to the client :
<list>
<item>SENDING id ( id is the document id)</item>
<item>SIZE size ( size is the document size)</item>
<item>document ( document is the XML document)</item>
<item>&lt;DONE&gt;</item></list> from the client, to the server :
<list>
<item>ACK</item></list> Returning a document:
<list>
<item>from the client, to the server :</item>
<item>GIVEBACK</item>
<item>id ( id is the document id)</item>
<item>document ( document is the annotated document)</item>
<item>&lt;DONE&gt;</item></list> from the server, to the client :
<list>
<item>ACK</item></list> Aborting the annotation process:
<list>
<item>from the client, to the server :</item></list>
<section>ABORTING</section>
<section>id ( id is the document id)</section> Exiting:
<section>The server understands the messages QUIT , LOGOUT and EXIT . However, these have not been implemented in the client yet.</section></section>
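<section>The exchanges above can be sketched as message builders and a parser. This is a minimal illustration in Python, not the platform's Perl implementation. The function names are hypothetical, and the framing (one field per line, SIZE counted in UTF-8 bytes) is an assumption, since the protocol description does not specify it. The &lt;DONE&gt; marker is built with chr() so that the listing stays XML-safe.</section>

```python
# chr(60) and chr(62) are the two angle brackets around DONE.
DONE = chr(60) + "DONE" + chr(62)


def build_send(doc_id, document):
    """Server-to-client reply to REQUEST: SENDING, SIZE, document, DONE."""
    # Assumption: SIZE is the document length in UTF-8 bytes.
    size = len(document.encode("utf-8"))
    fields = [f"SENDING {doc_id}", f"SIZE {size}", document, DONE]
    return "\n".join(fields) + "\n"


def parse_send(message):
    """Client side: recover (id, document) from a SENDING message."""
    lines = message.split("\n")
    if not lines[0].startswith("SENDING "):
        raise ValueError("expected SENDING header")
    if not lines[1].startswith("SIZE "):
        raise ValueError("expected SIZE header")
    doc_id = lines[0][len("SENDING "):]
    size = int(lines[1][len("SIZE "):])
    # Everything between the SIZE line and the DONE marker is the document.
    end = lines.index(DONE, 2)
    document = "\n".join(lines[2:end])
    if len(document.encode("utf-8")) != size:
        raise ValueError("document size does not match SIZE header")
    return doc_id, document


def build_giveback(doc_id, annotated):
    """Client-to-server message returning an annotated document."""
    return "\n".join(["GIVEBACK", doc_id, annotated, DONE]) + "\n"
```

<section>A client would send REQUEST, pass the server reply to parse_send, answer with ACK, and later return the annotated document with build_giveback, to which the server answers ACK.</section>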
<section title="SEE ALSO">
<section>SEE ALSO</section>
<section>Alvis web site: <ulink url="http://www.alvis.info">http://www.alvis.info</ulink></section>
<section>Description of the input/output format: <ulink url="http://www.alvis.info/alvis/Architecture_2fFormats?action=show&redirect=architecture%2Fformats#documents">http://www.alvis.info/alvis/Architecture_2fFormats?action=show&...
<section title="AUTHORS">
<section>AUTHORS</section>
<section>Thierry Hamon &lt;<ulink url="mailto:thierry.hamon@lipn.univ-paris13.fr">thierry.hamon@lipn.univ-paris13.fr</ulink>&gt; and Julien Deriviere &lt;<ulink url="mailto:julien.deriviere@lipn.univ-paris13.fr">julien.deriviere@lipn.univ-par...
<section title="LICENSE">
<section>LICENSE</section>
<section>Copyright (C) 2005 by Thierry Hamon and Julien Deriviere</section>
<section>This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.</section></section></s...
<metaData>
<meta name="title">title</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>http://www.link.cs.cmu.edu/link/ftp-site/link-grammar/link-4.1b/unix/link-4.1b.tar.gz</anchorText>
<location>http://www.link.cs.cmu.edu/link/ftp-site/link-grammar/link-4.1b/unix/link-4.1b.tar.gz</location>
</link>
<link type="a">
<anchorText>http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/postagger/geniatagger-2.0.1.tar.gz</anchorText>
<location>http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/postagger/geniatagger-2.0.1.tar.gz</location>
</link>
<link type="a">
<anchorText>http://www.alvis.info</anchorText>
<location>http://www.alvis.info</location>
</link>
<link type="a">
<anchorText>http://www.it.utu.fi/biolg/</anchorText>
<location>http://www.it.utu.fi/biolg/</location>
</link>
<link type="a">
<anchorText>thierry.hamon@lipn.univ-paris13.fr</anchorText>
<location>mailto:thierry.hamon@lipn.univ-paris13.fr</location>
</link>
<link type="a">
<anchorText>http://www.alvis.info/alvis/Architecture_2fFormats?action=show&amp;redirect=architecture%2Fformats#documents</anchorText>
<location>http://www.alvis.info/alvis/Architecture_2fFormats?action=show&amp;redirect=architecture%2Fformats#documents</location>
</link>
<link type="a">
<anchorText>http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html</anchorText>
<location>http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html</location>
</link>
<link type="a">
<anchorText>http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/TagEN.tar.gz</anchorText>
<location>http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/TagEN.tar.gz</location>
</link>
<link type="a">
<anchorText>http://search.cpan.org/~thhamon/Alvis-TermTagger-0.3/</anchorText>
<location>http://search.cpan.org/~thhamon/Alvis-TermTagger-0.3/</location>
</link>
<link type="a">
<anchorText>julien.deriviere@lipn.univ-paris13.fr</anchorText>
<location>mailto:julien.deriviere@lipn.univ-paris13.fr</location>
</link>
<link type="a">
<anchorText>http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/WordSeg.tar.gz</anchorText>
<location>http://www-lipn.univ-paris13.fr/~hamon/ALVIS/Tools/WordSeg.tar.gz</location>
</link>
</outlinks>
</links>
<analysis>
<property name="language">en</property>
</analysis>
</acquisition>
</documentRecord>
</documentCollection>