Alvis-NLPPlatform

examples/InputDocument.xml

            <section title="standalone_main()">
              <section>standalone_main()</section> standalone_main($hash_config, $doc_xml, \*STDOUT); 
              <section>This method is used to annotate a document in the standalone mode of the platform. The document ( $doc_xml ) is given in the ALVIS XML format.</section> 
              <section>The document is loaded into memory and then annotated according to the steps defined in the configuration variables ( $hash_config is the reference to the hashtable containing the variables defined in the configuration file). T...
              <section>The function returns the time of the XML rendering.</section></section>
            <section title="client_main()">
              <section>client_main()</section> client_main($doc_hash, $r_config); 
              <section>This method is used to annotate a document in the distributed mode of the NLP platform. The document, given in the ALVIS XML format, is already loaded into memory ( $doc_hash ).</section> 
              <section>The document is annotated according to the steps defined in the configuration variables. The annotated document is returned to the calling method.</section></section>
            <section title="load_config()">
              <section>load_config()</section> load_config($rcfile); 
              <section>The method loads the configuration of the NLP Platform by reading the configuration file given in argument.</section></section>
            <section title="client()">
              <section>client()</section> client($rcfile) 
              <section>This is the main method for the client process. $rcfile is the file name containing the configuration.</section></section>
            <section title="sigint_handler()">
              <section>sigint_handler()</section> sigint_handler($signal); 
              <section>This method is used to catch the INT signal and send an ABORTING message to the server.</section></section>
            <section title="server()">
              <section>server()</section> server($rcfile) 
              <section>This is the main method for the server process. $rcfile is the file name containing the configuration.</section></section>
            <section title="disp_log()">
              <section>disp_log()</section> disp_log($hostname,$message); 
              <section>This method prints the message ( $message ) on the standard error output, in a formatted way:</section> 
              <section>date: (client=hostname) message</section></section>
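            <section>The log layout above can be illustrated with a minimal sketch. The helper name format_log_line and the date format are assumptions made for this example; the platform's own disp_log() may format the date differently.</section>

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of the platform): builds a line in the
# documented layout "date: (client=hostname) message".
# The date format below is an assumption.
sub format_log_line {
    my ($hostname, $message) = @_;
    my $date = strftime('%Y-%m-%d %H:%M:%S', localtime);
    return "$date: (client=$hostname) $message";
}

# Write the formatted line to standard error, as disp_log() is documented to do.
print STDERR format_log_line('node01', 'document received'), "\n";
```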
            <section title="split_to_docRecs()">
              <section>split_to_docRecs()</section> split_to_docRecs($xml_docs); 
              <section>This method splits a list of documents into a table and returns it. Each element of the table is a two-element table containing the document id and the document.</section></section>
            <section title="sub_dir_from_id()">
              <section>sub_dir_from_id()</section> sub_dir_from_id($doc_id) 
              <section>This method returns the subdirectory where the annotated document will be stored. It computes the subdirectory from the first two characters of the document id ( $doc_id ).</section></section>
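            <section>As an illustration of the rule described for sub_dir_from_id(), here is a minimal sketch that derives a subdirectory name from the first two characters of an id. The function name sub_dir_sketch and the sample id are hypothetical; the real method may post-process the characters differently.</section>

```perl
use strict;
use warnings;

# Hypothetical re-implementation, for illustration only.
sub sub_dir_sketch {
    my ($doc_id) = @_;
    # Keep the first two characters of the document id.
    return substr($doc_id, 0, 2);
}

print sub_dir_sketch('A1B2C3D4'), "\n";   # prints "A1"
```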
            <section title="record_id()">
              <section>record_id()</section> record_id($doc_id, $r_config); 
              <section>This method records in the file $ALVISTMP/.proc_id , the id of the document that has been sent to the client.</section></section>
            <section title="delete_id()">
              <section>delete_id()</section> delete_id($doc_id,$r_config); 
              <section>This method deletes the id of the document that has been sent to the client from the file $ALVISTMP/.proc_id .</section></section>
            <section title="init_server()">
              <section>init_server()</section> init_server($r_config); 
              <section>This method initializes the server. It reads the document id from the file $ALVISTMP/.proc_id and loads the corresponding documents i.e. documents which have been annotated but not recorded due to a server crash.</section></sec...
            <section title="token_id_is_in_list_refid_token()">
              <section>token_id_is_in_list_refid_token()</section> token_id_is_in_list_refid_token($list_refid_token, $token_to_search); 
              <section>The method returns 1 if the token $token_to_search is in the list $list_refid_token , and 0 otherwise.</section></section>
            <section title="token_id_follows_list_refid_token()">
              <section>token_id_follows_list_refid_token()</section> token_id_follows_list_refid_token($list_refid_token, $token_to_search); 
              <section>The method returns 1 if the token $token_to_search immediately follows the last token of the list $list_refid_token , and 0 otherwise.</section></section>
            <section title="token_id_just_before_last_of_list_refid_token()">
              <section>token_id_just_before_last_of_list_refid_token()</section> token_id_just_before_last_of_list_refid_token($list_refid_token, $token_to_search); 
              <section>The method returns 1 if the token $token_to_search is just before the first token of the list $list_refid_token , and 0 otherwise.</section></section>
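            <section>The membership and "follows" checks described above can be sketched as follows. This is an illustration under stated assumptions: token ids are taken to look like "token12", the list is taken to be an array reference, and the helper names are hypothetical. The platform's internal data structures may differ.</section>

```perl
use strict;
use warnings;

# Assumption: a token id ends with its numeric position, e.g. "token12".
sub token_number {
    my ($id) = @_;
    my ($n) = $id =~ /(\d+)$/;
    return $n;
}

# Returns 1 if $token is an element of the list, 0 otherwise.
sub is_in_list {
    my ($list, $token) = @_;
    return (grep { $_ eq $token } @$list) ? 1 : 0;
}

# Returns 1 if $token immediately follows the last token of the list, 0 otherwise.
sub follows_list {
    my ($list, $token) = @_;
    return 0 unless @$list;
    return token_number($token) == token_number($list->[-1]) + 1 ? 1 : 0;
}

my $list = ['token3', 'token4', 'token5'];
print is_in_list($list, 'token4'), "\n";     # prints 1
print follows_list($list, 'token6'), "\n";   # prints 1
print follows_list($list, 'token8'), "\n";   # prints 0
```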
            <section title="unparseable_id()">
              <section>unparseable_id()</section> unparseable_id($id) 
              <section>The method checks whether the id has been parsed. If not, it prints a warning.</section></section></section>
          <section title="PLATFORM CONFIGURATION">
            <section>PLATFORM CONFIGURATION</section> 
            <section>The configuration file of the NLP Platform is composed of global variables and divided into several sections:</section>  
            <section>Global variables. 
              <section>The two mandatory variables are ALVISTMP and PRESERVEWHITESPACE (in the XML_INPUT section).</section>  
              <section>
                <section>ALVISTMP : defines the temporary directory used during the annotation process. Files (XML files and the input/output of the NLP tools) are written there during the annotation step. It must be writable to the user the process...
              <section>
                <section>DEBUG : this variable indicates whether the NLP platform is run in debug mode. The values are 1 (debug mode) or 0 (no debug mode). Default value is 0. The main consequence of the debug mode is to keep the temporary files.</...
              <section>Additional variables and environment variables can be used if they are interpolated in the configuration file. For instance, in the default configuration file, we add</section>  
              <section>
                <section>PLATFORM_ROOT : directory where the NLP tools and resources are installed.</section></section> 
              <section>
                <section>NLP_tools_root : root directory where the NLP tools are installed</section></section> 
              <section>
                <section>AWK : path for awk</section></section> 
              <section>
                <section>SEMTAG_EN_DIR : directory where the semantic tagger is installed</section></section> 
              <section>
                <section>ONTOLOGY : path for the ontology for the semanticTypeTagger (trish2 format -- see documentation of the semanticTypeTagger)</section></section> 
              <section>
                <section>CANONICAL_DICT : path for the dictionary with the canonical form of the semantic units (trish2 format -- see documentation of the semanticTypeTagger)</section></section> 
              <section>
                <section>PARENT_DICT : path for the dictionary with the parent nodes of the semantic units (trish2 format -- see documentation of the semanticTypeTagger)</section></section></section> 
            <section>Section alvis_connection  
              <section>
                <section>HARVESTER_PORT : the port of the harvester/crawler ( combine ) that the platform will read from to get the documents to annotate.</section></section> 
              <section>
                <section>NEXTSTEP : indicates if there is a next step in the pipeline (for instance, the indexer IdZebra). The value is 0 or 1 .</section></section> 
              <section>
                <section>NEXTSTEP_HOST : the host name of the component that the platform will send the annotated document to.</section></section> 
              <section>
                <section>NEXTSTEP_PORT : the port of the component that the platform will send the annotated document to.</section></section> 
              <section>
                <section>SPOOLDIR : the directory where the documents coming from the harvester are stored.</section> 
                <section>It must be writable to the user the process is running as.</section></section> 
              <section>
                <section>OUTDIR : the directory where the annotated documents are stored if SAVE_IN_OUTDIR (in Section NLP_misc ) is set.</section> 
                <section>It must be writable to the user the process is running as.</section></section></section> 
            <section>Section NLP_connection  
              <section>
                <section>SERVER : The host name where the NLP server is running, for the connections with the NLP clients.</section></section> 
              <section>
                <section>PORT : The listening port of the NLP server, for the connections with the NLP clients.</section></section> 
              <section>
                <section>RETRY_CONNECTION : The number of times that a client attempts to connect to the server.</section></section></section> 
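            <section>To illustrate how a bound such as RETRY_CONNECTION might be applied, here is a minimal sketch of a bounded retry loop. The helper connect_with_retry and the stand-in connection callback are hypothetical; a real client would open a socket to SERVER on PORT instead, and the platform's own retry logic may differ (for example by waiting between attempts).</section>

```perl
use strict;
use warnings;

# Hypothetical helper: $connect is any code reference that returns a true
# value on success. $retry_connection bounds the number of attempts.
sub connect_with_retry {
    my ($connect, $retry_connection) = @_;
    for my $attempt (1 .. $retry_connection) {
        my $conn = $connect->();
        return ($conn, $attempt) if $conn;
    }
    return;    # every attempt failed
}

# Stand-in for a real socket connection: succeeds on the third call.
my $calls = 0;
my ($conn, $attempt) =
    connect_with_retry(sub { ++$calls == 3 ? 'connected' : undef }, 5);
print "$conn after $attempt attempts\n";   # prints "connected after 3 attempts"
```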
            <section>XML_INPUT  
              <section>
                <section>PRESERVEWHITESPACE is a boolean indicating if the linguistic annotation will be done by preserving white space or not, i.e. XML blank nodes and white space at the beginning and the end of any line, but also indentation of the...
                <section>Default value is 0 or false (blank nodes and indentation characters are removed).</section></section> 
              <section>
                <section>LINGUISTIC_ANNOTATION_LOADING : indicates whether the linguistic annotations already existing in the input documents are loaded. Default value is 1 or true (linguistic annotations are loaded).</section></section></section> 
            <section>
              <section>XML_OUTPUT (Not available yet)</section>   
              <section>
                <section>FORM</section></section>  
              <section>
                <section>ID</section></section></section> 
            <section>Section linguistic_annotation 
              <section>This section defines the NLP steps that will be used for annotating documents. The values are 0 or 1 .</section>  
              <section>
                <section>ENABLE_TOKEN : toggles the tokenization step.</section></section> 
              <section>
                <section>ENABLE_NER : toggles the named entity recognition step.</section></section> 
              <section>
                <section>ENABLE_WORD : toggles the word segmentation step.</section></section> 
              <section>
                <section>ENABLE_SENTENCE : toggles the sentence segmentation step.</section></section> 
              <section>


