Alvis-NLPPlatform


examples/InputDocument.xml

      </acquisitionData>
      <canonicalDocument>        
        <section>
          <list>
            <item>NAME</item> 
            <item>SYNOPSIS</item> 
            <item>DESCRIPTION</item> 
            <item>Linguistic annotation: requirements</item> 
            <item>METHODS</item> 
            <item>compute_dependencies()</item> 
            <item>starttimer()</item> 
            <item>endtimer()</item> 
            <item>linguistic_annotation()</item> 
            <item>standalone()</item> 
            <item>standalone_main()</item> 
            <item>client_main()</item> 
            <item>load_config()</item> 
            <item>client()</item> 
            <item>sigint_handler()</item> 
            <item>server()</item> 
            <item>disp_log()</item> 
            <item>split_to_docRecs()</item> 

examples/InputDocument.xml

              <item>Term Tagging: this step requires tokenization, word and sentence segmentation, and Part-of-Speech tagging. Lemmatization is recommended to improve the term recognition.</item> 
              <item>Parsing: this step requires tokenization, word and sentence segmentation. Term tagging is recommended to improve the parsing of noun phrases.</item> 
              <item>Semantic feature tagging: To be determined</item> 
              <item>Semantic relation tagging: To be determined</item> 
              <item>Anaphora resolution: To be determined</item></list></section>
          <section title="METHODS">
            <section>METHODS</section>  
            <section title="compute_dependencies()">
              <section>compute_dependencies()</section> compute_dependencies($hashtable_config); 
              <section>This method processes the configuration variables defining the linguistic annotation steps. $hashtable_config is the reference to the hashtable containing the variables defined in the configuration file. The dependencies of the ling...
            <section title="starttimer()">
              <section>starttimer()</section> starttimer() 
              <section>This method records the current date and time. It is used to compute the time of a processing step.</section></section>
            <section title="endtimer()">
              <section>endtimer()</section> endtimer(); 
              <section>This method ends the timer and returns the time of a processing step, according to the time recorded by starttimer() .</section></section>
            <section title="linguistic_annotation()">
              <section>linguistic_annotation()</section> linguistic_annotation($h_config,$doc_hash); 
              <section>This method carries out the linguistic annotation according to the list of required annotations. Required annotations are defined by the configuration variables ( $h_config is the reference to the hashtable containing the v...
              <section>The document to annotate is passed as a hash table ( $doc_hash ). The method adds annotation to this hash table.</section></section>
            <section title="standalone()">
              <section>standalone()</section> standalone($config, $HOSTNAME, $doc); 
              <section>This method is used to annotate a document in the standalone mode of the platform. The document $doc is given in the ALVIS XML format.</section> 
              <section>The reference to the hashtable $config contains the configuration variables. The variable $HOSTNAME is the host name.</section> 
              <section>The method returns the annotation document.</section></section>
            <section title="standalone_main()">

lib/Alvis/NLPPlatform.pm

our @found_terms_smidx;
our @found_terms_phr;
our @found_terms_words;

my $phrase_idx;

my $id;

# Timer 

my $timer_mem;

# These variables are declared at file scope because they must be visible in the SIGINT handler
my $nlp_host;
my $nlp_port;
my $connection_retry;

# ENVIRONMENT VARIABLES
my $NLPTOOLS;
my $ALVISTMP;
our $ALVISLOGFILE;

lib/Alvis/NLPPlatform.pm

    print STDERR "LEMMA: ", ($ENABLE_LEMMA ? "Enabled" : "Disabled"), "\n";
    print STDERR "TERM_TAGGING: ", ($ENABLE_TERM_TAG ? "Enabled" : "Disabled"), "\n";
    print STDERR "SYNTAX: ", ($ENABLE_SYNTAX ? "Enabled" : "Disabled"), "\n";
    print STDERR "SEMANTIC TAGGING: ", ($ENABLE_SEMANTIC_TAG ? "Enabled" : "Disabled"), "\n";
    return;
}


###########################################################################

# Record the current time (seconds since the epoch, with microsecond
# resolution) as the start of a processing step.
sub starttimer(){
    my ($sec,$usec)=gettimeofday();
    $timer_mem=$sec+$usec/1000000;
}



# Return the time, in seconds, elapsed since the last call to starttimer().
sub endtimer(){
    my ($sec,$usec)=gettimeofday();
    return ($sec+$usec/1000000)-$timer_mem;
}



sub linguistic_annotation {
    my $h_config = $_[0];
    my $doc_hash = $_[1];

    my $nb_max_tokens = 0;

    $Alvis::NLPPlatform::Annotation::phrase_idx = 1;
    $Alvis::NLPPlatform::Annotation::syntactic_relation_idx = 1;

    print STDERR "Working Language: " . $Alvis::NLPPlatform::Annotation::ALVISLANGUAGE . "\n";

    starttimer();
    if ($ENABLE_TOKEN) {
	
	# Tokenize
	Alvis::NLPPlatform::UserNLPWrappers->tokenize($h_config,$doc_hash);
	# print STDERR $Alvis::NLPPlatform::Annotation::nb_max_tokens. "\n";
	$time_tok+=endtimer();
	print STDERR "\tTokenization Time : $time_tok\n";
	push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Tokenization Time : $time_tok";
	
	if ($Alvis::NLPPlatform::Annotation::nb_max_tokens >0) {
	    # Scan for NE
	    if($ENABLE_NER==1){
		starttimer();
		Alvis::NLPPlatform::UserNLPWrappers->scan_ne($h_config, $doc_hash);
		$time_ne+=endtimer();
		print STDERR "\tNamed Entity Recognition Time : $time_ne\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Named Entity Recognition Time : $time_ne";
	    }

	    # Word segmentation
	    if($ENABLE_WORD==1){
		starttimer();
		Alvis::NLPPlatform::UserNLPWrappers->word_segmentation($h_config, $doc_hash);
		$time_word+=endtimer();
		print STDERR "\tWord Segmentation Time : $time_word\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Word Segmentation Time : $time_word";
	    }

	    if($dont_annotate==1){
		print STDERR "Skipped document\n";
		undef %$doc_hash;
		%$doc_hash=();
		$doc_hash=0;
		push @tab_errors,"SKIPPED DOCUMENT\n";
		push @tab_errors,"URL: ".$Alvis::NLPPlatform::Annotation::documenturl."\n";
		push @tab_errors,"Language tag: ".$Alvis::NLPPlatform::Annotation::ALVISLANGUAGE."\n";
		push @tab_errors,"Temporary files can be found with the following prefix: $TMPFILE\n";
	    }

	    # Sentence segmentation
	    if($ENABLE_SENTENCE==1){
		starttimer();
		if(!$dont_annotate){Alvis::NLPPlatform::UserNLPWrappers->sentence_segmentation($h_config, $doc_hash)};
		$time_sent+=endtimer();
		print STDERR "\tSentence Segmentation Time : $time_sent\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Sentence Segmentation Time : $time_sent";
	    }

	    # PoS tagging / Lemmatization
	    if($ENABLE_POS==1){
		starttimer();
		if(!$dont_annotate){Alvis::NLPPlatform::UserNLPWrappers->pos_tag($h_config, $doc_hash)};
		$time_pos+=endtimer();
		print STDERR "\tPart of Speech Tagging Time : $time_pos\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Part of Speech Tagging Time : $time_pos";
	    }

	    # Term tagging
	    if($ENABLE_TERM_TAG==1){
		starttimer();
		if(!$dont_annotate){Alvis::NLPPlatform::UserNLPWrappers->term_tag($h_config, $doc_hash)};
		$time_term+=endtimer();
		print STDERR "\tTerm Tagging Time : $time_term\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Term Tagging Time : $time_term";
	    }

	    # Syntactic parsing
	    if($ENABLE_SYNTAX==1){
		starttimer();
		if(!$dont_annotate){Alvis::NLPPlatform::UserNLPWrappers->syntactic_parsing($h_config, $doc_hash)};
		$time_synt+=endtimer();
		print STDERR "\tSyntactic Parsing Time : $time_synt\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Syntactic Parsing Time : $time_synt";
	    }

	    # Semantic tagging
	    if($ENABLE_SEMANTIC_TAG==1){
		starttimer();
		if(!$dont_annotate){Alvis::NLPPlatform::UserNLPWrappers->semantic_feature_tagging($h_config, $doc_hash)};
		$time_semtag+=endtimer();
		print STDERR "\tSemantic Feature Tagging Time : $time_semtag\n";
		push @{$doc_hash->{"log_processing0"}->{"comments"}},  "Semantic Feature Tagging Time : $time_semtag";
	    }

	}	    
    }
}


###########################################################################
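
The excerpt above repeats the same time-run-log sequence for every annotation step. As a minimal sketch only, and not the module's actual structure, the same pattern could be written as a loop over a step table. The sub name `run_steps`, the table layout, and the stub steps are all hypothetical; the stubs stand in for the Alvis::NLPPlatform::UserNLPWrappers methods. Unlike the original, which accumulates each timing in a package variable, this sketch records the time of each call separately.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run every enabled step in order, time it, and
# record the timing in the document's "log_processing0" comments.
sub run_steps {
    my ($steps, $doc_hash) = @_;
    for my $step (@$steps) {
        next unless $step->{enabled};
        my $t0 = [gettimeofday];
        $step->{run}->($doc_hash);
        my $elapsed = tv_interval($t0);    # seconds since $t0
        print STDERR "\t$step->{label} Time : $elapsed\n";
        push @{ $doc_hash->{"log_processing0"}->{"comments"} },
            "$step->{label} Time : $elapsed";
    }
    return $doc_hash;
}

# Stub steps for illustration; real steps would call the NLP wrappers.
my %doc   = ();
my @steps = (
    { label => "Word Segmentation",     enabled => 1, run => sub { $_[0]{words}     = 1 } },
    { label => "Sentence Segmentation", enabled => 0, run => sub { $_[0]{sentences} = 1 } },
);
run_steps(\@steps, \%doc);
```

Only the enabled step runs here, so the document ends up with a single timing comment.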

lib/Alvis/NLPPlatform.pm

    @found_terms=();
    @found_terms_tidx=();
    @found_terms_smidx=();
    @found_terms_phr=();
    @found_terms_words=();

    $phrase_idx=1;

    @tab_errors=();

    starttimer();


#     $doc_xml =~ s/("<\?xml version=\"1.0\" encoding=\"$charset\"?>\n
    $doc_hash=Alvis::NLPPlatform::Annotation::load_xml($doc_xml, $h_config);
    $time_load+=endtimer();

    # Recording computing data (time and entity size)
    # init
    $doc_hash->{"log_processing0"}->{"datatype"}="log_processing";
    $doc_hash->{"log_processing0"}->{"log_id"} = "time";
    $doc_hash->{"log_processing1"}->{"datatype"}="log_processing";
    $doc_hash->{"log_processing1"}->{"log_id"} = "element_size";
    $doc_hash->{"log_processing2"}->{"datatype"}="log_processing";
    $doc_hash->{"log_processing2"}->{"log_id"} = "host";
    $doc_hash->{"log_processing2"}->{"comments"} = $HOSTNAME;

lib/Alvis/NLPPlatform.pm

	print STDERR "done - documentRecord ".$Alvis::NLPPlatform::Annotation::document_record_id;
	print STDERR " (document $cur_doc_nb)\n";


	Alvis::NLPPlatform::linguistic_annotation($h_config, $doc_hash);

	# Save to XML file
	$cur_doc_nb++;
	print STDERR "Rendering XML...  ";

	starttimer();
	$time_render = 0;
	push @{$doc_hash->{"log_processing0"}->{"comments"}},  "XML rendering Time : \@RENDER_TIME_NOT_SET\@";
	Alvis::NLPPlatform::Annotation::render_xml($doc_hash, $descriptor, $printCollectionHeaderFooter, $h_config);
	$time_render+=endtimer();

# TODO : recording the xml rendering time

	# Recording statistical data (time and entity size)
	# XML rendering (unusable)
	print STDERR "done\n";
	print STDERR "\tXML rendering Time : $time_render\n";
	
    }else{
	print STDERR "done parsing - no more documents.\n";

lib/Alvis/NLPPlatform.pm

	
	close $sock;

	# restore the normal behaviour
	$SIG{'INT'} = \&sigint_handler;

	print STDERR "Processing $id";
	
	my $doc_hash;
    
	Alvis::NLPPlatform::starttimer();
	$doc_hash=Alvis::NLPPlatform::Annotation::load_xml($doc_xml, \%config);
	my $time_load=Alvis::NLPPlatform::endtimer();

	# Recording computing data (time and entity size)
	# init
#     $doc_hash->{"log_processing"} = {};
	$doc_hash->{"log_processing0"}->{"datatype"}="log_processing";
	$doc_hash->{"log_processing0"}->{"log_id"} = "time";
	$doc_hash->{"log_processing1"}->{"datatype"}="log_processing";
	$doc_hash->{"log_processing1"}->{"log_id"} = "element_size";
	
    # Recording statistical data (time and entity size)

lib/Alvis/NLPPlatform.pm

	print STDERR "Established connection to server.\n";
	
	print STDERR "Giving back annotated document...\n";
	# Communication with the server
	print $sock "GIVEBACK\n$id\n";
	
	# Save to XML file

	print STDERR "\tRendering XML...  ";

	starttimer();
	$time_render = 0;
	push @{$doc_hash->{"log_processing0"}->{"comments"}},  "XML rendering Time : \@RENDER_TIME_NOT_SET\@";
	Alvis::NLPPlatform::Annotation::render_xml($doc_hash, $sock, 1,\%config);
	$time_render+=endtimer();

# TODO : recording the xml rendering time
	print STDERR "done\n";
    
	print $sock "<DONE>\n";
	
	print STDERR "done.\n";
	
	# the render time is sent

lib/Alvis/NLPPlatform.pm

    compute_dependencies($hashtable_config);

This method processes the configuration variables that define the
linguistic annotation steps. C<$hashtable_config> is the
reference to the hashtable containing the variables defined in the
configuration file. The dependencies between the linguistic
annotations are then applied. For instance, asking for POS annotation
implies tokenization, word segmentation and sentence segmentation.
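
The dependency rule described here can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the key names mirror the `$ENABLE_*` flags seen in the source, but the `%requires` table, its layout, and the sub name `resolve_dependencies` are hypothetical. The rows follow the requirements quoted in the documentation for POS tagging, term tagging and parsing.

```perl
use strict;
use warnings;

# Hypothetical dependency table: each step lists the steps it requires.
my %requires = (
    ENABLE_POS      => [qw(ENABLE_TOKEN ENABLE_WORD ENABLE_SENTENCE)],
    ENABLE_TERM_TAG => [qw(ENABLE_TOKEN ENABLE_WORD ENABLE_SENTENCE ENABLE_POS)],
    ENABLE_SYNTAX   => [qw(ENABLE_TOKEN ENABLE_WORD ENABLE_SENTENCE)],
);

# Switch on every step that an enabled step depends on, repeating until
# nothing changes so that indirect requirements are covered too.
sub resolve_dependencies {
    my ($flags) = @_;
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $step (keys %requires) {
            next unless $flags->{$step};
            for my $needed (@{ $requires{$step} }) {
                next if $flags->{$needed};
                $flags->{$needed} = 1;
                $changed = 1;
            }
        }
    }
    return $flags;
}

# Asking only for term tagging also enables the steps it depends on.
my %flags = (ENABLE_TERM_TAG => 1);
resolve_dependencies(\%flags);
print join(", ", sort grep { $flags{$_} } keys %flags), "\n";
```

Keeping the requirements in one table means a new annotation step only needs a new row, rather than another branch in the resolution code.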


=head2 starttimer()

    starttimer()

This method records the current time, as returned by C<gettimeofday()>. It is
used to compute the duration of a processing step.



=head2 endtimer()

    endtimer();

This method ends the timer and returns the duration of a processing step in seconds, measured from the time recorded by C<starttimer()>.
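
The timer pair can be tried out in isolation with the sketch below. It is a stand-alone approximation, not the module's code: it uses `tv_interval` from Time::HiRes in place of the manual microsecond arithmetic, and the variable `$time_step` and the 50 ms sleep are placeholders for a real processing step.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval sleep);

my $timer_mem;    # start of the step currently being timed

# Remember when the step started, as a [seconds, microseconds] pair.
sub starttimer {
    $timer_mem = [gettimeofday];
    return;
}

# Return the seconds elapsed since the last call to starttimer().
sub endtimer {
    return tv_interval($timer_mem);
}

my $time_step = 0;
starttimer();
sleep(0.05);                  # stand-in for a processing step
$time_step += endtimer();     # accumulate, as the platform does per step
printf STDERR "\tStep Time : %.3f\n", $time_step;
```

Because a single variable holds the start time, calls cannot be nested: each `starttimer()` overwrites the previous start.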



=head2 linguistic_annotation()
    
    linguistic_annotation($h_config,$doc_hash);

This method carries out the linguistic annotation according to the list
of required annotations. Required annotations are defined by the
configuration variables (C<$h_config> is the

lib/Alvis/NLPPlatform/patches/link-4.1a-WithWhiteSpace.diff

  }
  
+ int parse_options_get_whitespace(Parse_Options opts) {
+     return opts->whitespace;
+ }
+ 
+ void parse_options_set_whitespace(Parse_Options opts, int dummy) {
+     opts->whitespace = dummy;
+ }
+ 
  int parse_options_timer_expired(Parse_Options opts) {
      return resources_timer_expired(opts->resources);
  }
***************
*** 466,472 ****
  *
  ****************************************************************/
  
! Sentence sentence_create(char *input_string, Dictionary dict) {
      Sentence sent;
      int i;
  

lib/Alvis/NLPPlatform/patches/link-4.1b-WithWhiteSpace.diff

  }
  
+ int parse_options_get_whitespace(Parse_Options opts) {
+     return opts->whitespace;
+ }
+ 
+ void parse_options_set_whitespace(Parse_Options opts, int dummy) {
+     opts->whitespace = dummy;
+ }
+ 
  int parse_options_timer_expired(Parse_Options opts) {
      return resources_timer_expired(opts->resources);
  }
***************
*** 466,472 ****
  *
  ****************************************************************/
  
! Sentence sentence_create(char *input_string, Dictionary dict) {
      Sentence sent;
      int i;
  


