Alt-CWB-ambs
    
    
  
  
  
lib/CWB/CEQL/Parser.pm
so no substitutions can be made).
=cut
sub encodeEntities {
  my ($self, $s) = @_;
  my %entity = ( '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;' );
  $s =~ s/([<>&"])/$entity{$1}/ge;  # unsafe characters => entities
  $s =~ s/[ \t]+/ /g;               # normalise whitespace (but not line breaks)
  $s =~ s/[\x00-\x09\x0b\x0c\x0e-\x1f]+//g; # remove other control characters except LF and CR
  if (Encode::is_utf8($s)) {
    $s =~ s/([^\x00-\x7f])/sprintf "&#x%X;", ord($1)/ge;
  }
  return $s;
}
=back
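The entity-encoding steps of C<encodeEntities> can be tried outside the parser class. The following is a minimal sketch, not the module's own code: the plain function name C<encode_entities> is hypothetical, and it assumes the four unsafe characters map to the standard entities C<&lt;>, C<&gt;>, C<&amp;> and C<&quot;>.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical standalone variant of the method (not part of the CWB API).
sub encode_entities {
  my ($s) = @_;
  # ASSUMPTION: standard HTML/XML entities for the four unsafe characters
  my %entity = ( '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;' );
  $s =~ s/([<>&"])/$entity{$1}/ge;          # unsafe characters => entities
  $s =~ s/[ \t]+/ /g;                       # collapse runs of blanks and tabs
  $s =~ s/[\x00-\x09\x0b\x0c\x0e-\x1f]+//g; # drop control characters except LF and CR
  if (Encode::is_utf8($s)) {
    # non-ASCII characters => hexadecimal character references
    $s =~ s/([^\x00-\x7f])/sprintf "&#x%X;", ord($1)/ge;
  }
  return $s;
}

print encode_entities(qq{x < 3 && y > "2"}), "\n";
print encode_entities("smile: \x{263A}"), "\n";
```

Note that the last step only fires for strings that Perl holds internally as UTF-8, so byte strings pass through with their non-ASCII bytes unchanged.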
=head2 Internal structure of CWB::CEQL::Parser objects
A DPP parser object (i.e. an object that belongs to B<CWB::CEQL::Parser> or
lib/CWB/CEQL/String.pm
  print "42 $op 0\n"; # prints "42 >= 0"
  if ($op->type eq "Operator") { ... }
  $string = new CWB::CEQL::String "my string", "String";
  $string .= " is beautiful";       # changes string, but not its type
  $string->value("another string"); # $string = "..."; would replace with ordinary string
  print $string->value, "\n";       # access string value explicitly
  $string->attribute("charset", "ascii"); # declare and/or set user-defined attribute
  if ($string->attribute("charset") eq "utf8") { ... }
  $new_string = $string->copy;      # $new_string = $string; would point to same object
=head1 DESCRIPTION
B<** TODO **>
Note: automatic conversion to a number in numerical expressions usually does not work; use the value() method explicitly in this case.
lib/CWB/Encoder.pm
sub info {
  my ($self, $info) = @_;
  $self->{INFO} = $info;
}
=item $enc->charset($code);
Set corpus character set (as a corpus property in the registry entry).
So far, only C<latin1> is fully supported. Other valid character sets are
C<latin2>, ..., C<latin9>, and C<utf8> (which will be supported by future
releases of the CWB). Any other I<$code> will raise a warning.
=cut
sub charset {
  my ($self, $charset) = @_;
  carp "CWB::Encoder: character set $charset not supported by CWB (latin1, ..., latin9, utf8).\n"
    unless $charset =~ /^(latin[1-9]|utf8)$/;
  $self->{CHARSET} = $charset;
}
=item $enc->language($code);
Set corpus language (as an informational corpus property in the
registry entry). Use of a two-letter ISO code (C<de>, C<en>, C<fr>,
...) is recommended; any other format will raise a warning.
=cut
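The two checks described above can be sketched as standalone predicates. The helper names C<charset_ok> and C<language_ok> are hypothetical and not part of B<CWB::Encoder>. The charset pattern is the one used in C<charset()>. The language pattern is an assumption, since the actual test is not shown here: it approximates "two-letter ISO code" as two lowercase ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper: true if charset() would accept $code without a warning.
sub charset_ok {
  my ($code) = @_;
  return $code =~ /^(latin[1-9]|utf8)$/ ? 1 : 0;  # same pattern as charset()
}

# Hypothetical helper. ASSUMPTION: a "two-letter ISO code" is taken to mean
# exactly two lowercase ASCII letters; the real check may differ.
sub language_ok {
  my ($code) = @_;
  return $code =~ /^[a-z]{2}$/ ? 1 : 0;
}

for my $code (qw(latin1 latin9 utf8 latin10 ascii)) {
  printf "%-8s %s\n", $code, charset_ok($code) ? "ok" : "would warn";
}
for my $code (qw(de en fr deu EN)) {
  printf "%-8s %s\n", $code, language_ok($code) ? "ok" : "would warn";
}
```

Both helpers only report the result. The methods above still store the value after warning, so an unsupported code is not rejected.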