Alt-CWB-CL-ambs
lib/CWB/CL.pm
# "strict" mode aborts if any error occurs (convenient in one-off scripts)
CWB::CL::strict(1); # or simply load CWB::CL::Strict module
CWB::CL::set_debug_level('some'); # 'some', 'all' or 'none' (default)
# CWB::CL::Corpus objects
$corpus = new CWB::CL::Corpus "HANSARD-EN"; # name of corpus can be upper or lower case
die "Error: can't access corpus HANSARD-EN" # all error conditions return undef
unless defined $corpus; # (checks are not needed in "strict" mode)
undef $corpus; # currently, mapped memory cannot be freed
# CWB::CL::Attribute objects (positional attributes)
$lemma = $corpus->attribute("lemma", 'p'); # returns CWB::CL::Attribute object
$corpus_length = $lemma->max_cpos; # valid cpos values are 0 .. $corpus_length-1
$lexicon_size = $lemma->max_id; # valid id values are 0 .. $lexicon_size-1
$id = $lemma->str2id($string);
@idlist = $lemma->str2id(@strlist); # all scalar functions map to lists in list context
$str = $lemma->id2str($id);
$len = $lemma->id2strlen($id);
$f = $lemma->id2freq($id);
$id = $lemma->cpos2id($cpos);
$str = $lemma->cpos2str($cpos);
@idlist = $lemma->regex2id($re); # regular expression matching
@cpos = $lemma->idlist2cpos(@idlist); # accessing the index (occurrences of given IDs)
$total_freq = $lemma->idlist2freq(@idlist); # better check the list size first on large corpora
# CWB::CL::AttStruc objects (structural attributes)
$chapter = $corpus->attribute("chapter", 's'); # returns CWB::CL::AttStruc object
$number_of_regions = $chapter->max_struc; # valid region numbers are 0 .. $number_of_regions-1
$has_values = $chapter->struc_values; # are regions annotated with strings?
$struc = $chapter->cpos2struc($cpos); # returns undef if $cpos is not in <chapter> region
($start, $end) = $chapter->struc2cpos($struc); # returns empty list on error -> $start is undefined
($start, $end) = $chapter->cpos2struc2cpos($cpos); # returns empty list if not in <chapter> region
# returns 2 * <n> values (= <n> start/end pairs) if called with <n> arguments
$str = $chapter->struc2str($struc); # always returns undef if not $chapter->struc_values
$str = $chapter->cpos2str($cpos); # combines cpos2struc() and struc2str()
# check whether corpus position is at boundary (l, r, lr) or inside/outside (i/o) of region
if ($chapter->cpos2boundary($cpos) & $CWB::CL::Boundary{'l'}) { ... }
if ($chapter->cpos2is_boundary('l', $cpos)) { ... }
# CWB::CL::AttAlign objects (alignment attributes)
$french = $corpus->attribute("hansard-fr", 'a'); # returns CWB::CL::AttAlign object
$nr_of_alignments = $french->max_alg; # alignment block numbers are 0 .. $nr_of_alignments-1
$extended = $french->has_extended_alignment; # extended alignment allows gaps & crossing alignments
$alg = $french->cpos2alg($cpos); # returns undef if no alignment was found
($src_start, $src_end, $target_start, $target_end)
= $french->alg2cpos($alg); # returns empty list on error
# or use convenience function $french->cpos2alg2cpos($cpos);
# Feature sets (used as values of CWB::CL::Attribute and CWB::CL::AttStruc)
$np_f = $corpus->attribute("np_feat", 's'); # p- and s-attributes can store feature sets
$fs_string = $np_f->cpos2str($cpos); # feature sets are encoded as strings
$fs = CWB::CL::set2hash($fs_string); # expand feature set into hash (reference)
if (exists $fs->{"paren"}) { ... }
$fs1 = CWB::CL::make_set("|proper|nogen|"); # validate feature set or construct from string
$fs2 = CWB::CL::make_set("paren nogen proper", 'split');
$fs = CWB::CL::set_intersection($fs1, $fs2); # intersection of feature set values
$n = CWB::CL::set_size($fs); # size of feature set
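The feature-set calls above work on plain strings, so the encoding can be illustrated without a corpus. The following is a minimal sketch in pure Perl, assuming the usual CWB convention that a feature set is a sorted list of values separated and enclosed by C<|> (with C<|> alone as the empty set). The helper names C<sketch_set2hash> and C<sketch_make_set> are hypothetical and are not part of B<CWB::CL>; they only approximate what C<set2hash> and C<make_set> in 'split' mode are described as doing.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of CWB::CL): expand a feature-set string
# such as "|nogen|proper|" into a hash reference with one key per member.
sub sketch_set2hash {
    my ($set) = @_;
    # assumed format: starts and ends with "|"; "|" alone is the empty set
    return unless defined $set && $set =~ /^\|(?:[^|]+\|)*$/;
    my %hash = map { $_ => 1 } grep { length } split /\|/, $set;
    return \%hash;
}

# Hypothetical helper (NOT part of CWB::CL): build the string form from a
# list of values, dropping duplicates and sorting in string order.
sub sketch_make_set {
    my @values = @_;
    my %seen;
    my @unique = sort grep { !$seen{$_}++ } @values;
    return '|' . join('', map { "$_|" } @unique);
}

my $fs = sketch_set2hash("|nogen|proper|");
print "has 'proper'\n" if exists $fs->{proper};        # prints: has 'proper'
print sketch_make_set(qw(paren nogen proper)), "\n";   # prints: |nogen|paren|proper|
```

In real code the library functions should be preferred, since they are the ones guaranteed to agree with the encoding stored in the corpus.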
=head1 DESCRIPTION
Sorry, there is no full description for this module yet, since the
B<CWB Corpus Library>, on which B<CWB::CL> is based, does not have
complete documentation.
All of the corpus access functions provided by the B<CWB::CL> module are subject
to change in version 4.0 of the CWB. If you want to use B<CWB::CL> anyway,
have a look at the test scripts in subdirectory F<t/> of the distribution.
=head1 COPYRIGHT
Copyright (C) 1999-2010 by Stefan Evert [http://purl.org/stefan.evert]
This software is provided AS IS and the author makes no warranty as to
its use and performance. You may use the software, redistribute and
modify it under the same terms as Perl itself.
=cut