Lingua-TT
* TT/TextAlignment.pm: better escaping (backslash and form-feed)
* added ptb2tt.perl: convert Penn Treebank .mrg files (1 sentence per line) to TT format (hacked)
* fixed tt-dbapply.perl bug
* TT::Diff.pm fix
* added w:apos class
* tt-rttdiff-eval.perl: fixed
* deleted TT::Diff::saveTextFile0
* weird TT::Diff error in saveTextFile vs saveTextFile0
* rttdiff-eval work: add classification tags to output
* updated verbosity levels in tt-txt-(align|split|merge).perl
* re-formulated tt-txt-align.perl in terms of TT::TextAlignment
* added Lingua::TT::TextAlignment.pm
  - combined text+tt representation
  - to be used for tokenizer evaluation
  - main format is called 'RTT': raw text + TT data, with '%%$c=...' comments holding inter-token text (whitespace)
  - added tt-txt-merge.perl, tt-txt-split.perl for converting between (TXT,TT) file pairs and RTT files
  - TODO: update tt-txt-align.perl to produce only RTT output (or tt-diff for debugging)
* cleanup
* tt-txt-align basically working
* tt-txt-align.perl: first stab at generic token-text aligner, output to (huge) tt-diff file
* strange fixes found on plato