Lingua-TT


	* tt-ngrams -glob argument
	* count-mode for tt-ngrams.perl
	* renamed tt-skipgrams.perl to more accurate tt-cofreqs.perl
	  - added bos option to tt-cofreqs.perl
	* added tt-skipgrams.perl
	* stderr autoflush for tt-ngrams.perl
	* tt-ngrams.perl output file
	* list-mode input for tt-ngrams.perl
	* added basic tt-ngrams.perl
	* fix-relinking tt-rtt-(expand|compact).perl
	* fix-relinking tt-tiger-diff-autofix.perl
	* set svn:executable on *.perl
	* updated emails
	* added LevelDB, RocksDB test (no epiphanous improvement over BerkeleyDB atm)
	* tt-split-2.perl fixes for token-wise splitting
	* TextAlignment: trim text comments by default
	* tt-rtt2txt.perl fix for waste with annotator
	* tt-txt2rtt.perl: utf8 fixes
	* TextAlignment fixes for embedded newlines
	* added -trim option to tt-tag-uniq.perl
	* ignore MYMETA.yml
	* tag-xlate: added -trim option
	* tt-tag-xlate.perl: in-place and prefix options, defaults
	* fixed offset/length bug in tt-ensure-stxt.perl
	* tt-ensure-stxt.perl: added mode to extract sentence text from associated text buffer
	* added tt-tag-uniq.perl
	* added tt-tag-xlate.perl : tag-translation utility
	* -compact|-prolix arguments to tt-txt-merge.perl
	* tt-rtt-(compact|expand).perl now point to tt-rtt-convert.perl
	* removed stale tt-rtt-compact.perl
	* tt-rtt dialect stuff
	  - tt-rtt-(expand|compact).perl: standalone hacks, obsolete
	  - tt-rtt-convert.perl : should replace tt-rtt-(expand|compact).perl
	  - TT/TextAlignment.pm : now with %%$RTT:COMPACT=X processing instruction

v0.09 2013-09-04  moocow
	* tt-rttdiff-eval.perl added 'w:nolex' class (requires appropriate unclassify.perl and moot --mark-unknown option)
	* added s:nocaps for non-capitalized sentences
	* added tt-words.sh: extract tokens from tt file(s) (e.g. for use with wc)
	* rtt eval again
	* tt-rttdiff-eval.perl: don't be quite so generous for s:ks (Kiss-Strunk error rate)
	* TextAlignment.pm fix
	* tt-rttdiff-eval.perl: kiss-strunk-esque error rate debugging
	  - added new classes s:dot, s:abbr, s:std
	  - changed semantics for s:nonstd, s:nopunct (now 'no (sentence-final-word)-final punctuation')
	* tt-rttdiff-eval.perl debugging
	* fixed tt-txt-align.perl missing escape of @ttchars
	* tt-shuffle.perl bugfix
	* added tt-shuffle.perl
	* TT/TextAlignment.pm: better escaping (backslash and form-feed)
	* added ptb2tt.perl: convert Penn Treebank .mrg files (1 sentence per line) to TT format (hacked)
	* fixed tt-dbapply.perl bug
	* TT::Diff.pm fix
	* added w:apos class
	* tt-rttdiff-eval.perl : fixed
	* deleted TT::Diff::saveTextFile0
	* weird TT::Diff error in saveTextFile vs saveTextFile0
	* rttdiff-eval work: add classification tags to output
	* updated verbosity levels in tt-txt-(align|split|merge).perl
	* re-formulated tt-txt-align.perl in terms of TT::TextAlignment
	* added Lingua::TT::TextAlignment.pm
	  - combined text+tt representation
	  - to be used for tokenizer evaluation
	  - calling main format 'RTT': RawText + TT-data with '%%$c=...' comments for inter-token text (whitespace)
	  - added tt-txt-merge.perl, tt-txt-split.perl for converting between (TXT,TT) file-pairs and RTT files
	  - TODO: update tt-txt-align.perl to produce only RTT output (or tt-diff for debugging)
	* cleanup
	* tt-txt-align basically working
	* tt-txt-align.perl: first stab at generic token-text aligner, output to (huge) tt-diff file
	* strange fixes found on plato
	* more sent-string heuristics
	* added script tt-ensure-stxt.perl
	* improved tt-sents2raw (method TT::Sentence::rawString())
	  - extract raw text from formatted comment-lines if available

v0.08 2012-09-20  moocow
	* added -begin, -end, -use options to t[tj]-eval.perl
	* tj-eval fix
	* added $d,$s read-only variables to tj-eval.perl
	* added -canonical argument to tj-eval.perl
	* don't include comments in TT::Sentence::rawString() output
	* more autofix fixes
	* more tiger autofix stuff
	* TT::Diff fix (change-internal aux data wasn't making it through)
	* tiger autofix updates
	* more tiger auto-fixes
	* autofix fixes
	* added quote-assimilation rule to tt-tiger-diff-autofix-0.4.x.perl
	* added symlink: tt-tiger-diff-autofix.perl
	* tt-tiger autofix fixes
	* updated tt-tiger-diff-autofix-0.4.x.perl : numeric grouping heuristics only
	* fixes in tt-diff-select.perl
	* added tt-diff-select.perl: select 'interesting' subsets of tt-diff output
	* moved tt-tiger-diff-autofix.perl to tt-tiger-diff-autofix-0.3.x.perl (tomasotath 0.3.x version)
	* fixed 'UNIVERSAL::import' warning in TT/Unify.pm and reporting bug in tt-diff-info.perl
	* more CDBFile fixes
	* CDBFile fix
	* re-arranged Dict and Dict::TJ
	  - Dict::TJ is now Dict::JSON
	  - added new class CDBFile::JSON
	  - re-factored inheritance and apply code for Dict, Dict::JSON, CDBFile, CDBFile::JSON
	* added TT/Dict/TJ.pm , tj-dict-merge.perl
	* added CDB_File dep to Makefile.PL
	* updated to v0.08: new TT::CDBFile class + utilities
	* replaced generic encoding stuff with utf8 flag in CDBFile
	* removed encoding stuff from CDBFile: only valid if source and dest encodings are the same
	* added encoding stuff to CDBFile: __very__ inefficient: remove it again

v0.07 2011-08-20  moocow
	* added -tmpdir=DIR option to tt-dict2db.perl: build db in temp directory (e.g. tmpfs), for CAB db builds
	* updated tt-1grams.perl
	* added tt-dict-merge.perl
	* added tt-123-eos-compress.perl (found on uhura; no idea if it works)
	* moved DB::File to DBFile
	* moving DB::File to DBFile

v0.06 2010-11-18  moocow
	* argh argh argh argh argh
	* added DB::File::guesFileType() method
	* moved Dict stuff to class TT::Dict


