DTA-CAB

Changes

	* moved moot/ dir from cab/system/resources/ to automata/
	* oops
	* added local link words -> automata/words
	* resource build system update:
	  - moved dta-cab/system/resources/words to automata/words
	* changed automata/ links in system/resources
	* Dict updates
	  - still todo: move eqrw to dict mode (move build out of system/resources into ../automata/eqrw)
	  - eqlemma
	* removed stale EqClass.pm
	* started work on EqLemma
	  - re-implemented DTA::CAB::Analyzer::Dict to use Lingua::TT::Dict
	  - added Analyzer::Dict::DB using Lingua::TT::DB::File (Berkeley DB): tolerably fast and quite handy
	  - TODO: use new dict class(es) as exception lexica in Analyzer::Automaton (and elsewhere?) -- chuck out legacy code
	  - TODO: update Analyzer::EqPho::Dict, Analyzer::EqRW::Dict to work from 'inverted' dict formats (tt-dict-invert.perl, tt-db-invert.perl)
	  - tiny tweak for compatibility: add word being analyzed to eqclass if it's not already there

v1.09 2010-10-27 13:05  moocow
	* added dta-cab-tt2csv.perl, dta-cab-tt2txt.perl, dta-cab-txt2tt.perl
	* added CAB/Analyzer/MootSub.pm : post-processing hacks for moot (bash NE to original form)
	* added tiger-local STTS hacks
	* added lemma parsing to Analyzer::Automaton if {wantAnalysisLemma} is true (not by default)
	  + set wantAnalysisLemma=true in Analyzer::Morph
	  + updated Format::TT to use generalized FST-analysis parsing code
	    - for (lts|eqpho|eqphox|morph|mlatin|rw|rw/lts|rw/morph|eqrw|...)
	* re-defined Format::Text as simple wrapper around Format::TT
	* added tagh/stts incompatibility hack table %Analyzer::Moot::TAGX
	* added moot model
	* load logic updates, new Analyzer::DmootSub, prepared for moot integration on dmoot output
	* added Analyzer::EqPhoX

v1.08 2010-10-21 14:04  moocow
	* added dmoot/tiger to system/resources
	* added 'dist' rule to makefile
	* resource re-build update (649 texts) on kaskade for services

v1.07 2010-10-20 11:43  moocow
	* updated lexfilter: allow hyphens
	* re-linked dta-words.de.lex.latin1.tf to new words/current/ dir
	* added from-tokwrap-xml/ for new build system
	* moved ddc-based build system to from-ddc-xml
	* added Text::Phonetic analyzers Soundex,Koeln,Metaphone
	* added (untested) CAB/Analyzer/Alias.pm
	* added analyzer-local 'enabled' flag, per-call 'LABEL_enabled' flag

v1.06 2010-10-01 10:08  moocow
	* rc: symlink morph
	* more safe updates
	* improved comment pass-through for TT,Text formats using $(tok,sent)->{_cmts}
	* added Analyzer::typeKeys() method for controlled type/token distinction

v1.05 2010-09-28 13:15  moocow
	* various dmoot fixes
	* added -block-sents option to dta-cab-analyze.perl
	* block-wise tt analysis with dta-cab-analyze.perl
	* all type keys are inherited by default
	* new dta-cab-analysis -analyzer-class=CLASS option
	* new Chain::Multi analyzer option 'chain=C1,C2,...' parses user-defined sub-chains

v1.04 2010-09-22 09:38  moocow
	* added -block-size=NLINES option to dta-cab-analyze.perl for pseudo-streaming TT analysis
	* updated MorphSafe: first- and geonames are now 'safe'

v1.03 2010-05-19 10:36  moocow
	* require Unicode::CharName
	* updated system/resources using CAB v1.x on uhura (no complete re-build yet)
	* small Analyzer::RewriteSub fix (canAnalyze() -> ANY (vs. ALL))
	* fixed system/resources plm file generation, brought dta-cab-cachegen.perl up to v1.x api

v1.02 2010-03-10 14:17  moocow
	* format work (wip) from uhura
	* added __DIE__ to caught server signals
	* tweet config system/cab-tweet.plm updated for new Chain::Tweet

v1.01 2010-02-08 14:49  moocow
	* fixes for tweet server, adapted CAB::Analyzer::Chain::Tweet
	* tiny buglet fixes
	* report Unicruft XS, C versions in analyzer
	* updated status commands
	* use NFC instead of NFKC normalization in Unicruft (fixes mantis bug #140)
	* v1.x server-config updates

v1.001 2010-01-22 15:42  moocow
	* moved old cab.plm, cab-server.plm, cab-server-nodict.plm to v0.x
	* removed externals link to de-tiger (breaks checkout for taxi user)
	* re-factored (Chain->Chain::DTA) to (Chain->Chain::Multi->Chain::DTA)
	  + got Server::XmlRpc working with dta-xmlrpc-client.perl and Chain::DTA
	  + server config is now MUCH prettier
	  + ugly chain-dependent analyzer goop is now relegated to a single method xmlRpcAnalyzers() in Chain::Multi
	* added, tested class DTA::CAB::Chain::DTA to replace old DTA::CAB
	* added rules for human-readable .csv, .csv.ps, .csv.ps2
	* updated to use ddc .con file
	* removed CAB::Analysis and sub-classes
	* smoothed/fixed Analysis classes
	  + it seems though that we can't rely on these, since they don't survive e.g. XML-RPC coding
	  + also we need some hook besides analysis class, for parsers (data doesn't yet have a class)
	  + we also appear to have solved the 'generic access' problems with closures, so we don't need analysis classes there
	  + upshot: lose analysis classes in next checkin
	* more fixes for CAB::Chain
	  + dta-cab-analyze output for new chain now identical to old version (services) for test-kant-8k
	  + TODO: format updates, documentation updates, ...
	* started re-factorization for abstract Chain analyzers
	  - current conundrum: how to handle flexible {src}, {dst} as previously passed in %opts, e.g. for Automaton?
	  - idea: abstract Analysis class, API
	* fixed buglet (no "return $tok" in MorphSafe analysis sub) --
	  - maybe re-think that API (e.g. analyze() is always destructive?)
	  - next steps: re-factor CAB hacks into analyzers, get old CAB working as Analyzer::Chain
	  - benchmark old closure-style analyzeToken() vs. new force-document analyzeTypes() [via XML-RPC? in-memory?]
	  - add default control options to chain (e.g. doAnalyzeWhatever=>BOOL), add {name} convention for all analyzers
	  - re-work I/O Formats -- better flexibility & handling of new fields
	* fixed bug 'no start without stop on dta-cab.sh restart'

v0.18 2009-12-01 14:46  moocow
	* Format/XmlNative.pm safety fixes

v0.17 2009-11-12 10:30  moocow
	* DocClassify dummy document fixes

v0.16 2009-11-12 10:18  moocow
	* updated pid files


