DTA-CAB
* moved moot/ dir from cab/system/resources/ to automata/
* oops
* added local link words -> automata/words
* resource build system update:
- moved dta-cab/system/resources/words to automata/words
* changed automata/ links in system/resources
* Dict updates
- still todo: move eqrw to dict mode (move build out of system/resources into ../automata/eqrw)
- eqlemma
* removed stale EqClass.pm
* started work on EqLemma
- re-implemented DTA::CAB::Analyzer::Dict to use Lingua::TT::Dict
- added Analyzer::Dict::DB using Lingua::TT::DB::File (Berkeley DB): tolerably fast and quite handy
- TODO: use new dict class(es) as exception lexica in Analyzer::Automaton (and elsewhere?) -- chuck out legacy code
- TODO: update Analyzer::EqPho::Dict, Analyzer::EqRW::Dict work from 'inverted' dict formats (tt-dict-invert.perl, tt-db-invert.perl)
- tiny tweak for compatibility: add word being analyzed to eqclass if it's not already there
v1.09 2010-10-27 13:05 moocow
* added dta-cab-tt2csv.perl, dta-cab-tt2txt.perl, dta-cab-txt2tt.perl
* added CAB/Analyzer/MootSub.pm : post-processing hacks for moot (bash NE to original form)
* added tiger-local STTS hacks
* added lemma parsing to Analyzer::Automaton if {wantAnalysisLemma} is true (not by default)
+ set wantAnalysisLemma=true in Analyzer::Morph
+ updated Format::TT to use generalized FST-analysis parsing code
- for (lts|eqpho|eqphox|morph|mlatin|rw|rw/lts|rw/morph|eqrw|...)
* re-defined Format::Text as simple wrapper around Format::TT
* added tagh/stts incompatibility hack table %Analyzer::Moot::TAGX
* added moot model
* load logic updates, new Analyzer::DmootSub, prepared for moot integration on dmoot output
* added Analyzer::EqPhoX
v1.08 2010-10-21 14:04 moocow
* added dmoot/tiger to system/resources
* added 'dist' rule to makefile
* resource re-build update (649 texts) on kaskade for services
v1.07 2010-10-20 11:43 moocow
* updated lexfilter: allow hyphens
* re-linked dta-words.de.lex.latin1.tf to new words/current/ dir
* added from-tokwrap-xml/ for new build system
* moved ddc-based build system to from-ddc-xml
* added Text::Phonetic analyzers Soundex,Koeln,Metaphone
* added (untested) CAB/Analyzer/Alias.pm
* added analyzer-local 'enabled' flag, per-call 'LABEL_enabled' flag
v1.06 2010-10-01 10:08 moocow
* rc: symlink morph
* more safe updates
* improved comment pass-through for TT,Text formats using $(tok,sent)->{_cmts}
* added Analyzer::typeKeys() method for controlled type/token distinction
v1.05 2010-09-28 13:15 moocow
* various dmoot fixes
* added -block-sents option to dta-cab-analyze.perl
* block-wise tt analysis with dta-cab-analyze.perl
* all type keys are inherited by default
* new dta-cab-analysis -analyzer-class=CLASS option
* new Chain::Multi analyzer option 'chain=C1,C2,...' parses user-defined sub-chains
v1.04 2010-09-22 09:38 moocow
* added -block-size=NLINES option to dta-cab-analyze.perl for pseudo-streaming TT analysis
* updated MorphSafe: first- and geonames are now 'safe'
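The pseudo-streaming idea behind the -block-size=NLINES option noted above can be sketched roughly as follows. This is only an illustration in Python, not the code used by dta-cab-analyze.perl; the function name iter_blocks is hypothetical, and the real script may choose block boundaries differently (for example at sentence breaks, as the later -block-sents option suggests).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_blocks(lines: Iterable[str], block_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most block_size lines (hypothetical helper)."""
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    it = iter(lines)
    while True:
        # Pull the next chunk lazily; only one block is held in memory at a time.
        block = list(islice(it, block_size))
        if not block:
            return
        yield block
```

Each yielded block could then be analyzed and written out before the next one is read, so memory use depends on the block size rather than on the size of the whole input.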
v1.03 2010-05-19 10:36 moocow
* require Unicode::CharName
* updated system/resources using CAB v1.x on uhura (no complete re-build yet)
* small Analyzer::RewriteSub fix (canAnalyze() -> ANY (vs. ALL))
* fixed system/resources plm file generation, brought dta-cab-cachegen.perl up to v1.x api
v1.02 2010-03-10 14:17 moocow
* format work (wip) from uhura
* added __DIE__ to caught server signals
* tweet config system/cab-tweet.plm updated for new Chain::Tweet
v1.01 2010-02-08 14:49 moocow
* fixes for tweet server, adapted CAB::Analyzer::Chain::Tweet
* tiny buglet fixes
* report Unicruft XS, C versions in analyzer
* updated status commands
* use NFC vs NFKC normalization in Unicruft (fixes mantis bug #140)
* v1.x server-config updates
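For readers unfamiliar with the NFC vs NFKC distinction mentioned above, the following Python sketch shows the general difference between the two Unicode normalization forms. It uses Python's standard unicodedata module, not Unicruft, and the long-s example is an assumption chosen for illustration; the changelog does not say what mantis bug #140 actually involved.

```python
import unicodedata


def normalize(text: str, form: str = "NFC") -> str:
    """Return text in the given Unicode normalization form (illustrative wrapper)."""
    return unicodedata.normalize(form, text)


# Both forms compose a base letter with a following combining mark.
decomposed = "a\u0308"   # 'a' + COMBINING DIAERESIS
composed = normalize(decomposed, "NFC")      # single code point U+00E4

# NFKC additionally folds compatibility characters, which NFC leaves intact.
long_s = "\u017f"        # LATIN SMALL LETTER LONG S
kept = normalize(long_s, "NFC")      # still U+017F
folded = normalize(long_s, "NFKC")   # plain 's'
```

In general, NFC preserves characters that NFKC would replace with their compatibility equivalents, which matters whenever the original character distinctions need to survive normalization.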
v1.001 2010-01-22 15:42 moocow
* moved old cab.plm, cab-server.plm, cab-server-nodict.plm to v0.x
* removed externals link to de-tiger (breaks checkout for taxi user)
* re-factored (Chain->Chain::DTA) to (Chain->Chain::Multi->Chain::DTA)
+ got Server::XmlRpc working dta-xmlrpc-client.perl and Chain::DTA
+ server config is now MUCH prettier
+ ugly chain-dependent analyzer goop is now relegated to a single method xmlRpcAnalyzers() in Chain::Multi
* added, tested class DTA::CAB::Chain::DTA to replace old DTA::CAB
* added rules for human-readable .csv, .csv.ps, .csv.ps2
* updated to use ddc .con file
* removed CAB::Analysis and sub-classes
* smoothed/fixed Analysis classes
+ it seems though that we can't rely on these, since they don't survive e.g. XML-RPC coding
+ also we need some hook besides analysis class, for parsers (data doesn't yet have a class)
+ we also appear to have solved the 'generic access' problems with closures, so we don't need analysis classes there
+ upshot: lose analysis classes in next checkin
* more fixes for CAB::Chain
+ dta-cab-analyze output for new chain now identical to old version (services) for test-kant-8k
+ TODO: format updates, documentation updates, ...
* started re-factorization for abstract Chain analyzers
- current conundrum: how to handle flexible {src}, {dst} as previously passed in in %opts e.g. for Automaton ?
- idea: abstract Analysis class, API
* fixed buglet (no "return $tok" in MorphSafe analysis sub) --
- maybe re-think that API (e.g. analyze() is always destructive?)
- next steps: re-factor CAB hacks into analyzers, get old CAB working as Analyzer::Chain
- benchmark old closure-style analyzeToken() vs. new force-document analyzeTypes() [via XML-RPC? in-memory?]
- add default control options to chain (e.g. doAnalyzeWhatever=>BOOL), add {name} convention for all analyzers
- re-work I/O Formats -- better flexibility & handling of new fields
* fixed bug 'no start without stop on dta-cab.sh restart'
v0.18 2009-12-01 14:46 moocow
* Format/XmlNative.pm safety fixes
v0.17 2009-11-12 10:30 moocow
* DocClassify dummy document fixes
v0.16 2009-11-12 10:18 moocow
* updated pid files