DTA-CAB

Changes

	  + it seems though that we can't rely on these, since they don't survive e.g. XML-RPC coding
	  + also we need some hook besides analysis class, for parsers (data doesn't yet have a class)
	  + we also appear to have solved the 'generic access' problems with closures, so we don't need analysis classes there
	  + upshot: lose analysis classes in next checkin
	* more fixes for CAB::Chain
	  + dta-cab-analyze output for new chain now identical to old version (services) for test-kant-8k
	  + TODO: format updates, documentation updates, ...
	* started refactoring for abstract Chain analyzers
	  - current conundrum: how to handle flexible {src}, {dst} as previously passed in via %opts, e.g. for Automaton?
	  - idea: abstract Analysis class, API
	* fixed buglet (no "return $tok" in MorphSafe analysis sub) --
	  - maybe re-think that API (e.g. analyze() is always destructive?)
	  - next steps: re-factor CAB hacks into analyzers, get old CAB working as Analyzer::Chain
	  - benchmark old closure-style analyzeToken() vs. new force-document analyzeTypes() [via XML-RPC? in-memory?]
	  - add default control options to chain (e.g. doAnalyzeWhatever=>BOOL), add {name} convention for all analyzers
	  - re-work I/O Formats -- better flexibility & handling of new fields
	* fixed bug 'no start without stop on dta-cab.sh restart'

v0.18 2009-12-01 14:46  moocow
	* Format/XmlNative.pm safety fixes

v0.17 2009-11-12 10:30  moocow
	* DocClassify dummy document fixes

v0.16 2009-11-12 10:18  moocow
	* updated pid files

v0.15 2009-10-16 09:45  moocow
	* added configs tweet-server-[1234].(rc|plm) for round-robin
	* use 'funconly-nofeatures' morph variant by default
	* add @NEW tag to DHMM
	* added tag @NEW to negra-yy.123
	* use corpus as target language for tweet rewrite (also re-build ../automata/tweeted)
	* added tweet-server.rc to dta-cab.sh

v0.14 2009-09-23 12:15  moocow
	* tweet stuff
	* added negra-yy.123
	* added dta-words.tf
	* added basic PoS-tagger CAB::Analyzer::Moot
	* added dta-words.de.lex.latin1.tf.t
	* re-routed word-list
	* removed word-list dta-words.lex.tf: now built by 'make -C words/'
	* words/: build from /home/dta/dta_tokenized_xml
	* added words/ make build-system for word-lists
	* updated CAB: use FSTs for eqpho, eqrw
	  - only get latin-1 forms (xlit/unicruft) on output side, but this is exactly what we need for DDC

v0.13 2009-08-28 13:36  moocow
	* updated eqrw rules (use FST instead of dict)
	* added EqRW.pm, EqRW/Dict.pm
	* moved Dict::EqRW -> EqRW::Dict
	* fixed latin-1/utf-8 bug in CAB::Analyzer::Automaton

v0.12 2009-08-06 11:29  moocow
	* equiv-expander work
	  - TODO: get eqrw working via FST

v0.11 2009-08-03 14:26  moocow
	* removed eqpho-dict
	  - TODO: get eqrw working with 1-sided FST (explicit cascade direct from token-stored rw output)
	* added EqPho/FST.pm
	  - updated Analyzer::Automaton for non-deterministic analysis
	  - e.g. split Text->Pho and Pho->EqText into 2 FST analyzers
	* updated dta-eqrw.dict (after additional punishments for 'hülfe' in target lg)
	* more rewrite-equivalence class testing
	  + got integrated in DTA::CAB class, server config, etc.
	  + got dictionary building
	  + found some more data-type bugs (tagh, rewrite, msafe, ...):
	    - hülfe -> helf~en ... [subjII] : see misc/notes/*
	  + found more tokenizer problems/bugs: see misc/notes/tokenizer.txt
	  + added XmlRpc server config arg 'aos=>\%name2options' to allow server to set default options on a per-analyzer basis
	    - useful for e.g. always requiring 'xlit' to run without shamelessly wasting memory by duplicating $cab

v0.10 2009-07-24 14:37  moocow
	* added dta-cab-compile.perl: compile analyzer configs to binary
	* added binary I/O routines for analyzers in DTA::CAB::Persistent
	* re-worked Dict::EqClass to use non-deterministic kernel (so now any relation can be used to induce the equivalence class)
	* added system/resources/Makefile rules to generate rewrite-equivalence dictionary for use with Dict::EqClass
	* initial tests seem to work well

v0.09 2009-07-24 14:34  moocow
	* dictionary/cache updates

v0.08 2009-07-23 14:34  moocow
	* removed stale old-format cache files
	* added cache-generation to resources Makefile
	* moved EqClass, LatinDict to Dict:: namespace
	* added EqPho analyzer via Gfsm::XL cascade
	  - loads quicker, runs slower, still maybe some buglets
	* updated rewrite dict with better upper/lower case heuristics

v0.07 2009-07-03 13:42  moocow
	* added linear-function max_weight computation for Gfsm::XL (rewrite) cascades

v0.0602 2009-07-03 13:39  moocow
	* updated system/cab.plm to use new rewrite FST, dict
	* updated dta-rw.dict
	* added -log-config option to dta-cab-analyze.perl
	* added cab-server-nodict.plm: useful for testing e.g. rewrite cascade w/o exception lexicon
	* MorphSafe back-changes: ITJ is unsafe
	* minor MorphSafe changes, new rw dict

v0.0601 2009-06-26 14:28  moocow
	* added dta-rw.dict, updated MorphSafe
	* added dta-rw.dict: extracted from grimm/wm-eval data
	* updated resource makefile
	* added symlink taxi-resources
	* Morph/Latin uses tolower=>1

v0.06 2009-06-25 18:48  moocow
	* Morph/Latin: set tolower=>1 by default
	* minor server log format and config updates
	* added magic bless() to cab.plm
	* added latin resource to cab.plm
	* got latin recognizer working via Gfsm subclass Analyzer::Morph::Latin

v0.05 2009-06-17 14:49  moocow
	* more dta-cab link-up stuff
	* more attribute pass-through for dta-tokwrap sentence & document attributes
	* added dta-tokwrap pass-through token attributes {other}{xmlid}, {other}{chars}

v0.04 2009-06-11 12:21  moocow
	* added Unicruft to Makefile.PL PREREQ_PM
	* replaced Transliterator with Unicruft (using libunicruft)

v0.03 2009-06-09 14:26  moocow
	* more encoding hell, started replacing Transliterator with Unicruft-based version
	* added parsing and pass-through of '$tok->{other}' attributes for Format::XmlNative
	* updated Text, TT Formats
	* updated log4perl config to use 24-hour time
	* updated init script
	* minor doc fix
	* added -verbose options to perl scripts
	* more doc updates
	* doc update
	* updated docs, incremented version to v0.03

v0.02 2009-06-05 12:58  moocow
	* added Format/XmlTW.pm: dta-tokwrap interface format (1st stab)
	* doc fix
	* added test-word 'oede' to dta-lts.china.dict
	* added analyzer aliases to cab-server.plm, system/cab-server.plm
	* moved dta-cab-multi.sh to dta-cab.sh
	* changed default xml-rpc port to 8088
	* moved Protocol.pod to XmlRpcProtocol.pod

v0.01 2009-05-08 20:51  moocow
	* updated cab.plm
	* added system/ directory: system-wide installation stuff
	* added client-request-level logging to Server::XmlRpc (used RPC::XML::Procedure subclass)
	* added server options: -daemon , -pidfile=FILE
	* MorphSafe fixes (for changed analysis structure)
	* updated program --version behavior: report some SVN keywords
	* added svn:keywords
	* more documentation
	* documented (Client|Server)/XmlRpc.pm
	* documented Analyzers
	* documented, documented, documented
	* moved *.POD to *.rpod (avoid auto-installation)


