Algorithm-TicketClusterer
view release on metacpan or search on metacpan
lib/Algorithm/TicketClusterer.pm view on Meta::CPAN
The problem of different people using different words to describe the same
thing is taken care of by using WordNet to add to each ticket a designated
number of synonyms for each word in the ticket. The idea is that after all
the tickets are expanded in this manner, they would become grounded in a
common vocabulary. The synonym expansion of a ticket takes place only after
the negated phrases (that is, the words preceded by 'no' or 'not') are
replaced by their antonyms.
Obviously, expanding a ticket by synonyms makes sense only after it is
corrected for spelling and other errors. What sort of errors one looks for
and corrects would, in general, depend on the application domain of the
tickets. (It is not uncommon for engineering services to use jargon words
and acronyms that look like spelling errors to those not familiar with the
services.) The module expects to see a file that is supplied through the
constructor parameter C<misspelled_words_file> that contains misspelled
words in the first column and their corrected versions in the second
column. An example of such a file is included in the C<examples>
directory. You would need to create your own version of such a file for
your application domain. Since conjuring up the misspellings that your
ticket submitters are likely to throw at you is futile, you might consider
( run in 0.318 second using v1.01-cache-2.11-cpan-64827b87656 )