Algorithm-TicketClusterer
lib/Algorithm/TicketClusterer.pm
corrected for spelling and other errors. What sort of errors one looks for
and corrects would, in general, depend on the application domain of the
tickets. (It is not uncommon for engineering services to use jargon words
and acronyms that look like spelling errors to those not familiar with the
services.) The module expects a file, named through the constructor
parameter C<misspelled_words_file>, that lists misspelled words in the
first column and their corrected versions in the second column. An example
of such a file is included in the C<examples> directory. You would need to
create your own version of this file for your application domain. Since
trying to anticipate the misspellings that your ticket submitters are
likely to throw at you is futile, you might consider the following
approach, which I prefer to actually reading the tickets for such errors:
Turn on the debugging options in the constructor for some initially
collected spreadsheets and watch for the words for which WordNet is unable
to supply any synonyms. In a large majority of cases, these would be the
misspelled words.
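
To make the two-column layout concrete, here is a minimal standalone sketch
of how such a file could be read and applied to the text of a ticket. It is
not the module's own code: the helper names C<load_corrections> and
C<correct_text>, the whitespace-separated columns, the case-insensitive
lookup, and the sample word pairs are all assumptions made for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read "misspelling correction" pairs, one pair per
# line, with the two columns separated by whitespace (an assumption).
sub load_corrections {
    my ($fh) = @_;
    my %fix;
    while (my $line = <$fh>) {
        my ($bad, $good) = split ' ', $line;
        next unless defined $good;      # skip blank or one-column lines
        $fix{lc $bad} = $good;          # lookup is case-insensitive
    }
    return \%fix;
}

# Hypothetical helper: replace every word that has an entry in the table.
sub correct_text {
    my ($text, $fix) = @_;
    $text =~ s/\b(\w+)\b/exists $fix->{lc $1} ? $fix->{lc $1} : $1/ge;
    return $text;
}

# Sample data held in a string so that the sketch runs without any files.
my $sample = "recieve   receive\nconection connection\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $fix = load_corrections($fh);
close $fh;

print correct_text("Did not recieve data after conection reset", $fix), "\n";
```

In practice you would open your own misspellings file in place of the
in-memory string, and add to it the words that the debugging output shows
WordNet could not find synonyms for.
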
Expanding a ticket with synonyms is made complicated by the fact that some
common words have such a large number of synonyms that they can overwhelm
the relatively small number of words in a ticket. Adding too many synonyms
in relation to the size of a ticket can not only distort the sense of the