Algorithm-TicketClusterer

=head1 CHANGES

Version 1.01 removes the platform dependency in the functions that read the
text files for stop words, misspelled words, etc.


=head1 DESCRIPTION

B<Algorithm::TicketClusterer> is a I<Perl5> module for retrieving
previously processed Excel-stored tickets similar to a new ticket.  Routing
decisions made for similar past tickets can be useful in expediting the
routing of a new ticket.

Tickets are commonly used in the software services industry and in customer
support businesses to record requests for service, product complaints,
user feedback, and so on.

With regard to the routing of a ticket, you would want each new ticket to
be handled by the tech support individual who is most qualified to address
the issue raised in the ticket.  Identifying the right individual for each
new ticket in real-time is no easy task for organizations that man large

the size of the vocabulary; each element of this vector is an integer that
is the frequency of the vocabulary word corresponding to the index of the
element.  Doc modeling also involves calculating the inverse document
frequency (IDF) values for the words and building the inverted index for the
words.  The IDF values are used to diminish the importance of the words
that carry little discriminatory power vis-a-vis the tickets.  The IDF for a
word is the logarithm of the ratio of the total number of tickets to the
number of tickets in which the word appears.  It follows that if a word were
to appear in all the tickets, its IDF value would be zero.  The inverted index
entry for a word is the list of all the tickets that contain that word.
The inverted index greatly expedites the retrieval of tickets similar to a
given query ticket.
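
The doc modeling step described above can be illustrated with a minimal
sketch.  This is not the module's own implementation: the function name
C<build_doc_model>, the toy tickets, and the shape of the returned hash are
all assumptions made for illustration, and the words are taken to be already
stemmed.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: ticket ID => list of already-stemmed words.
my %tickets = (
    t1 => [qw(printer jam paper printer)],
    t2 => [qw(printer driver install)],
    t3 => [qw(network driver printer)],
);

# Illustrative helper (not part of Algorithm::TicketClusterer):
# builds frequency vectors, IDF values, and an inverted index.
sub build_doc_model {
    my ($tickets) = @_;

    # A sorted vocabulary fixes the index of each word in every vector.
    my %seen;
    $seen{$_} = 1 for map { @$_ } values %$tickets;
    my @vocab = sort keys %seen;
    my %pos;
    @pos{@vocab} = 0 .. $#vocab;

    my (%tf_vector, %inverted_index);
    for my $id (sort keys %$tickets) {
        # Each vector element is the frequency of the corresponding word.
        my @vec = (0) x scalar(@vocab);
        $vec[ $pos{$_} ]++ for @{ $tickets->{$id} };
        $tf_vector{$id} = \@vec;

        # Record the ticket once for each distinct word it contains.
        my %distinct = map { ($_ => 1) } @{ $tickets->{$id} };
        push @{ $inverted_index{$_} }, $id for sort keys %distinct;
    }

    # IDF = log(total tickets / number of tickets containing the word).
    my $n   = scalar keys %$tickets;
    my %idf = map { ($_ => log( $n / scalar @{ $inverted_index{$_} } )) } @vocab;

    return {
        vocab => \@vocab,
        tf    => \%tf_vector,
        idf   => \%idf,
        index => \%inverted_index,
    };
}

my $model = build_doc_model(\%tickets);
printf "%-8s idf=%.3f tickets=%s\n",
    $_, $model->{idf}{$_}, join(',', @{ $model->{index}{$_} })
    for @{ $model->{vocab} };
```

In this toy corpus the word C<printer> occurs in all three tickets, so its
IDF comes out as zero, whereas a word such as C<jam> that occurs in a single
ticket receives the largest IDF value.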

=item B<Similarity Based Retrieval:>

A query ticket is subjected to the same preprocessing steps as all other
tickets.  Subsequently, it is also represented by a vector in the same
manner as the other tickets.  Using the stemmed words in the query ticket,
the inverted index is used to create a candidate list of ticket vectors for
matching with the query ticket vector.  For this, only those query words
are chosen whose IDF values exceed a threshold.  Finally, we compute the
