Algorithm-TicketClusterer
Algorithm::TicketClusterer is a Perl 5 module for retrieving
Excel-stored tickets that are most similar to a new ticket.
Tickets are commonly used in the software services industry
and in customer support businesses to record requests for
service, product complaints, user feedback, and so on.
Identifying old tickets similar to a new ticket is made
challenging by the fact that folks who submit tickets often
write them quickly and informally. The informal style of
writing means that different people may use different
colloquial terms to describe the same thing. And the
quickness associated with their submission causes the
tickets to frequently contain spelling and other errors such
as conjoined words, fragmentation of long words, and so on.
This module is an attempt at dealing with these challenges.
That different people may use different words for the same
thing is dealt with by using WordNet to expand the tickets
with synonyms in order to ground the tickets in a common
vocabulary.
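The synonym expansion described above can be sketched as follows.
This is only an illustration, not the module's own code: the
%SYNONYMS table is a hypothetical stand-in for the lookups that the
module makes through WordNet::QueryData, and expand_ticket() and its
$max argument are names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical synonym table standing in for WordNet lookups.
my %SYNONYMS = (
    crash => [qw(failure breakdown)],
    slow  => [qw(sluggish)],
);

# Return the ticket's words followed by up to $max synonyms per word.
sub expand_ticket {
    my ($text, $max) = @_;
    my @words    = grep { length } split /\W+/, lc $text;
    my @expanded = @words;
    for my $word (@words) {
        my @syns = @{ $SYNONYMS{$word} // [] };
        splice @syns, $max if @syns > $max;   # keep at most $max synonyms
        push @expanded, @syns;
    }
    return @expanded;
}

my @result = expand_ticket("App is slow after crash", 1);
print join(' ', @result), "\n";
```

Two tickets that say "crash" and "failure" respectively end up
sharing a word after expansion, which is what grounds them in a
common vocabulary.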
This module requires the following three modules:

    Spreadsheet::ParseExcel
    Spreadsheet::XLSX
    WordNet::QueryData

the first for extracting information from the old-style
Excel sheets that are commonly used for storing tickets, the
second for extracting the same information from the
new-style Excel sheets, and the third for interfacing with
WordNet for extracting the synonyms and antonyms for the
words in the tickets.
For installation, do the usual

    perl Makefile.PL
    make
    make test
    make install
be handled by the tech support individual who is most qualified to address
the issue raised in the ticket. Identifying the right individual for each
new ticket in real-time is no easy task for organizations that man large
service centers and helpdesks. So if it were possible to quickly identify
the previously processed tickets that are most similar to a new ticket, one
could think of constructing semi-automated (or, perhaps, even fully
automated) ticket routers.
Identifying old tickets similar to a new ticket is made challenging by the
fact that folks who submit tickets often write them quickly and informally.
The informal style of writing means that different people may use different
colloquial terms to describe the same thing. And the quickness associated
with their submission causes the tickets to frequently contain spelling and
other errors such as conjoined words, fragmentation of long words, and so
on.
This module is an attempt at dealing with these challenges.
The problem of different people using different words to describe the same
thing is taken care of by using WordNet to add to each ticket a designated
number of synonyms for each word in the ticket. The idea is that after all
This is the final step in the preprocessing of the tickets before they are ready for
the doc modeling stage. This method calls other functions internal to the module
that ultimately make calls to WordNet through the Perl interface provided by the
C<WordNet::QueryData> module.
=item B<get_tickets_from_excel()>
$clusterer->get_tickets_from_excel()
This method calls on the C<Spreadsheet::ParseExcel> module to extract the tickets
from the old-style Excel spreadsheets and the C<Spreadsheet::XLSX> module for doing
the same from the new-style Excel spreadsheets.
=item B<get_ticket_vocabulary_and_construct_inverted_index()>
$clusterer->get_ticket_vocabulary_and_construct_inverted_index()
As mentioned in B<THE THREE STAGES OF PROCESSING>, the second stage of processing ---
doc modeling of the tickets --- starts with the stemming of the words in the tickets,
constructing a vocabulary of all the stemmed words in all the tickets, and
constructing an inverted index for the vocabulary words. All of these things are
accomplished by this method.
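As a rough illustration of the three steps just described (stemming, vocabulary construction, and inverted indexing), here is a minimal sketch. It is not the module's implementation: C<toy_stem()> is a deliberately crude, hypothetical stemmer that only strips a few suffixes, and C<build_index()> and the sample tickets are invented for this example.

```perl
use strict;
use warnings;

# Toy stemmer (an assumption for this sketch): strips a few common
# suffixes from words longer than four characters.
sub toy_stem {
    my ($word) = @_;
    $word =~ s/(?:ing|ed|s)$// if length($word) > 4;
    return $word;
}

# Takes a hashref of ticket id => text and returns two hashrefs:
# the vocabulary (stem => occurrence count) and the inverted index
# (stem => { ticket id => 1 }).
sub build_index {
    my ($tickets) = @_;
    my (%vocab, %index);
    for my $id (sort keys %$tickets) {
        for my $word (grep { length } split /\W+/, lc $tickets->{$id}) {
            my $stem = toy_stem($word);
            $vocab{$stem}++;
            $index{$stem}{$id} = 1;
        }
    }
    return (\%vocab, \%index);
}

my %tickets = (
    101 => 'Printer jams when printing labels',
    102 => 'Label printer offline',
);
my ($vocab, $index) = build_index(\%tickets);
for my $stem (sort keys %$index) {
    printf "%-8s => %s\n", $stem, join(', ', sort keys %{ $index->{$stem} });
}
```

With an inverted index in hand, finding candidate matches for a new ticket only requires looking up the tickets listed under the new ticket's stems, rather than scanning every stored ticket.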
=head1 REQUIRED
This module requires the following five modules:
Spreadsheet::ParseExcel
Spreadsheet::XLSX
WordNet::QueryData
Storable
SDBM_File
the first for extracting information from the old-style Excel sheets that are
commonly used for storing tickets, the second for extracting the same information
from the new-style Excel sheets, the third for interfacing with WordNet for
extracting the synonyms and antonyms, the fourth for creating the various disk-based
database files needed by the module, and the last for disk-based hashes used to lend
persistence to the extraction of the alphabet used by the tickets and the inverse
document frequencies of the words.
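The difference between the two persistence modules can be sketched as follows. This is only an illustration under assumed names: the file names, the C<%doc_vectors> and C<%idf> hashes, and their contents are hypothetical and do not reflect the files the module actually writes. C<Storable> serializes a whole (possibly nested) data structure to a file, whereas C<SDBM_File> ties a flat hash of short strings to disk.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use File::Spec;
use File::Temp qw(tempdir);
use SDBM_File;
use Storable qw(store retrieve);

my $dir = tempdir(CLEANUP => 1);  # scratch directory for this sketch

# Storable: write a nested structure to disk and read it back.
my %doc_vectors = (
    101 => { print => 1, label => 1 },
    102 => { label => 1 },
);
my $store_file = File::Spec->catfile($dir, 'doc_vectors.db');
store(\%doc_vectors, $store_file);
my $restored = retrieve($store_file);

# SDBM_File: tie a flat hash to a pair of disk files (idf.pag, idf.dir).
my $idf_file = File::Spec->catfile($dir, 'idf');
my %idf;
tie(%idf, 'SDBM_File', $idf_file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $idf_file: $!";
$idf{print} = '0.693';
$idf{label} = '0';
untie %idf;

# Re-open the tied hash to show that the values persisted.
my %idf_again;
tie(%idf_again, 'SDBM_File', $idf_file, O_RDWR, 0666)
    or die "Cannot re-tie $idf_file: $!";
my $idf_print = $idf_again{print};
untie %idf_again;

print "restored vector keys: ", join(', ', sort keys %{ $restored->{101} }), "\n";
print "persisted idf for 'print': $idf_print\n";
```

Note that SDBM imposes a small limit on the combined size of each key and value, so it suits per-word scalars such as inverse document frequencies but not large serialized structures.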
=head1 EXPORT
None by design.
=head1 CAVEATS