Algorithm-TicketClusterer

README

Algorithm::TicketClusterer is a Perl 5 module for retrieving
Excel-stored tickets that are most similar to a new ticket.
Tickets are commonly used in the software services industry
and in customer support businesses to record requests for
service, product complaints, user feedback, and so on.

Identifying old tickets similar to a new ticket is made
challenging by the fact that folks who submit tickets often
write them quickly and informally.  The informal style of
writing means that different people may use different
colloquial terms to describe the same thing. And the
quickness associated with their submission causes the
tickets to frequently contain spelling and other errors such
as conjoined words, fragmentation of long words, and so on.
This module is an attempt at dealing with these challenges.
That different people may use different words for the same
thing is dealt with by using WordNet to expand the tickets
with synonyms in order to ground the tickets in a common
vocabulary.
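
The synonym-expansion idea can be illustrated with a minimal
sketch.  It assumes a hard-coded synonym table (%synonyms_for)
and a helper named expand_ticket(), both hypothetical and used
only for illustration; the module itself obtains its synonyms
from WordNet rather than from a fixed table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WordNet lookups.  The real module queries
# WordNet through WordNet::QueryData instead of a hard-coded table.
my %synonyms_for = (
    crash  => [qw(fail break)],
    screen => [qw(display monitor)],
);

# Append up to $max synonyms after each word of a ticket.
sub expand_ticket {
    my ($text, $max) = @_;
    my @expanded;
    for my $word (split ' ', lc $text) {
        push @expanded, $word;
        my @syns = @{ $synonyms_for{$word} // [] };
        $#syns = $max - 1 if @syns > $max;   # keep at most $max synonyms
        push @expanded, @syns;
    }
    return join ' ', @expanded;
}

print expand_ticket('Screen crash', 1), "\n";   # screen display crash fail
```

After expansion, two tickets that describe the same problem in
different words are more likely to share vocabulary, which is
what makes the later similarity comparison workable.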

This module requires the following three modules:

    Spreadsheet::ParseExcel
    Spreadsheet::XLSX
    WordNet::QueryData

the first for extracting information from the old-style
Excel sheets that are commonly used for storing tickets, the
second for extracting the same information from the
new-style Excel sheets, and the third for interfacing with
WordNet for extracting the synonyms and antonyms for the
words in the tickets.

For installation, do the usual

    perl Makefile.PL
    make
    make test
    make install

lib/Algorithm/TicketClusterer.pm

be handled by the tech support individual who is most qualified to address
the issue raised in the ticket.  Identifying the right individual for each
new ticket in real-time is no easy task for organizations that man large
service centers and helpdesks.  So if it were possible to quickly identify
the previously processed tickets that are most similar to a new ticket, one
could think of constructing semi-automated (or, perhaps, even fully
automated) ticket routers.

Identifying old tickets similar to a new ticket is made challenging by the
fact that folks who submit tickets often write them quickly and informally.
The informal style of writing means that different people may use different
colloquial terms to describe the same thing. And the quickness associated
with their submission causes the tickets to frequently contain spelling and
other errors such as conjoined words, fragmentation of long words, and so
on.

This module is an attempt at dealing with these challenges.

The problem of different people using different words to describe the same
thing is taken care of by using WordNet to add to each ticket a designated
number of synonyms for each word in the ticket.  The idea is that after all

lib/Algorithm/TicketClusterer.pm

This is the final step in the preprocessing of the tickets before they are ready for
the doc modeling stage.  This method calls other functions internal to the module
that ultimately make calls to WordNet through the Perl interface provided by the
C<WordNet::QueryData> module.

=item B<get_tickets_from_excel()>

    $clusterer->get_tickets_from_excel()

This method calls on the C<Spreadsheet::ParseExcel> module to extract the tickets
from the old-style Excel spreadsheets and the C<Spreadsheet::XLSX> module for doing
the same from the new-style Excel spreadsheets.

=item B<get_ticket_vocabulary_and_construct_inverted_index()>

    $clusterer->get_ticket_vocabulary_and_construct_inverted_index()

As mentioned in B<THE THREE STAGES OF PROCESSING>, the second stage of processing ---
doc modeling of the tickets --- starts with the stemming of the words in the tickets,
constructing a vocabulary of all the stemmed words in all the tickets, and
constructing an inverted index for the vocabulary words.  All of these things are
accomplished by this method.
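
The three steps named above (stemming, vocabulary construction, and inverted
indexing) can be sketched in a few lines.  This is an illustration under stated
assumptions, not the module's implementation: C<stem()> is a crude suffix stripper
standing in for a real stemmer, and C<build_inverted_index()> and the sample
tickets are hypothetical names and data.

```perl
use strict;
use warnings;

# Crude stand-in stemmer (assumption): strips a few common suffixes.
sub stem {
    my ($word) = @_;
    $word =~ s/(?:ing|ed|s)$// if length($word) > 4;
    return $word;
}

# Build a sorted vocabulary of stems and an inverted index that maps
# each stem to the set of ticket ids containing it.
sub build_inverted_index {
    my ($tickets) = @_;               # hashref: ticket id => ticket text
    my %index;
    for my $id (keys %$tickets) {
        for my $word (lc($tickets->{$id}) =~ /[a-z]+/g) {
            $index{ stem($word) }{$id} = 1;
        }
    }
    my @vocabulary = sort keys %index;
    return (\@vocabulary, \%index);
}

my %tickets = (
    101 => 'Printer jammed again',
    102 => 'Printers keep jamming',
);
my ($vocab, $index) = build_inverted_index(\%tickets);

# Both tickets are reachable from the shared stem "printer".
print join(' ', sort { $a <=> $b } keys %{ $index->{printer} }), "\n";   # 101 102
```

With such an index, a new ticket only needs to be compared against the old tickets
that share at least one stemmed word with it, rather than against every ticket.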

lib/Algorithm/TicketClusterer.pm

=head1 REQUIRED

This module requires the following five modules:

    Spreadsheet::ParseExcel
    Spreadsheet::XLSX
    WordNet::QueryData
    Storable
    SDBM_File

the first for extracting information from the old-style Excel sheets that are
commonly used for storing tickets, the second for extracting the same information
from the new-style Excel sheets, the third for interfacing with WordNet for
extracting the synonyms and antonyms, the fourth for creating the various disk-based
database files needed by the module, and the last for disk-based hashes used to lend
persistence to the extraction of the alphabet used by the tickets and the inverse
document frequencies of the words.
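
As a rough illustration of the kind of disk persistence C<Storable> provides, the
sketch below writes a small hash to a file and reads it back.  The hash contents,
the file name, and the use of a temporary directory are assumptions made for the
example; they do not reflect the files the module actually creates.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical inverse-document-frequency values for two stemmed words.
my %idf = (printer => 0.301, jamm => 0.477);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/idf.db";          # file name chosen for illustration only

store(\%idf, $file);               # serialize the hash to disk
my $restored = retrieve($file);    # read it back, e.g. in a later run

printf "%.3f\n", $restored->{jamm};   # 0.477
```

Persisting such tables means a later run can reuse them instead of recomputing
them from the spreadsheet.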

=head1 EXPORT

None by design.

=head1 CAVEATS


