ALBD

FDL.txt  view on Meta::CPAN

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.


1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The "Document", below,

FDL.txt  view on Meta::CPAN

ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML, PostScript or PDF designed for human modification.  Examples of
transparent image formats include PNG, XCF and JPG.  Opaque formats
include proprietary formats that can be read and edited only by
proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",

FDL.txt  view on Meta::CPAN

copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

FDL.txt  view on Meta::CPAN

of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.


8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the

GPL.txt  view on Meta::CPAN

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

lib/ALBD.pm  view on Meta::CPAN

	return;
    }
    if (exists $lbdOptions{'precisionAndRecall_explicit'}) {
	$self->timeSlicing_generatePrecisionAndRecall_explicit();
	return;
    }
    if (exists $lbdOptions{'precisionAndRecall_implicit'}) {
	$self->timeSlicing_generatePrecisionAndRecall_implicit();
	return;
    }
    print "Open Discovery\n";
    print $self->_parametersToString();

#Get inputs
    my $startCuisRef = $self->_getStartCuis();
    my $linkingAcceptTypesRef = $self->_getAcceptTypes('linking');
    my $targetAcceptTypesRef = $self->_getAcceptTypes('target');
    print "startCuis = ".(join(',', @{$startCuisRef}))."\n";
    print "linkingAcceptTypes = ".(join(',', keys %{$linkingAcceptTypesRef}))."\n";
    print "targetAcceptTypes = ".(join(',', keys %{$targetAcceptTypesRef}))."\n";

#Get the Explicit Matrix
    $start = time;
    my $explicitMatrixRef;
    if(!defined $lbdOptions{'explicitInputFile'}) {
	die ("ERROR: explicitInputFile must be defined in LBD config file\n");
    }
    $explicitMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'explicitInputFile'});
    print "Got Explicit Matrix in ".(time() - $start)."\n";
    
#Get the Starting Matrix
    $start = time();
    my $startingMatrixRef = 
	Discovery::getRows($startCuisRef, $explicitMatrixRef);
    print "Got Starting Matrix in ".(time() - $start)."\n";

    #if using average minimum weight, grab the a->b scores
    my %abPairsWithScores = ();
    if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight' 
	|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {

	#apply semantic type filter to columns only
	if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
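
A note on the data structure: the loops elsewhere in this file iterate over `keys %{$matrixRef}` and then over the keys of each row, which suggests the matrices are sparse hashes of hashes keyed by CUI. The following is a minimal sketch of row selection under that assumption. `get_rows` is a hypothetical stand-in, not the actual `Discovery::getRows` implementation, and the CUIs are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Discovery::getRows: copy the requested rows
# out of a sparse matrix stored as a hash of hashes.
sub get_rows {
    my ($cuis_ref, $matrix_ref) = @_;
    my %rows;
    foreach my $cui (@{$cuis_ref}) {
        # a start term with no co-occurrences has no row at all
        next unless exists $matrix_ref->{$cui};
        $rows{$cui} = { %{ $matrix_ref->{$cui} } };   # shallow copy of the row
    }
    return \%rows;
}

# made-up explicit matrix: row CUI => { column CUI => co-occurrence count }
my %explicit = (
    C1 => { C2 => 3, C3 => 1 },
    C2 => { C1 => 3, C4 => 2 },
    C3 => { C1 => 1 },
);

my $starting = get_rows(['C1', 'C9'], \%explicit);
print join(',', sort keys %{$starting}), "\n";   # prints "C1"
```

Copying the row, rather than sharing the reference, is a design choice of this sketch: it keeps later filtering of the explicit matrix from silently changing the starting matrix. Whether the module copies or shares is not visible in the excerpt.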

lib/ALBD.pm  view on Meta::CPAN

	    }
	}
        Rank::getBatchAssociationScores(\%abPairsWithScores, $explicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
    }

    #Apply Semantic Type Filter to the explicit matrix
    if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	$start = time();
	Filters::semanticTypeFilter_rowsAndColumns(
	    $explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
	print "Semantic Type Filter in ".(time() - $start)."\n";
    }
    
#Get Implicit Connections
    $start = time();
    my $implicitMatrixRef;
    if (defined $lbdOptions{'implicitInputFile'}) {
	$implicitMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'implicitInputFile'});
    } else {
	$implicitMatrixRef = Discovery::findImplicit($explicitMatrixRef, $startingMatrixRef);
    }
    print "Got Implicit Matrix in ".(time() - $start)."\n";

#Remove Known Connections 
     $start = time();
     $implicitMatrixRef = Discovery::removeExplicit($startingMatrixRef, $implicitMatrixRef);
     print "Removed Known Connections in ".(time() - $start)."\n";
 
#Apply Semantic Type Filter
    if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	$start = time();
	Filters::semanticTypeFilter_columns(
	    $implicitMatrixRef, $targetAcceptTypesRef, $umls_interface);
	print "Semantic Type Filter in ".(time() - $start)."\n";
    }

#Score Implicit Connections
    $start = time();	
    my $scoresRef;
    if ($lbdOptions{'rankingProcedure'} eq 'allPairs') {
	$scoresRef = Rank::scoreImplicit_fromAllPairs($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
    } elsif ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight') {
	$scoresRef = Rank::scoreImplicit_averageMinimumWeight($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association, \%abPairsWithScores);
    } elsif ($lbdOptions{'rankingProcedure'} eq 'linkingTermCount') {
	$scoresRef = Rank::scoreImplicit_linkingTermCount($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef);
    } elsif ($lbdOptions{'rankingProcedure'} eq 'frequency') {
	$scoresRef = Rank::scoreImplicit_frequency($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef);
    } elsif ($lbdOptions{'rankingProcedure'} eq 'ltcAssociation') {
	$scoresRef = Rank::scoreImplicit_ltcAssociation($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
    } elsif ($lbdOptions{'rankingProcedure'} eq 'ltc_amw') {
	$scoresRef = Rank::scoreImplicit_LTC_AMW($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association, \%abPairsWithScores);
    } else {
	die ("Error: Invalid Ranking Procedure\n");
    }    
    print "Scored in: ".(time()-$start)."\n";
  
#Rank Implicit Connections
    $start = time();
    my $ranksRef = Rank::rankDescending($scoresRef);
    print "Ranked in: ".(time()-$start)."\n";

#Output The Results
    open OUT, ">$lbdOptions{implicitOutputFile}" 
	or die "unable to open implicit output file: "
	."$lbdOptions{implicitOutputFile}\n";
    my $outputString = $self->_rankedTermsToString($scoresRef, $ranksRef);
    my $paramsString = $self->_parametersToString();
    print OUT $paramsString;
    print OUT $outputString;
    close OUT;

#Done
    print "DONE!\n\n";
}

#----------------------------------------------------------------------------

# performs LBD, closed discovery
# input:  none
# output: none, but a results file is written to disk
sub performLBD_closedDiscovery {
    my $self = shift;
    my $start; #used to record run times

    print "Closed Discovery\n";
    print $self->_parametersToString();

#Get inputs
    my $startCuisRef = $self->_getStartCuis();
    my $targetCuisRef = $self->_getTargetCuis();
    my $linkingAcceptTypesRef = $self->_getAcceptTypes('linking');

#Get the Explicit Matrix
    $start = time;
    my $explicitMatrixRef;
    if(!defined $lbdOptions{'explicitInputFile'}) {
	die ("ERROR: explicitInputFile must be defined in LBD config file\n");
    }
    $explicitMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'explicitInputFile'});
    print "Got Explicit Matrix in ".(time() - $start)."\n";
    
#Get the Starting Matrix
    $start = time();
    my $startingMatrixRef = 
	Discovery::getRows($startCuisRef, $explicitMatrixRef);
    print "Got Starting Matrix in ".(time() - $start)."\n";
    print "   numRows in startMatrix = ".(scalar keys %{$startingMatrixRef})."\n";

    #Apply Semantic Type Filter to the explicit matrix
    if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	$start = time();
	Filters::semanticTypeFilter_rowsAndColumns(
	    $explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
	print "Semantic Type Filter in ".(time() - $start)."\n";
    }

#Get the Target Matrix
    $start = time();
    my $targetMatrixRef = 
	Discovery::getRows($targetCuisRef, $explicitMatrixRef);
    print "Got Target Matrix in ".(time() - $start)."\n";
    print "   numRows in targetMatrix = ".(scalar keys %{$targetMatrixRef})."\n";

#find the linking terms in common for starting and target matrices
    print "Finding terms in common\n";
    #get starting linking terms
    my %startLinks = ();
    foreach my $row (keys %{$startingMatrixRef}) {
	foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
	    $startLinks{$col} = ${${$startingMatrixRef}{$row}}{$col};
	}
    }
    print "   num start links = ".(scalar keys %startLinks)."\n";
    #get target linking terms
    my %targetLinks = ();
    foreach my $row (keys %{$targetMatrixRef}) {
	foreach my $col (keys %{${$targetMatrixRef}{$row}}) {
	    $targetLinks{$col} = ${${$targetMatrixRef}{$row}}{$col};
	}
    }
    print "   num target links = ".(scalar keys %targetLinks)."\n";
    #find linking terms in common
    my %inCommon = ();
    foreach my $startLink (keys %startLinks) {
	if (exists $targetLinks{$startLink}) {
	    $inCommon{$startLink} = $startLinks{$startLink} + $targetLinks{$startLink};
	}
    }
     print "   num in common = ".(scalar keys %inCommon)."\n";

#Score and Rank
    #Score the linking terms in common
    my $scoresRef = \%inCommon;
    #TODO score is just summed frequency right now

    #Rank Implicit Connections
    $start = time();
    my $ranksRef = Rank::rankDescending($scoresRef);
    print "Ranked in: ".(time()-$start)."\n";

#Output The Results
    open OUT, ">$lbdOptions{implicitOutputFile}" 
	or die "unable to open implicit output file: "
	."$lbdOptions{implicitOutputFile}\n";
    my $outputString = $self->_rankedTermsToString($scoresRef, $ranksRef);
    my $paramsString = $self->_parametersToString();
    print OUT $paramsString;
    print OUT $outputString;

    print OUT "\n\n---------------------------------------\n\n";
    print OUT "starting linking terms:\n";
    print OUT join("\n", keys %startLinks);

    print OUT "\n\n---------------------------------------\n\n";
    print OUT "target linking terms:\n";
    print OUT join("\n", keys %targetLinks);

    close OUT;

#Done
    print "DONE!\n\n";
}

#NOTE, this is experimental code for using the implicit matrix as input
# to association measures and then rank. This provides a nice method of 
# association for implicit terms, but there are implementation problems
# primarily memory constraints or time constraints now, because this
# requires the entire implicit matrix be computed. This can be done, but
# access to it is then slow. Would require a major redo of the code
#
=comment
# performs LBD, but using implicit matrix ranking schemes.
# Since the order of operations for those methods is slightly different,
# a new method has been created.
# input:  none
# output: none, but a results file is written to disk
sub performLBD_implicitMatrixRanking {
    my $self = shift;
    my $start; #used to record run times
    print  $self->_parametersToString();
    print "In Implicit Ranking\n";
    
#Get inputs
    my $startCuisRef = $self->_getStartCuis();
    my $linkingAcceptTypesRef = $self->_getAcceptTypes('linking');
    my $targetAcceptTypesRef = $self->_getAcceptTypes('target');
    print "startCuis = ".(join(',', @{$startCuisRef}))."\n";
    print "linkingAcceptTypes = ".(join(',', keys %{$linkingAcceptTypesRef}))."\n";
    print "targetAcceptTypes = ".(join(',', keys %{$targetAcceptTypesRef}))."\n";

#Score Implicit Connections
    $start = time();	
    my $scoresRef;
    $scoresRef = Rank::scoreImplicit_fromImplicitMatrix($startCuisRef,  $lbdOptions{'implicitInputFile'}, $lbdOptions{'rankingMeasure'}, $umls_association);
    print "Scored in: ".(time()-$start)."\n";
  
#Rank Implicit Connections
    $start = time();
    my $ranksRef = Rank::rankDescending($scoresRef);
    print "Ranked in: ".(time()-$start)."\n";

#Output The Results
    open OUT, ">$lbdOptions{implicitOutputFile}" 
	or die "unable to open implicit output file: "
	."$lbdOptions{implicitOutputFile}\n";
    my $outputString = $self->_rankedTermsToString($scoresRef, $ranksRef);
    my $paramsString = $self->_parametersToString();
    print OUT $paramsString;
    print OUT $outputString;
    close OUT;

#Done
    print "DONE!\n\n";
}
=cut


##################################################
################ Time Slicing ####################
##################################################

#NOTE: This function isn't really tested, and is really slow right now
# Generates precision and recall values by varying the threshold
# of the A->B ranking measure.
# input:  none
# output: none, but precision and recall values are printed to STDOUT
sub timeSlicing_generatePrecisionAndRecall_explicit {
    my $NUM_SAMPLES = 100; #TODO: read from file. Number of samples to average over for time slicing
    my $self = shift;
    print "In timeSlicing_generatePrecisionAndRecall\n";

    my $numIntervals = 10;

#Get inputs
    my $startAcceptTypesRef = $self->_getAcceptTypes('start');
    my $linkingAcceptTypesRef = $self->_getAcceptTypes('linking');
    my $targetAcceptTypesRef = $self->_getAcceptTypes('target');


#Get the Explicit Matrix
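
The closed-discovery routine earlier in this excerpt (`performLBD_closedDiscovery`) collects the linking terms of the start and target rows and keeps the ones they share, scoring each by summed frequency. The core of that step can be sketched on its own as follows. The helper names `linking_terms` and `common_links` and the sample terms are hypothetical; the logic mirrors the loops shown above rather than any separate library routine.

```perl
use strict;
use warnings;

# Collect every column (linking term) that appears in any row of the matrix.
# As in the original loop, a later row overwrites the value of an earlier one.
sub linking_terms {
    my ($matrix_ref) = @_;
    my %links;
    foreach my $row (keys %{$matrix_ref}) {
        foreach my $col (keys %{ $matrix_ref->{$row} }) {
            $links{$col} = $matrix_ref->{$row}{$col};
        }
    }
    return \%links;
}

# Keep the terms present on both sides; the score is the summed frequency.
sub common_links {
    my ($start_ref, $target_ref) = @_;
    my %in_common;
    foreach my $term (keys %{$start_ref}) {
        next unless exists $target_ref->{$term};
        $in_common{$term} = $start_ref->{$term} + $target_ref->{$term};
    }
    return \%in_common;
}

my $start_links  = linking_terms({ A1 => { B1 => 4, B2 => 1 } });
my $target_links = linking_terms({ C1 => { B1 => 2, B3 => 7 } });
my $shared       = common_links($start_links, $target_links);

print "$_ => $shared->{$_}\n" for sort keys %{$shared};   # prints "B1 => 6"
```

With more than one start or target CUI, the overwrite in `linking_terms` means the stored frequency comes from whichever row is visited last, which is worth keeping in mind when reading the summed scores.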

lib/ALBD.pm  view on Meta::CPAN

    my $trueMax = -999999; 
    my $predictedMin = 999999;
    my $predictedMax = -999999; #start low so that any observed count replaces it
    my $predictedTotal = 0;
    my $trueTotal = 0;
    my $allPairsCount = scalar keys %{$assocScoresRef};
    for (my $i = $numIntervals; $i >= 0; $i--) {

	#determine the number of samples to threshold
	my $numSamples = $i*($allPairsCount/$numIntervals);
	print "i, numSamples/allPairsCount = $i, $numSamples/$allPairsCount\n";
	#grab samples at just 10 to estimate the final point (this is what 
	# makes it an 11 point curve)
	if ($numSamples == 0) {
	    $numSamples = 10;
	}

	#apply a threshold (number of samples)
	my $thresholdedStartingMatrixRef = TimeSlicing::grabKHighestRankedSamples($numSamples, $assocScoresRef, $startingMatrixRef);

	#generate implicit knowledge
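
The loop above steps `$i` from `$numIntervals` down to 0 and thresholds at `$i*($allPairsCount/$numIntervals)` samples, substituting 10 samples for the final step. A minimal sketch of the resulting schedule is given below; `threshold_schedule` is a hypothetical helper and the pair count of 250 is an invented example value.

```perl
use strict;
use warnings;

# Hypothetical helper: list the sample counts used at each threshold step.
sub threshold_schedule {
    my ($all_pairs_count, $num_intervals) = @_;
    my @schedule;
    for (my $i = $num_intervals; $i >= 0; $i--) {
        my $num_samples = $i * ($all_pairs_count / $num_intervals);
        # the last step would keep nothing, so keep 10 samples instead
        $num_samples = 10 if $num_samples == 0;
        push @schedule, $num_samples;
    }
    return @schedule;
}

my @schedule = threshold_schedule(250, 10);
# prints "11 points: 250 225 200 175 150 125 100 75 50 25 10"
print scalar(@schedule), " points: @schedule\n";
```

Ten intervals give eleven thresholds, which is where the "11 point curve" in the comment comes from. Like the original expression, the sketch does not round, so a pair count that is not a multiple of the interval count yields fractional sample counts.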

lib/ALBD.pm  view on Meta::CPAN


	#apply a semantic type filter to the implicit matrix
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$implicitMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#calculate precision and recall
	my ($precision, $recall) = TimeSlicing::calculatePrecisionRecall(
	    $implicitMatrixRef, $postCutoffMatrixRef);
	print "precision = $precision, recall = $recall\n";

	#calculate averages/min/max only for $i= $numIntervals, which is all terms
	if ($i == $numIntervals) {
	    #average over all terms
	    foreach my $rowKey(keys %{$implicitMatrixRef}) {
		#get the counts true and predicted for this term (row of matrix)
		my $numPredicted = scalar keys %{${$implicitMatrixRef}{$rowKey}};
		my $numTrue = scalar keys %{${$postCutoffMatrixRef}{$rowKey}};

		#sum counts
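
The excerpt calls `TimeSlicing::calculatePrecisionRecall` with the predicted (implicit) matrix and the post-cutoff matrix, but its body is not shown. As a rough illustration only, the sketch below computes micro-averaged precision and recall over two hash-of-hashes matrices. The averaging scheme, the function name `precision_recall`, and the sample data are all assumptions and may differ from what the module does.

```perl
use strict;
use warnings;

# Hypothetical micro-averaged precision and recall over two sparse matrices.
# A prediction counts as correct when the same row/column pair is in the
# true (post-cutoff) matrix.
sub precision_recall {
    my ($predicted_ref, $true_ref) = @_;
    my ($num_predicted, $num_true, $num_correct) = (0, 0, 0);
    foreach my $row (keys %{$predicted_ref}) {
        my $pred_row = $predicted_ref->{$row};
        my $true_row = $true_ref->{$row} || {};
        $num_predicted += scalar keys %{$pred_row};
        $num_true      += scalar keys %{$true_row};
        $num_correct   += grep { exists $true_row->{$_} } keys %{$pred_row};
    }
    # guard against empty matrices
    my $precision = $num_predicted ? $num_correct / $num_predicted : 0;
    my $recall    = $num_true      ? $num_correct / $num_true      : 0;
    return ($precision, $recall);
}

my %predicted = (A => { X => 1, Y => 1, Z => 1, W => 1 });
my %gold      = (A => { X => 1, Q => 1 });
my ($precision, $recall) = precision_recall(\%predicted, \%gold);

# prints "precision = 0.25, recall = 0.50"
printf "precision = %.2f, recall = %.2f\n", $precision, $recall;
```

The sketch only visits rows that exist in the predicted matrix, which relies on the statement in the source comments that both matrices have the same rows.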

lib/ALBD.pm  view on Meta::CPAN

		$trueTotal += $numTrue;
	    }
	    #take the average, both true and predicted matrices
	    # have the same number of rows.
	    $predictedAverage /= (scalar keys %{$implicitMatrixRef});
	    $trueAverage /= (scalar keys %{$implicitMatrixRef});
	}
    } 

    #output stats
    print "predicted - total, min, max, average = $predictedTotal, $predictedMin, $predictedMax, $predictedAverage\n";
    print "true - total, min, max, average = $trueTotal, $trueMin, $trueMax, $trueAverage\n";
}


# generates precision and recall values by varying the threshold
# of the A->C ranking measure. Also generates precision at k, and
# mean average precision
# input:  none
# output: none, but precision, recall, precision at k, and map values
#         output to STDOUT
sub timeSlicing_generatePrecisionAndRecall_implicit {
    my $NUM_SAMPLES = 200; #TODO: read from file. Number of samples to average over for time slicing
    my $self = shift;
    my $start; #used to record run times
    print "In timeSlicing_generatePrecisionAndRecall_implicit\n";

    #Get inputs
    my $startAcceptTypesRef = $self->_getAcceptTypes('start');
    my $linkingAcceptTypesRef = $self->_getAcceptTypes('linking');
    my $targetAcceptTypesRef = $self->_getAcceptTypes('target');

#-----------
# Starting Matrix Creation
#-----------
    #Get the Explicit Matrix
    print "loading explicit\n";
    my $explicitMatrixRef;
    if(!defined $lbdOptions{'explicitInputFile'}) {
	die ("ERROR: explicitInputFile must be defined in LBD config file\n");
    }
    $explicitMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'explicitInputFile'});

    #create the starting matrix
    print "generating starting\n";
    my $startingMatrixRef 
	= TimeSlicing::generateStartingMatrix($explicitMatrixRef, \%lbdOptions, $startAcceptTypesRef, $NUM_SAMPLES, $umls_interface);
#----------
    

#--------
# Gold Loading/Creation
#--------
    #load or create the gold matrix
    my $goldMatrixRef;
    if (exists $lbdOptions{'goldInputFile'}) {
	print "inputting gold\n";
	$goldMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'goldInputFile'});
    }
    else {
	print "loading post cutoff\n";
	$goldMatrixRef = TimeSlicing::loadPostCutOffMatrix($startingMatrixRef, $explicitMatrixRef, $lbdOptions{'postCutoffFileName'});

	#remove explicit knowledge from the post cutoff matrix
	$goldMatrixRef = Discovery::removeExplicit($startingMatrixRef, $goldMatrixRef);

	#apply a semantic type filter to the post cutoff matrix
	print "applying semantic filter to post-cutoff matrix\n";
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$goldMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#TODO why is the gold matrix outputting with an extra line between samples?
	#output the gold matrix
	if (exists $lbdOptions{'goldOutputFile'}) {
	    print "outputting gold\n";
	    Discovery::outputMatrixToFile($lbdOptions{'goldOutputFile'}, $goldMatrixRef); 
	}
    }
#-------
  
#-------
# AB Scoring (if needed)
#-------
    #if using average minimum weight, grab the a->b scores, #TODO this is sloppy here, but it has to be here...how to make it fit better?
    my %abPairsWithScores = ();
    if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight'
		|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {
	print "getting AB scores\n";

	#apply semantic type filter to columns only
	if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
	}
	#initialize the abPairs to the frequency of co-occurrence
	foreach my $row (keys %{$startingMatrixRef}) {
	    foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
		$abPairsWithScores{"$row,$col"} = ${${$startingMatrixRef}{$row}}{$col}; 

lib/ALBD.pm  view on Meta::CPAN

	Rank::getBatchAssociationScores(
	    \%abPairsWithScores, $explicitMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
    }
#--------

#------------
# Matrix Filtering/Thresholding
#------------
    #load or threshold the matrix
    if (exists $lbdOptions{'thresholdedMatrix'}) {
	print "loading thresholded matrix\n";
	$explicitMatrixRef = (); #clear (for memory)
	$explicitMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'thresholdedMatrix'});
    }
    #else {#TODO apply a threshold}
    #NOTE, we must threshold the entire matrix because that is how we are calculating association scores

    #Apply Semantic Type Filter to the explicit matrix
    print "applying semantic filter to explicit matrix\n";
    if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	Filters::semanticTypeFilter_rowsAndColumns(
	    $explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
    }

#------------
# Prediction Generation
#------------
    #load or create the predictions matrix
    my $predictionsMatrixRef;
    if (exists $lbdOptions{'predictionsInFile'}) {
	print "loading predictions\n";
	$predictionsMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'predictionsInFile'});
    }
    else {
	print "generating predictions\n";

	#generate implicit knowledge
	print "Squaring Matrix\n";
	$predictionsMatrixRef = Discovery::findImplicit(
	    $explicitMatrixRef, $startingMatrixRef);

	#Remove Known Connections
	print "Removing Known from Predictions\n";
	$predictionsMatrixRef 
	    = Discovery::removeExplicit($startingMatrixRef, $predictionsMatrixRef);

	#apply a semantic type filter to the predictions matrix
	print "Applying Semantic Filter to Predictions\n";
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$predictionsMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#save the implicit knowledge matrix to file
	if (exists ($lbdOptions{'predictionsOutFile'})) {
	    print "outputting predictions\n";
	    Discovery::outputMatrixToFile($lbdOptions{'predictionsOutFile'}, $predictionsMatrixRef);
	}
    }

#-------------------------------------------

    #At this point, the explicitMatrixRef has been filtered and thresholded.
    #The predictions matrix has been generated from the filtered and
    #  thresholded explicitMatrixRef; it contains only the rows of the starting
    #  terms, has been semantic-type filtered, and has had explicit knowledge removed.
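
The prediction step above ("Squaring Matrix", then "Removing Known from Predictions") follows the usual A to B to C pattern: each start term A is expanded through its linking terms B to their neighbours C, and pairs that already co-occur explicitly are dropped. The sketch below shows that pattern on toy data. `find_implicit` and `remove_explicit` here are hypothetical re-implementations, and storing the number of linking terms in each cell is an assumption; the values kept by `Discovery::findImplicit` are not visible in the excerpt.

```perl
use strict;
use warnings;

# Hypothetical A->B->C expansion: for every start term A, follow each linking
# term B to its neighbours C. The cell value counts the linking terms, which
# is an assumption about what the real matrix stores.
sub find_implicit {
    my ($explicit_ref, $starting_ref) = @_;
    my %implicit;
    foreach my $a_term (keys %{$starting_ref}) {
        foreach my $b_term (keys %{ $starting_ref->{$a_term} }) {
            next unless exists $explicit_ref->{$b_term};
            foreach my $c_term (keys %{ $explicit_ref->{$b_term} }) {
                $implicit{$a_term}{$c_term}++;
            }
        }
    }
    return \%implicit;
}

# Drop every A->C pair that is already an explicit co-occurrence of A.
sub remove_explicit {
    my ($starting_ref, $implicit_ref) = @_;
    foreach my $a_term (keys %{$implicit_ref}) {
        my $known = $starting_ref->{$a_term} || {};
        delete $implicit_ref->{$a_term}{$_} for keys %{$known};
    }
    return $implicit_ref;
}

# made-up data: A co-occurs with B1 and B2; B2 also co-occurs with B1
my %explicit = (
    A  => { B1 => 2, B2 => 1 },
    B1 => { C1 => 5 },
    B2 => { C1 => 3, C2 => 1, B1 => 4 },
);
my %starting = (A => $explicit{A});

my $implicit = remove_explicit(\%starting, find_implicit(\%explicit, \%starting));
# prints "C1=2,C2=1"
print join(',', map { "$_=$implicit->{A}{$_}" } sort keys %{ $implicit->{A} }), "\n";
```

In the example, B1 is reachable from A through B2, but it is removed again because A and B1 already co-occur explicitly.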

lib/ALBD.pm  view on Meta::CPAN



#--------------
# Get the ranks of all predictions
#--------------
    #get the scores and ranks separately for each row
    # thereby generating scores and ranks for each starting
    # term individually
    my %rowRanks = ();
    my ($n1pRef, $np1Ref, $npp);
    print "getting row ranks\n";
    foreach my $rowKey (keys %{$predictionsMatrixRef}) { 
	#grab rows from start and implicit matrices
	my %startingRow = ();
	$startingRow{$rowKey} = ${$startingMatrixRef}{$rowKey};
	my %implicitRow = ();
	$implicitRow{$rowKey} = ${$predictionsMatrixRef}{$rowKey};

	#Score Implicit Connections	
	my $scoresRef;
	if ($lbdOptions{'rankingProcedure'} eq 'allPairs') {

lib/ALBD.pm  view on Meta::CPAN

    while (my $line = <IN>) {
	#check if it's a comment or blank line
	$firstChar = substr $line, 0, 1;
	
	if ($firstChar ne '#' && $line =~ /[^\s]+/) {
	    #line contains data, grab the key and value
	    $line =~ /<([^>]+)>([^\n]*)/;	  

	    #make sure the data was read in correctly
	    if (!$1) {
		print STDERR 
		    "Warning: Invalid line in $configFileName: $line\n";
	    }
	    else {
		#data was grabbed from the line, add to hash
		if ($2) {
		    #add key and value to the optionsHash
		    $optionsHash{$1} = $2;
		}
		else {
		    #add key and set default value to the optionsHash
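
The parsing loop above reads configuration lines of the form `<key>value`, skipping comments and blank lines. A self-contained sketch of the same idea follows. `parse_config_lines` is a hypothetical helper, and the default of 1 for keys without a value is an assumption, since the excerpt ends before the module's default is shown.

```perl
use strict;
use warnings;

# Hypothetical parser for lines of the form "<key>value".
sub parse_config_lines {
    my @lines = @_;
    my %options;
    foreach my $line (@lines) {
        next if $line =~ /^#/;        # comment line
        next unless $line =~ /\S/;    # blank line
        if ($line =~ /<([^>]+)>([^\n]*)/) {
            my ($key, $value) = ($1, $2);
            # ASSUMPTION: a key with no value is stored as the flag value 1
            $options{$key} = length($value) ? $value : 1;
        }
        else {
            warn "Warning: invalid line: $line";
        }
    }
    return \%options;
}

my $opts = parse_config_lines(
    "# sample configuration\n",
    "\n",
    "<rankingProcedure>averageMinimumWeight\n",
    "<precisionAndRecall_implicit>\n",
);

print "$_ = $opts->{$_}\n" for sort keys %{$opts};
# prints:
# precisionAndRecall_implicit = 1
# rankingProcedure = averageMinimumWeight
```

The sketch copies `$1` and `$2` into named variables only after a successful match, and it tests the value with `length` so that a literal value of `0` is kept rather than treated as missing.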

lib/ALBD.pm  view on Meta::CPAN

}



##############################################################################
#        function to produce output
##############################################################################
# outputs the implicit terms to string
# input:  $scoresRef <- a reference to a hash of scores (hash{CUI}=score)
#         $ranksRef <- a reference to an array of CUIs ranked by their score
#         $printTo <- optional, outputs the $printTo top ranked terms. If not
#                     specified, all terms are output
# output: a line separated string containing ranked terms, scores, and their
#         preferred terms
sub _rankedTermsToString {
    my $self = shift;
    my $scoresRef = shift;
    my $ranksRef = shift;
    my $printTo = shift;

    #set printTo
    if (!$printTo) {
	$printTo = scalar @{$ranksRef};
    }
    
    #construct the output string
    my $string = '';
    my $index;
    for (my $i = 0; $i < $printTo; $i++) {
	#add the rank
	$index = $i+1;
	$string .= "$index\t";
	#add the score
	$string .= sprintf "%.5f\t", ${$scoresRef}{${$ranksRef}[$i]};
	#add the CUI
	$string .= "${$ranksRef}[$i]\t";
	#add the name
	my $name = $umls_interface->getPreferredTerm(${$ranksRef}[$i]);
	#if no preferred name, get anything
	if (!defined $name || $name eq '') {
	    my $termListRef = $umls_interface->getTermList(${$ranksRef}[$i]);
	    if (scalar @{$termListRef} > 0) {
		$name = '.**'.${$termListRef}[0];
	    }
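
The output routine above walks an array of CUIs ordered by score and emits rank, score, and CUI on each line. The sketch below reproduces that flow without the UMLS name lookup. `rank_descending` and `ranked_terms_to_string` are hypothetical stand-ins for `Rank::rankDescending` and `_rankedTermsToString`, the tie-break by CUI is an assumption added for stable output, and the CUIs and scores are invented.

```perl
use strict;
use warnings;

# Hypothetical ranking: highest score first, ties broken by CUI.
sub rank_descending {
    my ($scores_ref) = @_;
    return [ sort { $scores_ref->{$b} <=> $scores_ref->{$a} || $a cmp $b }
             keys %{$scores_ref} ];
}

# One line per term: rank, score to five decimals, CUI.
sub ranked_terms_to_string {
    my ($scores_ref, $ranks_ref, $print_to) = @_;
    # print everything when no limit is given or the limit is too large
    $print_to = scalar @{$ranks_ref}
        if !$print_to || $print_to > scalar @{$ranks_ref};
    my $string = '';
    for my $i (0 .. $print_to - 1) {
        my $cui = $ranks_ref->[$i];
        $string .= sprintf "%d\t%.5f\t%s\n", $i + 1, $scores_ref->{$cui}, $cui;
    }
    return $string;
}

my %scores = (C0000001 => 0.5, C0000002 => 2, C0000003 => 1.25);
my $ranks  = rank_descending(\%scores);
my $out    = ranked_terms_to_string(\%scores, $ranks, 2);
print $out;
# prints:
# 1	2.00000	C0000002
# 2	1.25000	C0000003
```

Passing the score to `sprintf` as a bare number, with the tab kept in the format string, avoids the "isn't numeric" warning that a score with a trailing tab would trigger under `use warnings`.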

lib/ALBD.pm  view on Meta::CPAN

}

##############################################################################
#        functions for debugging
##############################################################################
=comment
sub debugLBD {
    my $self = shift;
    my $startingCuisRef = shift;

    print "Starting CUIs = ".(join(',', @{$startingCuisRef}))."\n";

#Get the Explicit Matrix
    my ($explicitMatrixRef, $cuiToIndexRef, $indexToCuiRef, $matrixSize) = 
	Discovery::tableToSparseMatrix('N_11', $cuiFinder);
    print "Explicit Matrix:\n";
    _printMatrix($explicitMatrixRef, $matrixSize, $indexToCuiRef);
    print "-----------------------\n";

#Get the Starting Matrix
    my $startingMatrixRef = 
	Discovery::getRows($startingCuisRef, $explicitMatrixRef);
    print "Starting Matrix:\n";
    _printMatrix($startingMatrixRef, $matrixSize, $indexToCuiRef);
    print "-----------------------\n";
    
#Get Implicit Connections
    my $implicitMatrixRef 
	= Discovery::findImplicit($explicitMatrixRef, $startingMatrixRef, 
				  $indexToCuiRef, $matrixSize);
    print "Implicit Matrix:\n";
    _printMatrix($implicitMatrixRef, $matrixSize, $indexToCuiRef);
    print "-----------------------\n";

#Remove Known Connections
    $implicitMatrixRef = Discovery::removeExplicit($explicitMatrixRef, 
						   $implicitMatrixRef);
    print "Implicit Matrix with Explicit Removed\n";
    _printMatrix($implicitMatrixRef, $matrixSize, $indexToCuiRef);
    print "-----------------------\n";
    print "\n\n";

#Test N11, N1P, etc...
    #NOTE...always do n11 first, if n11 = -1, no need to compute the others...there is no co-occurrence between them
    my $n11 = Rank::getN11('C0','C2',$explicitMatrixRef);
    my $npp = Rank::getNPP($explicitMatrixRef);
    my $n1p = Rank::getN1P('C0', $explicitMatrixRef);
    my $np1 = Rank::getNP1('C2', $explicitMatrixRef); 
    print "Contingency Table Values from Explicit Matrix\n";
    print "n11 = $n11\n";
    print "npp = $npp\n";
    print "n1p = $n1p\n";
    print "np1 = $np1\n";

#Test other rank methods
    my $scoresRef = Rank::scoreImplicit_fromAllPairs($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef, $lbdOptions{rankingMethod}, $umls_association);
    my $ranksRef = Rank::rankDescending($scoresRef);
    print "Scores: \n";
    foreach my $cui (keys %{$scoresRef}) {
	print "   scores{$cui} = ${$scoresRef}{$cui}\n";
    }
    print "Ranks = ".join(',', @{$ranksRef})."\n";
}

sub _printMatrix {
    my $matrixRef = shift;
    my $matrixSize = shift;
    my $indexToCuiRef = shift;
    
    for (my $i = 0; $i < $matrixSize; $i++) {
	my $index1 = ${$indexToCuiRef}{$i};
	for (my $j = 0; $j < $matrixSize; $j++) {
	    my $printed = 0;
	    my $index2 = ${$indexToCuiRef}{$j};
	    my $hash1Ref =  ${$matrixRef}{$index1};

	    if (defined $hash1Ref) {
		my $val = ${$hash1Ref}{$index2};
		if (defined $val) {
		    print $val."\t";
		    $printed = 1;
		}
	    }
	    if (!$printed) {
		print "0\t";
	    }
	}
	print "\n";
    }
}
=cut


1;

lib/LiteratureBasedDiscovery/Discovery.pm  view on Meta::CPAN

sub outputMatrixToFile {
    my $outFile = shift;
    my $matrixRef = shift;
    
    #open the output file and output the matrix
    open OUT, ">$outFile" or die ("Error opening matrix output file: $outFile\n");
    my $rowRef;
    foreach my $rowKey (keys %{$matrixRef}) {
	$rowRef = ${$matrixRef}{$rowKey};
	foreach my $colKey (keys %{$rowRef}) {
	    print OUT "$rowKey\t$colKey\t${$rowRef}{$colKey}\n";
	}
    }
    close OUT;
}
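
# A minimal usage sketch, not part of the original module. It assumes the 
# sparse matrix is a hash of hashes keyed by row CUI and then column CUI, as 
# in the loop above. The sub name, CUIs, and file name are hypothetical.
sub _sketchOutputMatrixToFile {
    my %matrix = (
	'C0000001' => { 'C0000002' => 3, 'C0000003' => 1 },
	'C0000002' => { 'C0000003' => 5 },
    );
    #writes one "row<TAB>col<TAB>value" line per stored cell; the line 
    #  order is not guaranteed because hash keys are unordered
    outputMatrixToFile('exampleMatrix.txt', \%matrix);
}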


#Note: Table to sparse is no longer used, but could be useful in the future
=comment
#  retrieve a table from mysql and convert it to a sparse matrix (a hash of 
#     hashes)
#  input : $tableName <- the name of the table to output

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN

    my $umls = shift;
 
=comment   
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate values that are incorrect semantic groups
    #do each row at a time, remove column values that 
    #are the incorrect semantic type
    my %cuisChecked = ();
    #cuisChecked keeps track of cuis that have been checked 
    # for elimination. If the cui has been checked its key
    # will exist in the hash. Values of -1 indicate it should
    # be eliminated, values of 1 indicate it should stay.

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN



=comment
    #Count the number of keys after filtering (for debugging)
    %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}


# applies a semantic group filter to the matrix by removing keys that 
# are not of an allowed semantic type. Only removes types from rows, 
# so it is applied for time slicing, before randomly selecting terms of 
# one semantic type
# input:  $matrixRef <- ref to a sparse matrix to be filtered
#         $acceptTypesRef <- a ref to a hash of accept type strings

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN

    my $umls = shift;
    
=comment
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate values that are incorrect semantic groups
    #do each row at a time, remove column values that 
    #are the incorrect semantic type
    my $keep = -1;
    #cuisChecked keeps track of cuis that have been checked 
    # for elimination. If the cui has been checked its key
    # will exist in the hash. Values of -1 indicate it should
    # be eliminated, values of 1 indicate it should stay.

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN

    }

=comment
    #Count the number of keys after filtering (for debugging)
    %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}


# applies a semantic group filter to the matrix by removing keys that 
# are not of an allowed semantic type. Only removes types from columns, 
# so it is applied to the implicit matrix (starting term rows with implicit
# columns).
# input:  $matrixRef <- ref to a sparse matrix to be filtered
#         $acceptTypesRef <- a ref to a hash of accept type strings

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN

    my $umls = shift;
 
=comment   
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate values that are incorrect semantic groups
    #do each row at a time, remove column values that 
    #are the incorrect semantic type
    my %cuisChecked = ();
    #cuisChecked keeps track of cuis that have been checked 
    # for elimination. If the cui has been checked its key
    # will exist in the hash. Values of -1 indicate it should
    # be eliminated, values of 1 indicate it should stay.

lib/LiteratureBasedDiscovery/Filters.pm  view on Meta::CPAN

    }

=comment
    #Count the number of keys after filtering (for debugging)
    %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut

}

# gets the semantic types of the group
# input:  $group <- a string specifying a semantic group
#         $umls <- an instance of UMLS::Interface
# output: a ref to a hash of TUIs
sub getTypesOfGroup {
    my $group = shift;

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

    #grab the input
    my $goldMatrixRef = shift;
    my $rowRanksRef = shift;
    my $numIntervals = shift;


#calculate and output stats
#------------------------------------------

    #calculate precision and recall
    print "calculating precision and recall\n";
    my ($precisionRef, $recallRef) = &calculatePrecisionAndRecall_implicit(
	 $goldMatrixRef, $rowRanksRef, $numIntervals);

    #output precision and recall
    print "----- average precision at 10% recall intervals (i recall precision) ----> \n";
    foreach my $i (sort {$a <=> $b} keys %{$precisionRef}) {
	print "      $i ${$recallRef}{$i} ${$precisionRef}{$i}\n";
    }
    print "\n";
    
#-------------------------------------------
    
    #calculate mean average precision
    my $map = &calculateMeanAveragePrecision(
	$goldMatrixRef, $rowRanksRef);
    #output mean average precision
    print "---------- mean average precision ---------------> \n";
    print "      MAP = $map\n";
    print "\n";

#-------------------------------------------
    
    #calculate precision at k
    print "calculating precision at k\n";
    my $precisionAtKRef = &calculatePrecisionAtK($goldMatrixRef, $rowRanksRef);
    #output precision at k
    print "---------- mean precision at k intervals ---------------> \n";
    foreach my $k (sort {$a <=> $b} keys %{$precisionAtKRef}) {
	print "      $k ${$precisionAtKRef}{$k}\n";
    }
    print "\n";

#-------------------------------------------
    
    #calculate cooccurrences at k
    print "calculating mean cooccurrences at k\n";
    my $cooccurrencesAtKRef = &calculateMeanCooccurrencesAtK($goldMatrixRef, $rowRanksRef);
    #output cooccurrences at k
    print "---------- mean cooccurrences at k intervals ---------------> \n";
    foreach my $k (sort {$a <=> $b} keys %{$cooccurrencesAtKRef}) {
	print "      $k ${$cooccurrencesAtKRef}{$k}\n";
    }
    print "\n";

}


# loads a list of cuis for use in time slicing from file
# the CUI file contains a line-separated list of CUIs
# input:  $cuiFileName <- a string specifying the file to load cuis from
# output: \%cuis <- a ref to a hash of cuis, each key is a cui, values are 1
sub loadCUIs {
    my $cuiFileName = shift;

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

# calculates average precision and recall of the generated implicit matrix 
# compared to the post cutoff matrix
# input:  $predictionsMatrixRef <- a ref to a sparse matrix of predicted 
#                                  discoveries
#         $trueMatrixRef <- a ref to a sparse matrix of true discoveries
# output: ($precision, $recall) <- two scalar values specifying the precision 
#                                  and recall
sub calculatePrecisionRecall {
    my $predictionsMatrixRef = shift; #a matrix of predicted discoveries
    my $trueMatrixRef = shift; #a matrix of true discoveries
    print "calculating precision and recall\n";

    #bounds check, the predictions matrix must contain keys
    if ((scalar keys %{$predictionsMatrixRef}) < 1) {
	return (0,0); #precision and recall are both zero
    }

    #calculate precision and recall averaged over each cui
    my $precision = 0;
    my $recall = 0;
    #each row key corresponds to a term for which we calculate

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

# to rows in the starting matrix ref to save memory, and because those are 
# the only rows that are needed.
# input:  $startingMatrixRef <- a ref to the starting sparse matrix
#         $explicitMatrixRef <- a ref to the explicit sparse matrix
#         $postCutoffFileName <- the filename to the postCutoffMatrix
# output: \%postCutoffMatrix <- a ref to the postCutoff sparse matrix
sub loadPostCutOffMatrix {
    my $startingMatrixRef = shift;
    my $explicitMatrixRef = shift;
    my $postCutoffFileName = shift;
    print "loading postCutoff Matrix\n";
    
    #open the post cutoff file
    open IN, $postCutoffFileName 
	or die ("ERROR: cannot open post cutoff file: $postCutoffFileName");

    #create hash of cuis to grab
    my %cuisToGrab = ();
    foreach my $rowKey (keys %{$startingMatrixRef}) {
	$cuisToGrab{$rowKey} = 1;
    }

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN


    #check if a file is defined
    if (exists ${$lbdOptionsRef}{'cuiListFileName'}) {
	#grab the rows defined by the cuiListFile
	my $cuisRef = &loadCUIs(${$lbdOptionsRef}{'cuiListFileName'});
	foreach my $cui (keys %{$cuisRef}) {
	    if(exists ${$explicitMatrixRef}{$cui}) {
		$startingMatrix{$cui} = ${$explicitMatrixRef}{$cui};	
	    }
	    else {
		print STDERR "WARNING: CUI from cuiListFileName is not in explicitMatrix: $cui\n";
	    }
	}
    }
    else {
	#randomly grab rows
	#apply semantic filter to the rows (just retrieve appropriate rows)
	my $rowsToKeepRef = getRowsOfSemanticTypes(
	    $explicitMatrixRef, $startTermAcceptTypesRef, $umls_interface);
	((scalar keys %{$rowsToKeepRef}) >= $numRows) or die("ERROR: number of acceptable starting term rows is less than $numRows\n");

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

	    if (exists $rowNumbers{$i}) {
		$startingMatrix{$key} = ${$explicitMatrixRef}{$key}
	    }
	    $i++;
	}

	#output the cui list if needed
	if (exists ${$lbdOptionsRef}{'cuiListOutputFile'}) {
	    open OUT, ">".${$lbdOptionsRef}{'cuiListOutputFile'} or die ("ERROR: cannot open cuiListOutputFile:".${$lbdOptionsRef}{'cuiListOutputFile'}."\n");
	    foreach my $cui (keys %startingMatrix) {
		print OUT "$cui\n";
	    }
	    close OUT;
	}
    }

    #return the starting matrix
    return \%startingMatrix;
}


lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

#         $umls_association <- an instance of UMLS::Association
# output: \%cuiPairs <- a ref to a hash of CUI pairs and their association
#                       each key of the hash is a comma separated string 
#                       containing cui1, and cui2 of the pair 
#                       (e.g. 'cui1,cui2'), and each value is their association
#                       score using the specified association measure
sub getAssociationScores {
    my $matrixRef = shift;
    my $rankingMeasure = shift;
    my $umls_association = shift;
    print "   getting Association Scores, rankingMeasure = $rankingMeasure\n";
    
    #generate a list of cui pairs in the matrix
    my %cuiPairs = ();
    print "   generating association scores:\n";
    foreach my $rowKey (keys %{$matrixRef}) {
	foreach my $colKey (keys %{${$matrixRef}{$rowKey}}) {
	    $cuiPairs{"$rowKey,$colKey"} = ${${$matrixRef}{$rowKey}}{$colKey};
	}
    }
    
    #get ranks for all the cui pairs in the matrix
    #return a hash of cui pairs and their frequency
    if ($rankingMeasure eq 'frequency') {
	return \%cuiPairs;

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

#                            (e.g. 'cui1,cui2'), values are their association
#                            scores.
#         $matrixRef <- a reference to a co-occurrence sparse matrix that 
#                       corresponds to the assocScoresRef
# output: \%thresholdedMatrix <- a ref to a sparse matrix containing only the 
#                                $k ranked samples (cui pairs)
sub grabKHighestRankedSamples {
    my $k = shift;
    my $assocScoresRef = shift;
    my $matrixRef = shift;
    print "getting $k highest ranked samples\n";

    #apply the threshold
    my $preKeyCount = scalar keys %{$assocScoresRef};
    my $postKeyCount = 0;
    my %thresholdedMatrix = ();

    #get the keys sorted by value in descending order
    my @sortedKeys = sort { $assocScoresRef->{$b} <=> $assocScoresRef->{$a} } keys(%$assocScoresRef);
    my $threshold =  ${$assocScoresRef}{$sortedKeys[$k-1]};
    print " threshold = $threshold\n";

    #add the first k keys to the thresholded matrix
    my ($cui1, $cui2);
    foreach my $key (@sortedKeys) {
	($cui1, $cui2) = split(/,/, $key);

	#create new hash at rowkey location (if needed)
	if (!(exists $thresholdedMatrix{$cui1})) {
	    my %newHash = ();
	    $thresholdedMatrix{$cui1} = \%newHash;

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

#         $rowRanksRef <- a ref to a hash of arrays of ranked predictions. 
#                         Each hash key is a cui,  each hash element is an 
#                         array of ranked predictions for that cui. The ranked 
#                         predictions are cuis ordered in descending order 
#                         based on association. (from Rank::rankDescending)
# output: $map <- a scalar value of mean average precision (MAP)
sub calculateMeanAveragePrecision {
    #grab the input
    my $trueMatrixRef = shift; # a matrix of true discoveries
    my $rowRanksRef = shift; # a hash of ranked predicted discoveries
    print "calculating mean average precision\n";

    #calculate MAP for each true discovery being predicted
    my $map = 0;
    foreach my $rowKey (keys %{$trueMatrixRef}) {
	my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions

	#skip for rows that have no predictions
	if (!defined $rankedPredictionsRef) {
	    next;
	} 

samples/configFileSamples/UMLSAssociationConfig  view on Meta::CPAN

# UMLS::Association
<database>CUI_Bigram

# If the UMLS::Association Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password
<socket>/var/run/mysqld.sock

# makes the UMLS::Association not print to the command line
<t>

samples/configFileSamples/UMLSInterfaceConfig  view on Meta::CPAN

# are in <>'s, and values follow directly after with no space. As an example, 
# the line "<database>umls" will pass the 'database' parameter with a value 
# of 'umls' to the UMLS::Interface options hash for its initialization.
#
# For parameters where no value is needed, just write the name of the
# parameter in '<>' (e.g. '<debug>')
#
#
# See UMLS::Interface for more detail

# makes the UMLS::Interface not print to the command line
<t>

# Put the full pathname of the UMLS::Interface Config File
<config>/home/sam/assocLBD-0.01/config/interfaceConfig

# If the UMLS::Interface Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password

samples/runSample.pl  view on Meta::CPAN

#Demo file, showing how to run open discovery using the sample data, and how 
# to perform time slicing evaluation using the sample data

# run a sample lbd using the parameters in the lbd configuration file
print "\n           OPEN DISCOVERY          \n";
`perl ../utils/runDiscovery.pl lbdConfig`;
print "LBD Open discovery results output to sampleOutput\n\n";

# run a sample time slicing
# first remove the co-occurrences of the precutoff matrix (in this case it is 
# the sampleExplicitMatrix) from the post cutoff matrix. This generates a gold 
# standard discovery matrix from which time slicing may be performed
# This requires modifying the removeExplicit.pl, which we have done for you. 
# The variables for this example in removeExplicit.pl are:
#  my $matrixFileName = 'sampleExplicitMatrix';
#  my $squaredMatrixFileName = 'postCutoffMatrix';
#  my $outputFileName = 'sampleGoldMatrix';
#`perl ../utils/datasetCreator/removeExplicit.pl`;

# next, run time slicing 
print "          TIME SLICING          \n";
`perl ../utils/runDiscovery.pl timeSlicingConfig > sampleTimeSliceOutput`;
print "LBD Time Slicing results output to sampleTimeSliceOutput\n";

t/test.t  view on Meta::CPAN

# open and closed discovery code portions
#########################################################


#Test that the demo file can run correctly
`(cd ./samples/; perl runSample.pl) &`;

#######################################################
#test that the demo output matches the expected demo output
#########################################################
print "Performing Open Discovery Tests:\n";

#read in the gold scores from the open discovery gold
my %goldScores = ();
open IN, './t/goldSampleOutput' 
    or die ("Error: Cannot open gold sample output\n");
while (my $line = <IN>) {
    if ($line =~ /\d+\t(\d+\.\d+)\t(C\d+)/) {
	$goldScores{$2} = $1;
    }
}

t/test.t  view on Meta::CPAN

    }
    else {
	$allExist = 0;
	$allMatch = 0;
	last;
    }
}
ok ($allExist == 1, "All CUIs exist in the output");  #all cuis exist in the new output file
ok ($allMatch == 1, "All Scores are the same in the output");  #all scores are the same in the new output file

print "Done with Open Discovery Tests\n\n";



#######################################################
#test that time slicing is computed correctly
#########################################################
print "Performing Time Slicing Tests\n";

#read in gold time slicing output
(my $goldAPScoresRef, my $goldMAP, my $goldPAtKScoresRef, my $goldFAtKScoresRef)
    = &readTimeSlicingData('./t/goldSampleTimeSliceOutput');

#read in new time slicing output
(my $newAPScoresRef, my $newMAP, my $newPAtKScoresRef, my $newFAtKScoresRef)
    = &readTimeSlicingData('./samples/sampleTimeSliceOutput');

#check that the correct number of values are read for all the 

t/test.t  view on Meta::CPAN

# (within error tolerance)
my $fAtKSame = 1;
for (my $i = 0; $i < scalar @{$goldFAtKScoresRef}; $i++) {
    if (abs(${$goldFAtKScoresRef}[$i] - ${$newFAtKScoresRef}[$i]) > $atKErrorTol) {
	$fAtKSame = 0;
	last;
    }
}
ok($fAtKSame == 1, "Frequency at K Matches");
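
# A minimal sketch of the tolerance comparison used above, factored into a 
# helper. This is not part of the original test; the sub name is hypothetical. 
# It assumes both arguments are array refs of numeric scores and also treats 
# a length mismatch as a failure.
sub _sketchWithinTolerance {
    my ($goldRef, $newRef, $tol) = @_;

    #different lengths can never match
    return 0 if (scalar @{$goldRef} != scalar @{$newRef});

    #every pair of values must differ by no more than $tol
    for (my $i = 0; $i < scalar @{$goldRef}; $i++) {
	return 0 if (abs(${$goldRef}[$i] - ${$newRef}[$i]) > $tol);
    }
    return 1;
}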

print "Done with Time Slicing Tests\n";



############################################################
#function to read in time slicing data values
sub readTimeSlicingData {
    my $fileName = shift;

    #read in the gold time slicing values
    my @APScores = ();

utils/datasetCreator/applyMaxThreshold.pl  view on Meta::CPAN


# gets co-occurrence stats, returns a hash of (unique) co-occurrence counts 
# for each CUI. (count is unique or not depending on $applyToUnique)
sub getStats {
    my $inputFile = shift;
    my $applyToUnique = shift;

    #open files
    open IN, $inputFile or die("ERROR: unable to open inputFile: $inputFile\n");

    print "Getting Stats\n";
    #count stats for each line of the file
    my ($cui1, $cui2, $val);
    my %count = (); #a count of the number of (unique) co-occurrences
    while (my $line = <IN>) {
	#split the line
	($cui1, $cui2, $val) = split(/\t/,$line);

	if ($applyToUnique) {
	    #update the unique co-occurrence counts
	    $count{$cui1}++;

utils/datasetCreator/applyMaxThreshold.pl  view on Meta::CPAN

    my $inputFile = shift;
    my $outputFile = shift;
    my $maxThreshold = shift;
    my $countRef = shift;

    #open the input and output
    open IN, $inputFile or die("ERROR: unable to open inputFile: $inputFile\n");
    open OUT, ">$outputFile" 
	or die ("ERROR: unable to open outputFile: $outputFile\n");

    print "ApplyingThreshold\n";
    #threshold each line of the file
    my ($cui1, $cui2, $val);
    while (my $line = <IN>) {
	#grab values 
	($cui1, $cui2, $val) = split(/\t/,$line);

	#skip if either $cui1 or $cui2 are greater than the threshold
	# the counts in %count have been set already according to 
	# whether $applyToUnique or not
	if (${$countRef}{$cui1} > $maxThreshold 
	    || ${$countRef}{$cui2} > $maxThreshold) {
	    next;
	}
	else {
	    print OUT $line;
	}

    }
    close IN;
    close OUT;

    print "Done!\n";
}
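
# A minimal in-memory sketch of the thresholding rule above, not part of the 
# original script. The sub name is hypothetical. It assumes $countRef maps 
# each CUI to its (unique) co-occurrence count, and it treats a CUI with no 
# recorded count as 0 (uses the // operator, so Perl 5.10 or later).
sub _sketchApplyMaxThreshold {
    my $linesRef = shift;      #array ref of "cui1\tcui2\tval\n" lines
    my $maxThreshold = shift;
    my $countRef = shift;

    my @kept = ();
    foreach my $line (@{$linesRef}) {
	my ($cui1, $cui2, $val) = split(/\t/, $line);
	my $count1 = ${$countRef}{$cui1} // 0;
	my $count2 = ${$countRef}{$cui2} // 0;

	#keep the line only if neither CUI exceeds the threshold
	if ($count1 <= $maxThreshold && $count2 <= $maxThreshold) {
	    push @kept, $line;
	}
    }
    return \@kept;
}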

utils/datasetCreator/applyMinThreshold.pl  view on Meta::CPAN

    #grab the input
    my $minThreshold = shift;
    my $inputFile = shift;
    my $outputFile = shift;

    #open files
    open IN, $inputFile or die("ERROR: unable to open inputFile: $inputFile\n");
    open OUT, ">$outputFile" 
	or die ("ERROR: unable to open outputFile: $outputFile\n");

    print "Reading File\n";
    #threshold each line of the file
    my ($key, $cui1, $cui2, $val);
    while (my $line = <IN>) {
	#grab values 
	($cui1, $cui2, $val) = split(/\t/,$line);

	#check minThreshold
	if ($val > $minThreshold) {
	    print OUT $line;
	}  
    }
    close IN;
    close OUT;

    print "Done!\n";
}
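
# A minimal in-memory sketch of the filter above, not part of the original 
# script. The sub name is hypothetical and the input lines are assumed to be 
# well-formed "cui1\tcui2\tval\n" strings. The comparison is strict, so a 
# value equal to $minThreshold is dropped.
sub _sketchApplyMinThreshold {
    my $minThreshold = shift;
    my $linesRef = shift;

    my @kept = ();
    foreach my $line (@{$linesRef}) {
	my ($cui1, $cui2, $val) = split(/\t/, $line);
	chomp $val;   #the last field carries the newline
	if ($val > $minThreshold) {
	    push @kept, $line;
	}
    }
    return \@kept;
}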

utils/datasetCreator/applySemanticFilter.pl  view on Meta::CPAN

# Applies the semantic type filter
sub applySemanticFilter {
    #grab the input
    my $matrixFileName = shift;
    my $outputFileName = shift;
    my $acceptTypesString = shift;
    my $acceptGroupsString = shift;
    my $interfaceConfig = shift;
    my $columnsOnly = shift;

    print STDERR "Applying Semantic Filter to $matrixFileName\n";

    #load the matrix
    my $matrixRef = Discovery::fileToSparseMatrix($matrixFileName);

    #initialize the UMLS::Interface 
    my $componentOptions = 
	LiteratureBasedDiscovery::_readConfigFile('',$interfaceConfig);
    
    my $umls_interface = UMLS::Interface->new($componentOptions) 
	or die "Error: Unable to create UMLS::Interface object.\n";

utils/datasetCreator/combineCooccurrenceMatrices.pl  view on Meta::CPAN

#Check on I/O
    my $outFileName = "$startYear".'_'."$endYear".'_window'."$windowSize";
(!(-e $outFileName)) 
    or die ("ERROR: output file already exists: $outFileName\n");
open OUT, ">$outFileName" 
    or die ("ERROR: unable to open output file: $outFileName\n");

#combine the files
my %matrix = ();
for(my $year = $startYear; $year <= $endYear; $year++) {
    print "reading $year\n";
    my $inFile = $dataFolder.$year.'_window'.$windowSize;
    if (!(open IN, $inFile)) {
	print "   ERROR: unable to open $inFile\n";
	next;
    }

    #read each line of the file and add to the matrix
    while (my $line = <IN>) {
	#read values from the line
	$line =~ /([^\s]+)\t([^\s]+)\t([^\s]+)/;
	my $rowKey = $1;
	my $colKey = $2;
	my $val = $3;

utils/datasetCreator/combineCooccurrenceMatrices.pl  view on Meta::CPAN

	}
	if (!exists ${$matrix{$rowKey}}{$colKey}) {
	    ${$matrix{$rowKey}}{$colKey} = 0;
	}
	${$matrix{$rowKey}}{$colKey}+=$val;
    }
    close IN;
}

#output the matrix
print "outputting the matrix\n";
foreach my $rowKey(keys %matrix) {
    foreach my $colKey(keys %{$matrix{$rowKey}}) {
	print OUT "$rowKey\t$colKey\t${$matrix{$rowKey}}{$colKey}\n";
    }
}
close OUT;
print "DONE!\n";
}





utils/datasetCreator/dataStats/getCUICooccurrences.pl  view on Meta::CPAN


###################################
###################################

#open files
open IN, $inputFile or die("ERROR: unable to open inputFile: $inputFile\n");
open OUT, ">$outputFile" 
    or die ("ERROR: unable to open outputFile: $outputFile\n");


print "Reading File\n";
#count stats for each line of the file
my %ucoCount = (); #a count of the number of unique co-occurrences
my %coCount = (); #a count of the number of co-occurrences
my ($cui1, $cui2, $val);
while (my $line = <IN>) {
    #split the line
    ($cui1, $cui2, $val) = split(/\t/,$line);

    #update the cooccurrence count
    $coCount{$cui1}+=$val;

utils/datasetCreator/dataStats/getCUICooccurrences.pl  view on Meta::CPAN

    #update the unique co-occurrence counts
    $ucoCount{$cui1}++;

    #NOTE: do not update counts for 2, because in the case where order 
    #does not matter, the matrix will have been pre-processed to ensure 
    #the second cui will appear first in the key. In the case where order 
    #does matter we just shouldn't be counting it anyway
}
close IN;

print "Outputting Results\n";
#output the co-occurrence counts, sorted by number of unique
# co-occurrences (descending)
foreach my $cui(sort {$ucoCount{$b}<=>$ucoCount{$a}} keys %ucoCount) {
    #coCount and ucoCount will have the same keys (see above loop)
    print OUT "$cui\t$coCount{$cui}\t$ucoCount{$cui}\n";
}
close OUT;

print "Done!\n";

utils/datasetCreator/dataStats/getMatrixStats.pl  view on Meta::CPAN

# (number of rows, number of columns, number of keys)

&getStats('/home/henryst/lbdData/groupedData/1852_window1_squared_inParts');


#############################################
# gets the stats for the matrix
#############################################
sub getStats {
    my $fileName = shift;
    print STDERR "$fileName\n";

#read in the matrix
    open IN, $fileName or die ("unable to open file: $fileName\n");
    my %matrix = ();
    my $numCooccurrences = 0;
    while (my $line = <IN>) {
	#$line =~ /([^\t]+)\t([^\t]+)\t([\d]+)/;
	$line =~ /([^\s]+)\s([^\s]+)\s([\d]+)/;
	if (!exists $matrix{$1}) {
	    my %hash = ();
	    $matrix{$1} = \%hash;
	}
	$matrix{$1}{$2} = $3;
	$numCooccurrences += $3;
    }
    close IN;

    print STDERR "   num rows in matrix = ".(scalar keys %matrix)."\n";

#count the number of columns and the number of keys
# this is done outside of the loop above because I also need to count the number of columns
    my $numKeys = 0;
    my %colKeys = ();
    foreach my $row (keys %matrix) {
	foreach my $colKey (keys %{$matrix{$row}}) {
	    $colKeys{$colKey} = 1;
	    $numKeys++;
	}
    }

    print STDERR "   num columns in matrix = ".(scalar keys %colKeys)."\n";
    print STDERR "   number of keys in the matrix = $numKeys\n";
    print STDERR "   number of cooccurrences in the matrix = $numCooccurrences\n";
}

utils/datasetCreator/dataStats/metaAnalysis.pl  view on Meta::CPAN

    my $startYear = shift;
    my $endYear = shift;
    my $windowSize = shift;
    my $statsOutFileName= shift;
    my $dataFolder = shift;

    #Check on I/O
    open OUT, ">$statsOutFileName" 
	or die ("ERROR: unable to open stats out file: $statsOutFileName\n");

    #print header row
    print OUT "year\tnumRows\tnumCols\tvocabularySize\tnumCooccurrences\n";

    #get stats for each file and output to file
    for(my $year = $startYear; $year <= $endYear; $year++) {
	print "reading $year\n";
	my $inFile = $dataFolder.$year.'_window'.$windowSize;
	if (open IN, $inFile) {
	    (my $numRows, my $numCols, my $vocabularySize, my $numCooccurrences)
		= &metaAnalysis($inFile);
	    print OUT "$year\t$numRows\t$numCols\t$vocabularySize\t$numCooccurrences\n"	
	}
	else {
	    #just skip the file
	    print "   ERROR: unable to open $inFile\n";
	}
    }
    close OUT;
    print "Done getting stats\n";
}


##############################
# runs meta analysis on a single file
sub metaAnalysis {
    my $fileName = shift;
    
    open IN, $fileName or die ("unable to open file: $fileName\n");
    

utils/datasetCreator/dataStats/metaAnalysis.pl  view on Meta::CPAN

	$uniqueKeys{$1} = 1;
	$uniqueKeys{$2} = 1;
	$numCooccurrences++;
    }
    close IN;

    my $numRows = scalar keys %rowKeys;
    my $numCols = scalar keys %colKeys;
    my $vocabularySize = scalar keys %uniqueKeys;
    
    print "$fileName: $numRows, $numCols, $vocabularySize, $numCooccurrences\n";

    return $numRows, $numCols, $vocabularySize, $numCooccurrences;
}

utils/datasetCreator/fromMySQL/removeQuotes.pl  view on Meta::CPAN


my $inFile = '1980_1984_window1_retest_data.txt';
my $outFile = '1980_1984_window1_restest_DELETEME';


open IN, $inFile or die ("unable to open inFile: $inFile\n");
open OUT, '>'.$outFile or die ("unable to open outFile: $outFile\n");

while (my $line  = <IN>) {
    $line =~ s/"//g;
    #print $line;
    print OUT $line;
}
close IN;
close OUT;

utils/datasetCreator/makeOrderNotMatter.pl  view on Meta::CPAN


#make order not matter
#...output every $outputLimit iterations to avoid too much IO
my %matrix = ();
while (my $line = <IN>) {
    #TODO use split instead of regex match
    $line =~ /([^\s]+)\t([^\s]+)\t([^\s]+)/;
    #$1 = row, $2 = col, $3 = val

    if (!(defined $1) || !(defined $2) || !(defined $3)) {
	print "Not all defined: $line";
    }

    #initialize rows if needed
    if (!(exists $matrix{$1})) {
	my %newHash = ();
	$matrix{$1} = \%newHash;
    }
    if (!(exists $matrix{$2})) {
	my %newHash = ();
	$matrix{$2} = \%newHash;

utils/datasetCreator/makeOrderNotMatter.pl  view on Meta::CPAN


    #add the value
    ${$matrix{$1}}{$2} += $3;
    #${$matrix{$2}}{$1} += $3;
}
close IN;

#output the matrix
foreach my $key1 (keys %matrix) {
    foreach my $key2 (keys %{$matrix{$key1}}) {
	print OUT "$key1\t$key2\t${$matrix{$key1}}{$key2}\n";
    }
}
foreach my $key1 (keys %matrix) {
    foreach my $key2 (keys %{$matrix{$key1}}) {
	print OUT "$key2\t$key1\t${$matrix{$key1}}{$key2}\n";
    }
}
close OUT;

print "DONE!\n";

utils/datasetCreator/removeCUIPair.pl  view on Meta::CPAN

# used to remove Somatomedin C and Arginine from the 1960-1989 datasets
use strict;
use warnings;

my $cuiA = 'C0021665'; #somatomedin c
my $cuiB = 'C0003765'; #arginine
my $matrixFileName = '/home/henryst/lbdData/groupedData/1960_1989_window8_ordered';
my $matrixOutFileName = $matrixFileName.'_removed';
&removeCuiPair($cuiA, $cuiB, $matrixFileName, $matrixOutFileName);

print STDERR "DONE\n";

###########################################
# remove the CUI pair from the dataset
sub removeCuiPair {
    my $cuiA = shift;
    my $cuiB = shift;
    my $matrixFileName = shift;
    my $matrixOutFileName = shift;
    print STDERR "removing $cuiA,$cuiB from $matrixFileName\n";
    
    #open the in and out files
    open IN, $matrixFileName 
	or die ("ERROR: cannot open matrix in file: $matrixFileName\n");
    open OUT, ">$matrixOutFileName" 
	or die ("ERROR: cannot open matrix out file: $matrixOutFileName\n");

    # read in each line of the matrix and copy to the new file
    # but omit any $cuiA,$cuiB or $cuiB,$cuiA lines
    while (my $line = <IN>) {
	if ($line =~ /$cuiA\t$cuiB/ || $line =~ /$cuiB\t$cuiA/) {
	    print "   removing $line";
	    next;
	}
	else {
	    print OUT $line;
	}
    }
    close IN;
    close OUT;
}

utils/datasetCreator/removeExplicit.pl  view on Meta::CPAN


###############################
###############################

#removes explicit knowledge ($matrixFileName) from the implicit 
# knowledge ($squaredMatrixFileName)
sub removeExplicit {
    my $matrixFileName = shift;  #the explicit knowledge matrix (usually not filtered)
    my $squaredMatrixFileName = shift;  #the implicit with explicit knowledge matrix (filtered squared)
    my $outputFileName = shift; #the implicit knowledge matrix output file
    print STDERR "Removing Explicit ($matrixFileName) from $squaredMatrixFileName\n";

    #read in the matrix
    open IN, $matrixFileName 
	or die("ERROR: unable to open matrix input file: $matrixFileName\n");
    my %matrix = ();
    my $numCooccurrences = 0;
    while (my $line = <IN>) {
	#$line =~ /([^\t]+)\t([^\t]+)\t([\d]+)/;
	$line =~ /([^\s]+)\s([^\s]+)\s([\d]+)/;
	if (!exists $matrix{$1}) {

utils/datasetCreator/removeExplicit.pl  view on Meta::CPAN

    close IN;

    #copy the implicit values of the squared matrix over to a new file 
    open IN, $squaredMatrixFileName 
	or die("ERROR: unable to open squared matrix input file: $squaredMatrixFileName\n");
    open OUT, ">$outputFileName"
	or die("ERROR: unable to open output file: $outputFileName\n");
    while (my $line = <IN>) {
	#skip lines that do not match; otherwise $1 and $2 keep stale values
	$line =~ /([^\s]+)\s([^\s]+)\s([\d]+)/ or next;
	if (!exists ${$matrix{$1}}{$2}) {
	    print OUT $line;
	}
    }
    close IN;
    close OUT;

    print STDERR "DONE!\n";
}



utils/datasetCreator/squaring/convertForSquaring_MATLAB.pl  view on Meta::CPAN


########################################
########################################

#converts the matrix to format for squaring in MATLAB
sub convertTo {
    #grab input
    my $inFile = shift;
    my $matrixOutFile = shift;
    my $keyOutFile = shift;
    print STDERR "converting $inFile\n";
  
    #open all the files
    open IN, $inFile
	or die ("ERROR: unable to open inFile: $inFile\n");
    open MATRIX_OUT, ">$matrixOutFile" 
	or die ("ERROR: unable to open matrixOutFile: $matrixOutFile\n");
    open KEY_OUT, ">$keyOutFile"
	or die ("ERROR: unable to open keyOutFile: $keyOutFile\n");

    #convert the infile to the proper format (IN is already open)
    print "   outputting matrix\n";
    my %keyHash = ();
    my ($cui1,$cui2,$value);
    while (my $line = <IN>) {
	#$line =~ /([^\s]+)\t([^\s]+)\t([^\s]+)/;
	#my $cui1 = $1;
	#my $cui2 = $2;
	#my $value = $3;
	($cui1,$cui2,$value) = split(/\t/,$line);

	if (!exists $keyHash{$cui1}) {
	    $keyHash{$cui1} = (scalar keys %keyHash)+1;
	}
	if (!exists $keyHash{$cui2}) {
	    $keyHash{$cui2} = (scalar keys %keyHash)+1;
	}

	#NOTE: $value has a \n character
	print MATRIX_OUT "$keyHash{$cui1}\t$keyHash{$cui2}\t$value";
    }
    close IN;
    close MATRIX_OUT;

    #output the keys file
    print "   Outputting keys\n";
    foreach my $key (sort keys %keyHash) {
	print KEY_OUT "$key\t$keyHash{$key}\n";
    }
    close KEY_OUT;
    print "   DONE!\n";
}

#converts the matrix back from the format used for squaring in MATLAB
sub convertFrom {
    #grab input
    my $matrixInFile = shift;
    my $matrixOutFile = shift;
    my $keyInFile = shift;
    print "converting $matrixInFile\n";
  
    #open all the files
    open IN, $matrixInFile
	or die ("ERROR: unable to open matrixInFile: $matrixInFile\n");
    open MATRIX_OUT, ">$matrixOutFile" 
	or die ("ERROR: unable to open matrixOutFile: $matrixOutFile\n");
    open KEY_IN, $keyInFile
	or die ("ERROR: unable to open keyInFile: $keyInFile\n");

    #read in all the keys

utils/datasetCreator/squaring/convertForSquaring_MATLAB.pl  view on Meta::CPAN

     }
    close KEY_IN;

    #read in the file and convert on output
    while (my $line = <IN>) {
	$line =~ /([^\s]+)\s([^\s]+)\s([^\s]+)/;
	my $key1 = $1;
	my $key2 = $2;
	my $value = $3;

	print MATRIX_OUT "$keyHash{$key1}\t$keyHash{$key2}\t$value\n";
    }
    close IN;
    close MATRIX_OUT;
    print "   DONE!\n";
}

utils/datasetCreator/squaring/squareMatrix.m  view on Meta::CPAN

    %output the matrix
    [i,j,val] = find(squared);
    clear squared;
    disp('    values grabbed for output');
    data_dump = [i,j,val];
    clear i;
    clear j;
    clear val;
    disp('    values ready for output dump');
    fid = fopen(fileOut,'w');
    fprintf( fid,'%d %d %d\n', transpose(data_dump) );
    fclose(fid);
    disp('   DONE!');

end

utils/datasetCreator/squaring/squareMatrix_partial.m  view on Meta::CPAN

      %output the matrix
      [i,j,val] = find(squared);
      clear squared;
      disp('    values grabbed for output');
      data_dump = [i,j,val];
      clear i;
      clear j;
      clear val;
      disp('    values ready for output dump');
      fid = fopen(fileOut,'a+');
      fprintf( fid,'%d %d %d\n', transpose(data_dump) );
      clear data_dump;
      fclose(fid);
      disp('   values output');
    end
  end
end

utils/datasetCreator/squaring/squareMatrix_perl.pl  view on Meta::CPAN

	    }
	}
	
	#output if needed
	if ($keyCount > $dumpThreshold) {
	    &outputMatrix(\%product, $options{'outputFile'});
	    $keyCount = 0;
	}

    }
    print STDERR "done with row: $count/$total\n";
    $count++;

    
}

#output any other elements in the matrix and finish
&outputMatrix(\%product, $options{'outputFile'});
print STDERR "DONE!\n";




#########################################################
# Helper Functions
#########################################################

sub outputMatrix {
    my $matrixRef = shift;
    my $outputFile = shift;

    #append to the output file
    print STDERR "outputFile = $outputFile\n";
    open OUT, '>>'.$outputFile or die ("ERROR: unable to open output file: $outputFile\n");

    #output the matrix
    foreach my $key0 (keys %{$matrixRef}) {  
	foreach my $key1 (keys %{${$matrixRef}{$key0}}) {
	    print OUT "$key0\t$key1\t".${${$matrixRef}{$key0}}{$key1}."\n";
	}
    }
    
    #clear the matrix in place so the caller's hash is emptied as well
    %{$matrixRef} = ();

    close OUT;
}

utils/datasetCreator/testMatrixEquality.pl  view on Meta::CPAN


#check that matrix B has all the same elements as matrix A
my $equal = 1;
foreach my $key1 (keys %{$matrixARef}) {
    foreach my $key2 (keys %{${$matrixARef}{$key1}}) {

        #check that it exists in matrix B and that the value is the same
	if (exists ${${$matrixBRef}{$key1}}{$key2}) {
	    if (${${$matrixARef}{$key1}}{$key2} != ${${$matrixBRef}{$key1}}{$key2}) {
		$equal = 0;
		print "A\n";
		last;
	    }
	} else {
	    $equal = 0;
	    print "B\n";
	    last;
	}

	#remove from matrix B
	delete ${${$matrixBRef}{$key1}}{$key2};
    }
    if (!$equal) {
	last;
    }
}

#check that matrix B doesn't contain any elements that aren't in matrix A
if ($equal) {
    foreach my $key1 (keys %{$matrixBRef}) {	
	if (scalar keys %{${$matrixBRef}{$key1}} > 0) {
	    $equal = 0;
	    print "C\n";
	    last;
	}
    }
}

#print the results
if ($equal) {
    print "Matrices are Equal\n";
} else {
    print "Matrices are NOT Equal\n";
}

print "DONE!\n";

utils/runDiscovery.pl  view on Meta::CPAN

#grab all the options and set values
GetOptions( 'debug'             => \$DEBUG, 
            'help'              => \$HELP,
	    'version'           => \$VERSION,
            'assocConfig=s'     => \$options{'assocConfig'},
            'interfaceConfig=s' => \$options{'interfaceConfig'},
);
 
#Check for version or help
if ($VERSION) {
    print "current version is ".(ALBD->version())."\n";
    exit;
}     
if ($HELP) {
    &showHelp();
    exit;
}    


############################################################################
#                          Begin Running LBD

utils/runDiscovery.pl  view on Meta::CPAN

defined $options{'lbdConfig'} or die ($usage);

my $lbd = ALBD->new(\%options);
$lbd->performLBD();

############################################################################
#  function to output help messages for this program
############################################################################
sub showHelp() {
        
    print "This utility takes an lbd configuration file and outputs\n";
    print "the results of lbd to file. The parameters for LBD are\n";
    print "specified in the input file. Please see samples/lbd or\n";
    print "samples/thresholding for sample input files and descriptions\n";
    print "of parameters and full details on what can be in an LBD input\n";
    print "file.\n";
    
    print "\n";
    print "Usage: runDiscovery.pl LBD_CONFIG_FILE [OPTIONS]\n";
    
    print "\n";
    print "General Options:\n\n";
    print "--help               displays help, a quick summary of program\n"; 
    print "                     options\n";
    print "--assocConfig        path to a UMLS::Association configuration\n";
    print "                     file. Default location is \n";
    print "                     '../config/association'. Replace this file\n";
    print "                     for your computer to avoid having to specify\n";
    print "                     each time.\n";
    print "--interfaceConfig    path to a UMLS::Interface configuration\n";
    print "                     file. Default location is \n";
    print "                     '../config/interface'. Replace this file \n";
    print "                     for your computer to avoid having to specify\n";
    print "                     each time.\n";
    print "--debug              enter debug mode\n";
    print "--version            prints the current version to screen\n";
}


