along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Stage 5: Set up Dummy UMLS::Association Database
UMLS::Association requires a connection to a database in the correct
format. Although ALBD itself does not need this database (co-occurrence
data is loaded from a co-occurrence matrix), UMLS::Association cannot
run without it. If you ran UMLS::Association to generate a co-occurrence
matrix, you should be fine. Otherwise you will need to create a dummy
database that it can connect to. This can be done in two steps:
1) Open mysql by typing mysql at the terminal.
2) Create the default database in the correct format by typing:
   CREATE DATABASE cuicounts;
   use cuicounts;
   CREATE TABLE N_11(cui_1 CHAR(10), cui_2 CHAR(10), n_11 BIGINT(20));
CONTACT US
If you have any trouble installing or using ALBD, please contact us
directly:
Sam Henry: henryst at vcu.edu
Bridget McInnes: btmcinnes at vcu.edu
ranking methods are available. Both the novel ranking methods that use
association measures and frequency-based ranking methods are
available. See samples/lbd for more info. ALBD can perform open and
closed LBD as well as time slicing evaluation.
ALBD requires UMLS::Association both to compute the co-occurrence
database from which the co-occurrence matrix is derived and to rank
the generated C terms.
UMLS::Association requires the UMLS::Interface module to access
the Unified Medical Language System (UMLS) for semantic type filtering
and to determine if CUIs are valid.
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
INITIALIZING THE MODULE
To create an instance of the ALBD object, using default values for all
configuration options:

    %options = ();
    $options{'lbdConfig'} = 'configFile';
    my $lbd = LiteratureBasedDiscovery->new(\%options);
    $lbd->performLBD();

The following configuration options are also provided though:
Co-occurrence matrices in the utils/datasetCreator/ directory. These are
short scripts and generally require modifying the code at the top with
user input parameters specific for each run. These scripts include:
applyMaxThreshold.pl -- applies a maximum co-occurrence threshold to the
co-occurrence matrix
applyMinThreshold.pl -- applies a minimum co-occurrence threshold to the
co-occurrence matrix
applySemanticFilter.pl -- applies a semantic type and/or group filter to
the co-occurrence matrix.
combineCooccurrenceMatrices.pl -- combines the co-occurrence counts of
multiple co-occurrence matrices
makeOrderNotMatter.pl -- makes the order of CUI co-occurrences not
matter by updating the co-occurrence matrix file. (UMLS::Association
generates co-occurrence files where order does matter, so the sentence
'cui1 cui2' will only mark a co-occurrence between cui1 and cui2, but
not between cui2 and cui1).
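
As an illustration of the order-independence idea described above, the following is a minimal sketch and not the code of makeOrderNotMatter.pl. It assumes the co-occurrence matrix is held as a hash of hashes (as in the ALBD source) and that symmetrizing means summing the counts of both directions; the helper name symmetrize and the CUIs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ALBD): returns a new sparse matrix in
# which the count stored for (cui1, cui2) equals the count for (cui2, cui1).
# Assumption: both directions are summed.
sub symmetrize {
    my ($matrixRef) = @_;
    my %symmetric = ();
    foreach my $row (keys %{$matrixRef}) {
        foreach my $col (keys %{ $matrixRef->{$row} }) {
            my $count = $matrixRef->{$row}{$col};
            $symmetric{$row}{$col} += $count;
            # mirror the count, but do not double-count the diagonal
            $symmetric{$col}{$row} += $count if $row ne $col;
        }
    }
    return \%symmetric;
}

# Made-up CUIs and counts, for illustration only
my %matrix = (
    C0000001 => { C0000002 => 3 },
    C0000002 => { C0000001 => 1, C0000003 => 2 },
);
my $symRef = symmetrize(\%matrix);
foreach my $row (sort keys %{$symRef}) {
    foreach my $col (sort keys %{ $symRef->{$row} }) {
        print "$row\t$col\t$symRef->{$row}{$col}\n";
    }
}
```

Building a new hash rather than updating in place avoids counting a pair twice when both directions are already present in the input.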
lib/ALBD.pm view on Meta::CPAN
$start = time();
my $startingMatrixRef =
Discovery::getRows($startCuisRef, $explicitMatrixRef);
print "Got Starting Matrix in ".(time() - $start)."\n";
#if using average minimum weight, grab the a->b scores
my %abPairsWithScores = ();
if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight'
|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {
#apply semantic type filter to columns only
if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
}
#initialize the abPairs to frequency of co-occurrence
foreach my $row (keys %{$startingMatrixRef}) {
foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
$abPairsWithScores{"$row,$col"} = ${${$startingMatrixRef}{$row}}{$col};
}
}
lib/ALBD.pm view on Meta::CPAN
#get association scores for the starting matrix
my $assocScoresRef = TimeSlicing::getAssociationScores(
$startingMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
my ($min, $max) = TimeSlicing::getMinMax($assocScoresRef);
my $range = $max-$min;
#load the post cutoff matrix for the necessary rows
my $postCutoffMatrixRef
= TimeSlicing::loadPostCutOffMatrix($startingMatrixRef, $explicitMatrixRef, $lbdOptions{'postCutoffFileName'});
#apply a semantic type filter to the post cutoff matrix
if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$postCutoffMatrixRef, $targetAcceptTypesRef, $umls_interface);
}
#apply a threshold at $numIntervals% intervals to generate an 11 point
# interpolated precision/recall curve for linking term ranking/thresholding
#stats for collecting info about predicted vs. true
my $predictedAverage = 0;
my $trueAverage = 0;
lib/ALBD.pm view on Meta::CPAN
#apply a threshold (number of samples)
my $thresholdedStartingMatrixRef = TimeSlicing::grabKHighestRankedSamples($numSamples, $assocScoresRef, $startingMatrixRef);
#generate implicit knowledge
my $implicitMatrixRef = Discovery::findImplicit($explicitMatrixRef, $thresholdedStartingMatrixRef);
#Remove Known Connections
$implicitMatrixRef
= Discovery::removeExplicit($startingMatrixRef, $implicitMatrixRef);
#apply a semantic type filter to the implicit matrix
if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$implicitMatrixRef, $targetAcceptTypesRef, $umls_interface);
}
#calculate precision and recall
my ($precision, $recall) = TimeSlicing::calculatePrecisionRecall(
$implicitMatrixRef, $postCutoffMatrixRef);
print "precision = $precision, recall = $recall\n";
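
The precision and recall step above can be sketched as follows. This is an assumption about the general technique, not the body of TimeSlicing::calculatePrecisionRecall, which is not shown here and may, for example, average per starting term. The helper name precisionRecall and the data are hypothetical; every (row, column) pair in the predicted matrix is treated as one prediction.

```perl
use strict;
use warnings;

# Hypothetical helper: pooled precision and recall of predicted (row, col)
# pairs against a gold-standard sparse matrix.
sub precisionRecall {
    my ($predictedRef, $goldRef) = @_;
    my ($numPredicted, $numGold, $numCorrect) = (0, 0, 0);
    foreach my $row (keys %{$predictedRef}) {
        foreach my $col (keys %{ $predictedRef->{$row} }) {
            $numPredicted++;
            $numCorrect++
                if exists $goldRef->{$row} && exists $goldRef->{$row}{$col};
        }
    }
    foreach my $row (keys %{$goldRef}) {
        $numGold += scalar keys %{ $goldRef->{$row} };
    }
    # guard against empty matrices
    my $precision = $numPredicted > 0 ? $numCorrect / $numPredicted : 0;
    my $recall    = $numGold > 0      ? $numCorrect / $numGold      : 0;
    return ($precision, $recall);
}

# Made-up matrices: 4 predictions, 8 gold pairs, 2 in common
my %predicted = (A => { X => 1, Y => 1, Z => 1, W => 1 });
my %gold = (A => { map { $_ => 1 } qw(X Y P Q R S T U) });
my ($precision, $recall) = precisionRecall(\%predicted, \%gold);
print "precision = $precision, recall = $recall\n";
```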
lib/ALBD.pm view on Meta::CPAN
print "inputting gold\n";
$goldMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'goldInputFile'});
}
else {
print "loading post cutoff\n";
$goldMatrixRef = TimeSlicing::loadPostCutOffMatrix($startingMatrixRef, $explicitMatrixRef, $lbdOptions{'postCutoffFileName'});
#remove explicit knowledge from the post cutoff matrix
$goldMatrixRef = Discovery::removeExplicit($startingMatrixRef, $goldMatrixRef);
#apply a semantic type filter to the post cutoff matrix
print "applying semantic filter to post-cutoff matrix\n";
if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$goldMatrixRef, $targetAcceptTypesRef, $umls_interface);
}
#TODO why is the gold matrix outputting with an extra line between samples?
#output the gold matrix
if (exists $lbdOptions{'goldOutputFile'}) {
print "outputting gold\n";
lib/ALBD.pm view on Meta::CPAN
#-------
# AB Scoring (if needed)
#-------
#if using average minimum weight, grab the a->b scores, #TODO this is sloppy here, but it has to be here...how to make it fit better?
my %abPairsWithScores = ();
if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight'
|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {
print "getting AB scores\n";
#apply semantic type filter to columns only
if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
}
#initialize the abPairs to the frequency of co-occurrence
foreach my $row (keys %{$startingMatrixRef}) {
foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
$abPairsWithScores{"$row,$col"} = ${${$startingMatrixRef}{$row}}{$col};
}
}
lib/ALBD.pm view on Meta::CPAN
#generate implicit knowledge
print "Squaring Matrix\n";
$predictionsMatrixRef = Discovery::findImplicit(
$explicitMatrixRef, $startingMatrixRef);
#Remove Known Connections
print "Removing Known from Predictions\n";
$predictionsMatrixRef
= Discovery::removeExplicit($startingMatrixRef, $predictionsMatrixRef);
#apply a semantic type filter to the predictions matrix
print "Applying Semantic Filter to Predictions\n";
if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
Filters::semanticTypeFilter_columns(
$predictionsMatrixRef, $targetAcceptTypesRef, $umls_interface);
}
#save the implicit knowledge matrix to file
if (exists ($lbdOptions{'predictionsOutFile'})) {
print "outputting predictions\n";
Discovery::outputMatrixToFile($lbdOptions{'predictionsOutFile'}, $predictionsMatrixRef);
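
The open discovery steps in this excerpt (find implicit knowledge, then remove known connections) can be sketched roughly as below. This is an illustrative reading, not the implementation of Discovery::findImplicit or Discovery::removeExplicit; the helper names and data are hypothetical, and implicit values are simply set to 1 rather than scored.

```perl
use strict;
use warnings;

# Hypothetical sketch: every C term reachable as A -> B -> C becomes an
# implicit A -> C candidate.
sub findImplicitSketch {
    my ($explicitRef, $startingRef) = @_;
    my %implicit = ();
    foreach my $aTerm (keys %{$startingRef}) {
        foreach my $bTerm (keys %{ $startingRef->{$aTerm} }) {
            next unless exists $explicitRef->{$bTerm};
            foreach my $cTerm (keys %{ $explicitRef->{$bTerm} }) {
                $implicit{$aTerm}{$cTerm} = 1;
            }
        }
    }
    return \%implicit;
}

# Hypothetical sketch: drop candidates that already co-occur with the
# starting term, since those connections are known rather than implicit.
sub removeKnownSketch {
    my ($startingRef, $implicitRef) = @_;
    foreach my $aTerm (keys %{$implicitRef}) {
        foreach my $cTerm (keys %{ $implicitRef->{$aTerm} }) {
            delete $implicitRef->{$aTerm}{$cTerm}
                if exists $startingRef->{$aTerm}
                && exists $startingRef->{$aTerm}{$cTerm};
        }
    }
    return $implicitRef;
}

# Made-up explicit matrix; the starting matrix is the row of start term A
my %explicit = (
    A  => { B1 => 2, B2 => 1 },
    B1 => { C1 => 4, B2 => 1 },
    B2 => { C2 => 3 },
);
my %starting = (A => $explicit{A});
my $predictionsRef = removeKnownSketch(
    \%starting, findImplicitSketch(\%explicit, \%starting));
print join(',', sort keys %{ $predictionsRef->{A} }), "\n";
```

In this example B2 is reachable through B1 but is already a direct neighbour of A, so it is removed from the predictions.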
lib/ALBD.pm view on Meta::CPAN
# transforms the string of target cuis to an array
# input: none
# output: an array ref of CUIs
sub _getTargetCuis {
my $self = shift;
my @targetCuis = split(',',$lbdOptions{'targetCuis'});
return \@targetCuis;
}
# transforms the string of accept types or groups into a hash of accept TUIs
# input: a string specifying whether linking or target types are being defined
# output: a ref to a hash of acceptable TUIs
sub _getAcceptTypes {
my $self = shift;
my $stepString = shift; #either 'linking' or 'target'
#get the accept types
my %acceptTypes = ();
#add all types for groups specified
my $string = $stepString.'AcceptGroups';
if (defined $lbdOptions{$string}) {
#accept groups were specified
my @acceptGroups = split(',',$lbdOptions{$string});
#add all the types of each group
foreach my $group(@acceptGroups) {
my $typesRef = Filters::getTypesOfGroup($group, $umls_interface);
foreach my $key(keys %{$typesRef}) {
$acceptTypes{$key} = 1;
}
}
}
#add all types specified
$string = $stepString.'AcceptTypes';
if (defined $lbdOptions{$string}) {
#convert each type to a tui and add
my $tui;
my @acceptTypes = split(',',$lbdOptions{$string});
foreach my $abr(@acceptTypes) {
$tui = uc $umls_interface->getStTui($abr);
$acceptTypes{$tui} = 1;
}
}
return \%acceptTypes;
}
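
To show what the accept-type hash built by _getAcceptTypes might look like, here is a self-contained sketch. MockUmls is a hypothetical stand-in for UMLS::Interface so that the example runs without a UMLS database; it mimics only the two methods used in the source (getStTui and getStsFromSg), and its lookup tables are illustrative and incomplete. The helper name buildAcceptTypes is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for UMLS::Interface (illustrative data only)
package MockUmls;
my %tuiOf = (dsyn => 't047', clnd => 't200', chem => 't103', gngm => 't028');
my %typesOfGroup = (CHEM => ['clnd', 'chem'], GENE => ['gngm']);
sub new { return bless {}, shift; }
sub getStTui { my ($self, $abr) = @_; return $tuiOf{$abr}; }
sub getStsFromSg {
    my ($self, $group) = @_;
    return $typesOfGroup{$group} || [];
}

package main;

# Builds a hash whose keys are upper-cased TUIs, from a comma separated
# list of group names and a comma separated list of type abbreviations.
sub buildAcceptTypes {
    my ($umls, $groupsString, $typesString) = @_;
    my %acceptTypes = ();
    foreach my $group (split(',', $groupsString // '')) {
        foreach my $abr (@{ $umls->getStsFromSg($group) }) {
            $acceptTypes{ uc $umls->getStTui($abr) } = 1;
        }
    }
    foreach my $abr (split(',', $typesString // '')) {
        $acceptTypes{ uc $umls->getStTui($abr) } = 1;
    }
    return \%acceptTypes;
}

my $acceptRef = buildAcceptTypes(MockUmls->new(), 'GENE', 'dsyn,chem');
print join(',', sort keys %{$acceptRef}), "\n";
```

Storing the TUIs as hash keys makes the later membership test in the filters a single exists lookup.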
lib/LiteratureBasedDiscovery/Filters.pm view on Meta::CPAN
# 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
package Filters;
use strict;
use warnings;
use UMLS::Interface;
# applies a semantic group filter to the matrix, by removing keys that
# are not of an allowed semantic type. Eliminates both rows and columns,
# so is applied to the full explicit matrix
# input: $matrixRef <- ref to a sparse matrix to be filtered
# $acceptTypesRef <- a ref to a hash of accept type strings
# $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated
sub semanticTypeFilter_rowsAndColumns {
my $matrixRef = shift;
my $acceptTypesRef = shift;
my $umls = shift;
=comment
#Count the number of keys before and after filtering (for debugging)
my %termsHash = ();
foreach my $key1 (keys %{$matrixRef}) {
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut
#eliminate values that are incorrect semantic groups
#do each row at a time, remove column values that
#are the incorrect semantic type
my %cuisChecked = ();
#cuisChecked keeps track of cuis that have been checked
# for elimination. If the cui has been checked its key
# will exist in the hash. Values of -1 indicate it should
# be eliminated, values of 1 indicate it should stay.
#eliminate cuis from rows
foreach my $cui (keys %{$matrixRef}) {
#update cui checked hash
if (!exists $cuisChecked{$cui}) {
$cuisChecked{$cui} = -1;
my $typesRef = $umls->getSt($cui);
foreach my $type(@{$typesRef}) {
my $abr = $umls->getStAbr($type);
#check the cui for removal
if (exists ${$acceptTypesRef}{$type}) {
$cuisChecked{$cui} = 1;
last;
}
}
}
#eliminate if needed
if ($cuisChecked{$cui} < 0) {
delete ${$matrixRef}{$cui};
}
}
#eliminate cuis from columns
foreach my $cui1 (keys %{$matrixRef}) {
foreach my $cui2 (keys %{${$matrixRef}{$cui1}}) {
#update cui checked hash
if (!exists $cuisChecked{$cui2}) {
$cuisChecked{$cui2} = -1;
my $typesRef = $umls->getSt($cui2);
foreach my $type(@{$typesRef}) {
my $abr = $umls->getStAbr($type);
#check the cui for removal
if (exists ${$acceptTypesRef}{$type}) {
$cuisChecked{$cui2} = 1;
last;
}
}
}
#eliminate if needed
if ($cuisChecked{$cui2} < 0) {
delete ${${$matrixRef}{$cui1}}{$cui2};
}
lib/LiteratureBasedDiscovery/Filters.pm view on Meta::CPAN
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}
# applies a semantic group filter to the matrix, by removing keys that
# are not of an allowed semantic type. Only removes types from rows,
# so is applied for time slicing, before randomly selecting terms of
# one semantic type
# input: $matrixRef <- ref to a sparse matrix to be filtered
# $acceptTypesRef <- a ref to a hash of accept type strings
# $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated
sub semanticTypeFilter_rows {
my $matrixRef = shift;
my $acceptTypesRef = shift;
my $umls = shift;
=comment
#Count the number of keys before and after filtering (for debugging)
my %termsHash = ();
foreach my $key1 (keys %{$matrixRef}) {
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut
#eliminate rows whose cui is not of an accepted semantic type
my $keep = -1;
#$keep tracks whether the current row cui should stay. A value
# of -1 indicates it should be eliminated, a value of 1 indicates
# it should stay. It is reset to -1 after each row is processed.
#eliminate cuis from rows
foreach my $cui1 (keys %{$matrixRef}) {
my $typesRef = $umls->getSt($cui1);
foreach my $type(@{$typesRef}) {
my $abr = $umls->getStAbr($type);
#check the cui for removal
if (exists ${$acceptTypesRef}{$type}) {
$keep = 1;
last;
}
}
#eliminate if needed
if ($keep < 0) {
delete ${$matrixRef}{$cui1};
}
$keep = -1;
lib/LiteratureBasedDiscovery/Filters.pm view on Meta::CPAN
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}
# applies a semantic group filter to the matrix, by removing keys that
# are not of an allowed semantic type. Only removes types from columns,
# so is applied to the implicit matrix (starting term rows with implicit
# columns).
# input: $matrixRef <- ref to a sparse matrix to be filtered
# $acceptTypesRef <- a ref to a hash of accept type strings
# $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated
sub semanticTypeFilter_columns {
my $matrixRef = shift;
my $acceptTypesRef = shift;
my $umls = shift;
=comment
#Count the number of keys before and after filtering (for debugging)
my %termsHash = ();
foreach my $key1 (keys %{$matrixRef}) {
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut
#eliminate values that are incorrect semantic groups
#do each row at a time, remove column values that
#are the incorrect semantic type
my %cuisChecked = ();
#cuisChecked keeps track of cuis that have been checked
# for elimination. If the cui has been checked its key
# will exist in the hash. Values of -1 indicate it should
# be eliminated, values of 1 indicate it should stay.
#eliminate cuis from columns
foreach my $cui1 (keys %{$matrixRef}) {
foreach my $cui2 (keys %{${$matrixRef}{$cui1}}) {
#update cui checked hash
if (!exists $cuisChecked{$cui2}) {
$cuisChecked{$cui2} = -1;
my $typesRef = $umls->getSt($cui2);
foreach my $type(@{$typesRef}) {
my $abr = $umls->getStAbr($type);
#check the cui for removal
if (exists ${$acceptTypesRef}{$type}) {
$cuisChecked{$cui2} = 1;
last;
}
}
}
#eliminate if needed
if ($cuisChecked{$cui2} < 0) {
delete ${${$matrixRef}{$cui1}}{$cui2};
}
lib/LiteratureBasedDiscovery/Filters.pm view on Meta::CPAN
foreach my $key1 (keys %{$matrixRef}) {
foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
$termsHash{$key2} = 1;
}
}
print " number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}
# gets the semantic types of the group
# input: $group <- a string specifying a semantic group
# $umls <- an instance of UMLS::Interface
# output: a ref to a hash of TUIs
sub getTypesOfGroup {
my $group = shift;
my $umls = shift;
#add each type of the group to the set of accept types
my %acceptTuis = ();
my @groupTypes = @{ $umls->getStsFromSg($group) };
foreach my $abr(@groupTypes) {
#check that it is defined (types that are no longer in
#the UMLS may be returned as part of the group)
if (defined $abr) {
my $tui = uc $umls->getStTui($abr);
$acceptTuis{$tui} = 1;
}
}
return \%acceptTuis;
}
# gets all semantic types of the UMLS
# input: $umls <- an instance of UMLS::Interface
# output: a ref to an array of TUIs
sub getAllTypes {
my $umls = shift;
my $abrRef = $umls->getAllSts();
my @tuis = ();
foreach my $abr(@{$abrRef}) {
push @tuis, uc $umls->getStTui($abr);
}
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
#return the post cutoff matrix
return \%postCutoffMatrix;
}
#TODO numRows should be read from file and sent with the lbdOptionsRef
# generates a starting matrix of numRows randomly selected terms
# input: $explicitMatrixRef <- a ref to the explicit sparse matrix
# $lbdOptionsRef <- the LBD options
# $startTermAcceptTypesRef <- a reference to a hash of accept
# types for start terms (TUIs)
# $numRows <- the number of random rows to load (if random)
# $umls_interface <- an instance of the UMLS::Interface
# output: \%startingMatrix <- a ref to the starting sparse matrix
sub generateStartingMatrix {
my $explicitMatrixRef = shift;
my $lbdOptionsRef = shift;
my $startTermAcceptTypesRef = shift;
my $numRows = shift;
my $umls_interface = shift;
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
}
close OUT;
}
}
#return the starting matrix
return \%startingMatrix;
}
# gets and returns a hash of row keys of the specified semantic types
# input: $matrixRef <- a ref to a sparse matrix
# $acceptTypesRef <- a ref to a hash of accept types (TUIs)
# $umls <- an instance of UMLS::Interface
# output: \%rowsToKeep <- a ref to hash of rows to keep, each key is
# a CUI, and values are 1. All CUIs specify rows
# of acceptable semantic types
sub getRowsOfSemanticTypes {
my $matrixRef = shift;
my $acceptTypesRef = shift;
my $umls = shift;
#loop through the matrix and keep the rows that are of the
# desired semantic types
my %rowsToKeep = ();
foreach my $cui1 (keys %{$matrixRef}) {
my $typesRef = $umls->getSt($cui1);
foreach my $type(@{$typesRef}) {
my $abr = $umls->getStAbr($type);
#check the cui for removal
if (exists ${$acceptTypesRef}{$type}) {
$rowsToKeep{$cui1} = 1;
last;
}
}
}
#return the rowsToKeep
return \%rowsToKeep;
}
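
The semantic type filters above share one pattern: look up the types of each CUI, cache the keep or drop decision, and delete entries that fail. The sketch below illustrates that pattern for columns only. MockUmls is a hypothetical stand-in for UMLS::Interface, its getSt is assumed to return a ref to an array of upper-case TUIs, and the CUIs and type assignments are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for UMLS::Interface (illustrative data only)
package MockUmls;
my %typesOfCui = (
    C0000001 => ['T047'],
    C0000002 => ['T103'],
    C0000003 => ['T028', 'T103'],
);
sub new { return bless {}, shift; }
sub getSt { my ($self, $cui) = @_; return $typesOfCui{$cui} || []; }

package main;

# Removes columns whose CUI has no accepted semantic type. The matrix is
# modified in place; each CUI is looked up at most once.
sub filterColumnsSketch {
    my ($matrixRef, $acceptTypesRef, $umls) = @_;
    my %keepCui = ();    # cui -> number of accepted types it has
    foreach my $row (keys %{$matrixRef}) {
        foreach my $col (keys %{ $matrixRef->{$row} }) {
            if (!exists $keepCui{$col}) {
                # grep in scalar context counts the matching types
                $keepCui{$col} = grep { exists $acceptTypesRef->{$_} }
                    @{ $umls->getSt($col) };
            }
            delete $matrixRef->{$row}{$col} unless $keepCui{$col};
        }
    }
    return;
}

my %matrix = (
    C0000009 => { C0000001 => 5, C0000002 => 2, C0000003 => 1 },
);
my %acceptTypes = (T103 => 1);
filterColumnsSketch(\%matrix, \%acceptTypes, MockUmls->new());
print join(',', sort keys %{ $matrix{C0000009} }), "\n";
```

Deleting inside the loop is safe here because keys builds the full key list before iteration starts.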
samples/lbdConfig view on Meta::CPAN
# The output path of the results of lbd
<implicitOutputFile>sampleOutput
# a comma separated list of linking (B) term accept semantic groups, which
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT
# similar to linking accept groups, this restricts the acceptable linking (B)
# terms to terms within the semantic types listed
# See http://metampa.nlm.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptTypes>clnd,chem
# a comma separated list of target (C) term accept semantic groups, which
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE
# similar to target accept groups, this restricts the acceptable target (C)
# terms to terms within the semantic types listed
# See http://metampa.nlm.gov/Docs/SemanticTypes_2013AA.txt for a list
#<targetAcceptTypes>clnd,chem
# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix
# A comma separated list of starting (A) cuis used in LBD
<startCuis>C0001554,C1961131
# A comma separated list of target (C) cuis. If specified, system enters closed
samples/timeSlicingConfig view on Meta::CPAN
#----- Time Slicing Specific Parameters ------------------------
#Tell LBD to enter precision and recall mode (time slicing)
<precisionAndRecall_implicit>
# name of the file that contains a newline separated list of cuis
# each cui serves as a start term. The average over all cuis in this
# file is what is reported for precision and recall
<cuiListFileName>timeSliceCuiList
# A list of starting accept types. This is used to randomly generate 100
# starting terms if a cuiListFileName is not specified. All starting terms
# will be of the types listed
<startAcceptTypes>dsyn
#--------------------------------------
# The ranking procedure to use for LBD
# valid ranking procedures are:
# allPairs (maxBC) - maximum B to C term value
# averageMinimumWeight (AMW) - average of minimum A to B and B to C values
# linkingTermCount* (LTC) - count of shared linking terms
samples/timeSlicingConfig view on Meta::CPAN
# tscore - T-score
<rankingMeasure>ll
# a comma separated list of linking (B) term accept semantic groups, which
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT
# similar to linking accept groups, this restricts the acceptable linking (B)
# terms to terms within the semantic types listed
# See http://metampa.nlm.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptTypes>clnd,chem
# a comma separated list of target (C) term accept semantic groups, which
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE
# similar to target accept groups, this restricts the acceptable target (C)
# terms to terms within the semantic types listed
# See http://metampa.nlm.gov/Docs/SemanticTypes_2013AA.txt for a list
#<targetAcceptTypes>clnd,chem
# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix
# Input file path for the gold standard matrix (matrix of true predictions)
# See utils/datasetCreator on how to make this
<goldInputFile>sampleGoldMatrix
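
The sample configuration files above use lines of the form <key>value, with # starting a comment. A minimal sketch of reading that format follows. It is not ALBD's own parser: the helper name parseConfig is hypothetical, and storing 1 for flags that carry no value (such as <precisionAndRecall_implicit>) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical parser for <key>value configuration lines
sub parseConfig {
    my ($text) = @_;
    my %options = ();
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/;    # skip comments, including disabled options
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ /^\s*<([^>]+)>\s*(.*?)\s*$/) {
            # assumption: a key without a value is treated as a flag
            $options{$1} = length($2) ? $2 : 1;
        }
    }
    return \%options;
}

my $config = <<'END';
# The ranking measure to use
<rankingMeasure>ll
#<linkingAcceptGroups>CHEM,DISO
<precisionAndRecall_implicit>
<startCuis>C0001554,C1961131
END

my $optionsRef = parseConfig($config);
foreach my $key (sort keys %{$optionsRef}) {
    print "$key = $optionsRef->{$key}\n";
}
```

Comma separated values such as the start CUIs are left as strings here; the source splits them on ',' at the point of use.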
utils/datasetCreator/applySemanticFilter.pl view on Meta::CPAN
my $columnsOnly = 0; #apply to columns only, or rows and columns
&applySemanticFilter($matrixFileName, $outputFileName,
$acceptTypesString, $acceptGroupsString,
###################################################################
###################################################################
# Applies the semantic type filter
sub applySemanticFilter {
#grab the input
my $matrixFileName = shift;
my $outputFileName = shift;
my $acceptTypesString = shift;
my $acceptGroupsString = shift;
my $interfaceConfig = shift;
my $columnsOnly = shift;
print STDERR "Applying Semantic Filter to $matrixFileName\n";
utils/datasetCreator/applySemanticFilter.pl view on Meta::CPAN
#output the matrix
Discovery::outputMatrixToFile($outputFileName, $matrixRef);
#TODO re-enable this and then try to run again
#disconnect from the database and return
#$umls_interface->disconnect();
}
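# transforms the strings of accept types and groups into a hash of accept TUIs
# input: $umls_interface <- an instance of UMLS::Interface
# $acceptTypesString <- a comma separated list of semantic type abbreviations
# $acceptGroupsString <- a comma separated list of semantic group names
# output: a ref to a hash of acceptable TUIs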
sub getAcceptTypes {
my $umls_interface = shift;
my $acceptTypesString = shift;
my $acceptGroupsString = shift;
#get the accept types
my %acceptTypes = ();
#add all types for groups specified
#accept groups were specified
my @acceptGroups = split(',',$acceptGroupsString);
#add all the types of each group
foreach my $group(@acceptGroups) {
my $typesRef = Filters::getTypesOfGroup($group, $umls_interface);
foreach my $key(keys %{$typesRef}) {
$acceptTypes{$key} = 1;
}
}
#add all types specified
#convert each type to a tui and add
my $tui;
my @acceptTypes = split(',',$acceptTypesString);
foreach my $abr(@acceptTypes) {
$tui = uc $umls_interface->getStTui($abr);
$acceptTypes{$tui} = 1;
}
return \%acceptTypes;
}