view release on metacpan or search on metacpan
work under the conditions stated herein. The "Document", below,
refers to any such manual or work. Any member of the public is a
licensee, and is addressed as "you". You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject. (Thus, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.
The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy,
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".) To "Preserve the Title"
of such a section when you modify the Document means that it remains a
section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and
you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
edition to the public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled "History" in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the "History" section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the section all
the substance and tone of each of the contributor acknowledgements
and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements"
or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History"
in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements",
and any sections Entitled "Dedications". You must delete all sections
Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.
A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
the License in the document and put the following copyright and
license notices just after the title page:
Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the "with...Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
perl Makefile.PL PREFIX=/home
or
perl Makefile.PL LIB=/home/lib PREFIX=/home
will install UMLS-Interface into /home. The first method above will
install the modules in /home/lib/perl5/site_perl/5.16.3 (assuming you
are using version 5.16.3 of Perl; otherwise, the directory will be
slightly different). The second method will install the modules in
/home/lib. In either case the executable scripts will be installed in
/home/bin and the man pages will be installed in home/share.
Warning: do not put a dash or hyphen in front of PREFIX, or LIB
In your perl programs that you may write using the modules, you may need
to add a line like so
use lib '/home/lib/perl5/site_perl/5.16.3';
if you used the first method or
use lib '/home/lib';
if you used the second method. By doing this, the installed modules are
found by your program. To run the runDiscovery.pl program, you would
need to do
perl -I/home/lib/perl5/site_perl/5.16.3 runDiscovery.pl
or
perl -I/home/lib
Of course, you could also add the 'use lib' line to the top of the
closed LBD as well as time slicing evaluation.
ALBD requires UMLS::Association both to compute the co-occurrence
database that the co-occurrence matrix is derived from, but also for
ranking the generated C terms.
UMLS::Association requires the UMLS::Interface module to access
the Unified Medical Language System (UMLS) for semantic type filtering
and to determine if CUIs are valid.
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
There is another folder containing scripts to square co-occurrence
matrices. Squaring an explicit (A to B) co-occurrence matrix results in
a co-occurrence matrix containing all implicit (A to C) connections.
This is useful for time slicing and other analysis. Removal of the
original explicit matrix is an additional step that is required if you
wish to create a predictions matrix file for every CUI. This can be done
with the removeExplicit.pl script. Squaring a co-occurrence matrix can
be very computationally expensive, both in terms of ram and cpu. For
this reason MATLAB scripts are preferred over perl scripts. Even using
MATLAB ram can become an issue, and squaring sections of a matrix and
combining them into a single output matrix may be necassary, but takes
much longer. Scripts in the squaring folder include:
convertForSquaring_MATLAB.pl -- functions to convert to and from ALBD
and MATLAB sparse matrix formats
squareMatrix.m -- MATLAB script to square a matrix while holding
everything in ram. Faster, but requires more ram.
squareMatrix_partial.m -- MATLAB script to square a matrix in chunks.
lib/ALBD.pm view on Meta::CPAN
# throughtout LBD.
#
# Matrix representation:
# This module uses a matrix representation for LBD. All operations are
# performed either as matrix or vector operations. The core data structure
# are the co-occurrence matrices explicitMatrix and implicitMatrix. These
# matrices have dimensions vocabulary size by vocabulary size. Each row
# corresponds to the all co-occurrences for a single CUI. Each column of that
# row corresponding to a co-occurrence with a single CUI. Since the matrices
# tend to be sparse, they are stored as hashes of hashes, where the the first
# key is for a row, and the second key is for a column. The keys of each hash
# are the indeces within the matrix. The hash values are the number of
# co-ocurrences for that CUI pair (e.g. ${${$explicit{C0000000}}{C1111111} = 10
# means that CUI C0000000 and C1111111 co-occurred 10 times).
#
# Now with an understanding of the data strucutres, below is a breif
# description of each:
#
# startingMatrix <- A matrix containing the explicit matrix rows for all of the
# start terms. This makes it easy to have multiple start terms
# and using this matrix as opposed to the entire explicit
lib/LiteratureBasedDiscovery/Rank.pm view on Meta::CPAN
#get cui pair scores
&getBatchAssociationScores(
$bcScoresRef, $explicitMatrixRef, $measure, $association,
$n1pRef, $np1Ref, $npp);
#find the max a->b score (since there can be multiple a terms)
my %maxABScores = ();
my ($key1, $key2, $score);
foreach my $pairKey (keys %{$abScoresRef}) {
#second value is b term
($key1, $key2) = split(/,/,$pairKey);
$score = ${$abScoresRef}{$pairKey};
if ($score != -1) { #only compute for associations that exist
if (exists $maxABScores{$key2}) {
if ($score > $maxABScores{$key2}) {
$maxABScores{$key2} = $score;
}
} else {
$maxABScores{$key2} = $score;
lib/LiteratureBasedDiscovery/Rank.pm view on Meta::CPAN
# Find the average minimum weight (cScores) for each c term
# average of minimum a->b score and b->c score
my %cScores = ();
my %counts = ();
my ($value, $count, $min, $bTerm, $cTerm);
#sum min scores
foreach my $pairKey (keys %{$bcScoresRef}) {
#only compute for scores that exist
if (${$bcScoresRef}{$pairKey} != -1) {
#first is bTerm, second is cTerm
($bTerm, $cTerm) = split(/,/,$pairKey);
#check there is an AB value
if ($maxABScores{$bTerm} != -1) {
#get the minimum between a->b and b->c
$min = ${$bcScoresRef}{$pairKey};
if ($maxABScores{$bTerm} < $min) {
$min = $maxABScores{$bTerm};
}
lib/LiteratureBasedDiscovery/Rank.pm view on Meta::CPAN
my $explicitMatrixRef = shift;
my $implicitMatrixRef = shift;
#get all bc pairs
my $bcPairsRef = &_getBCPairs($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef);
# Find the linking term count for each cTerm
my %scores = ();
my ($key1, $key2);
foreach my $pairKey (keys %{$bcPairsRef}) {
#cTerm is the second value ($key2)
($key1, $key2) = split(/,/,$pairKey);
#automatically initializes to 0
$scores{$key2}++;
}
return \%scores;
}
# scores each implicit CUI based on the summed frequency of co-occurrence
lib/LiteratureBasedDiscovery/Rank.pm view on Meta::CPAN
my $explicitMatrixRef = shift;
my $implicitMatrixRef = shift;
#get all bc pairs
my $bcPairsRef = &_getBCPairs($startingMatrixRef, $explicitMatrixRef, $implicitMatrixRef);
# Find the frequency count for each cTerm
my %scores = ();
my ($key1, $key2);
foreach my $pairKey (keys %{$bcPairsRef}) {
#cTerm is the second value ($key2)
($key1, $key2) = split(/,/,$pairKey);
#automatically initializes to 0 (with +=)
$scores{$key2} += ${$bcPairsRef}{$pairKey};
}
return \%scores;
}
# scores each implicit CUI using an assocation measure. Score is the maximum
# association between a column in the implicit matrix, and one of the start
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
return \%cuiPairs;
} else {
#updates values in cuiPairs hash with their association scores and returns
Rank::getBatchAssociationScores(\%cuiPairs, $matrixRef, $rankingMeasure, $umls_association);
return \%cuiPairs;
}
}
# gets the min and max value of a hash
# returns a two element array, where the first value is the min, and
# the second values is the max
# input: $hashref <- a reference to a hash with numbers as values
# output: ($min, $max) <- the minimum and maximum values in the hash
sub getMinMax {
my $hashRef = shift;
#loop through each key and record the min/max
my $min = 999999;
my $max = -999999;
foreach my $key (keys %{$hashRef}) {
my $val = ${$hashRef}{$key};
utils/datasetCreator/applyMaxThreshold.pl view on Meta::CPAN
$count{$cui1}++;
}
else {
#update the cooccurrence count
chomp $val; #I'm not sure if this is necassary...probably not
$count{$cui1}+=$val;
}
#NOTE: do not update counts for $2, because in the case where order
#does not matter, the matrix will have been pre-processed to ensure
#the second cui will appear first in the key. In the case where order
#does matter we just shouldnt be counting it anyway
}
close IN;
return \%count;
}
#applies a maxThreshold, $countRef is the output of getStats
sub applyMaxThreshold {
my $inputFile = shift;
utils/datasetCreator/dataStats/getCUICooccurrences.pl view on Meta::CPAN
($cui1, $cui2, $val) = split(/\t/,$line);
#update the cooccurrence count
$coCount{$cui1}+=$val;
#update the unique co-occurrence counts
$ucoCount{$cui1}++;
#NOTE: do not update counts for 2, because in the case where order
#does not matter, the matrix will have been pre-processed to ensure
#the second cui will appear first in the key. In the case where order
#does matter we just shouldnt be counting it anyway
}
close IN;
print "Outputting Results\n";
#output the co-occurrence counts, sorted by number of unique
# co-occurrences (descending)
foreach my $cui(sort {$ucoCount{$b}<=>$ucoCount{$a}} keys %ucoCount) {
#coCount and ucoCount will have the same keys (see above loop)
print OUT "$cui\t$coCount{$cui}\t$ucoCount{$cui}\n";
utils/datasetCreator/squaring/squareMatrix_partial.m view on Meta::CPAN
%nohup matlab -nojvm -r squareMatrix_partial -logfile squareMatrix_matlab.out </dev/null &
clear all;
close all;
%The number of rows/cols to square at a time. Larger numbers will run faster
% but will require more ram. Lower numbers will run slower but require less ram.
% You want this number to be as high as possible without MATLAB crashing
increment = 40000;
sparseSquare_sectioned('/home/henryst/lbdData/squaring/1975_1999_window8_noOrder','/home/henryst/lbdData/squaring/1975_1999_window8_noOrder_squared_secondTry',increment);
error('DONE!');
function sparseSquare_sectioned(fileIn, fileOut, increment)
disp(fileIn);
%open, close, and clear the output file
fid = fopen(fileOut,'w');
fclose(fid);
%load the data
data = load(fileIn);
vals = max(data);