er_scripts/get_relationship_IsProjectedOnto
strings. The following fields are available:
=over 4
=item from_link
=item to_link
=item gene_context
=item percent_identity
=item score
=back
=item -to field-list
Choose a set of fields from the ProteinSequence entity to return. Field-list is a comma-separated list of
strings. The following fields are available:
er_scripts/get_relationship_IsProjectedOnto
be extended are written to stderr.
=cut
use Bio::KBase::Utilities::ScriptThing;
use Bio::KBase::CDMI::CDMIClient;
use Getopt::Long;
#Default fields
my @all_from_fields = ( 'id', 'sequence' );
my @all_rel_fields = ( 'from_link', 'to_link', 'gene_context', 'percent_identity', 'score' );
my @all_to_fields = ( 'id', 'sequence' );
my %all_from_fields = map { $_ => 1 } @all_from_fields;
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;
my %all_to_fields = map { $_ => 1 } @all_to_fields;
my @default_fields = ('from_link', 'to_link');
my @from_fields;
my @rel_fields;
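The lookup hashes above suggest that a comma-separated field-list is checked name by name. The following is only a minimal sketch of that idea, not the script's actual logic: `parse_field_list` is a hypothetical helper, and only the field names are taken from the snippet.

```perl
use strict;
use warnings;

# Field names copied from the script snippet above.
my @all_rel_fields = ('from_link', 'to_link', 'gene_context', 'percent_identity', 'score');
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;

# Hypothetical helper (not part of the script): split a comma-separated
# field-list and separate known field names from unknown ones.
sub parse_field_list {
    my ($list, $valid) = @_;
    my (@good, @bad);
    for my $field (split /\s*,\s*/, $list) {
        next if $field eq '';
        if ($valid->{$field}) { push @good, $field }
        else                  { push @bad,  $field }
    }
    return (\@good, \@bad);
}

# A hyphenated name does not match the underscore-style keys.
my ($good, $bad) = parse_field_list('score,percent_identity,from-link', \%all_rel_fields);
print "accepted: @$good\n";
print "rejected: @$bad\n";
```

Returning the rejected names separately lets a caller report every bad field at once rather than stopping at the first one.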
er_scripts/get_relationship_ProjectsOnto
strings. The following fields are available:
=over 4
=item from_link
=item to_link
=item gene_context
=item percent_identity
=item score
=back
=item -to field-list
Choose a set of fields from the ProteinSequence entity to return. Field-list is a comma-separated list of
strings. The following fields are available:
er_scripts/get_relationship_ProjectsOnto
be extended are written to stderr.
=cut
use Bio::KBase::Utilities::ScriptThing;
use Bio::KBase::CDMI::CDMIClient;
use Getopt::Long;
#Default fields
my @all_from_fields = ( 'id', 'sequence' );
my @all_rel_fields = ( 'from_link', 'to_link', 'gene_context', 'percent_identity', 'score' );
my @all_to_fields = ( 'id', 'sequence' );
my %all_from_fields = map { $_ => 1 } @all_from_fields;
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;
my %all_to_fields = map { $_ => 1 } @all_to_fields;
my @default_fields = ('from_link', 'to_link');
my @from_fields;
my @rel_fields;
lib/Bio/KBase/CDMI/CDMI_APIImpl.pm
KBase CS. The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another, and their corresponding genes in other
genomes also occur close to one another, then we assign a score, which is the number of distinct OTUs
in which such clustering is detected. This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.
The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.
=back
=cut
sub fids_to_co_occurring_fids
{
my $self = shift;
my($fids) = @_;
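The scoring rule described in the documentation above (count distinct OTUs, not genomes) can be sketched as follows. This illustrates the counting step only, not the server implementation: the genome names, OTU labels, and the helper `co_occurrence_score` are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: a genome-to-OTU map, and the genomes in which the
# corresponding genes of one gene pair were found close together.
my %otu_of = (
    genomeA => 'otu1',
    genomeB => 'otu1',   # same OTU as genomeA (closely related)
    genomeC => 'otu2',
    genomeD => 'otu3',
);
my @clustered_in = qw(genomeA genomeB genomeC genomeD);

# Score = number of distinct OTUs among the genomes showing the clustering.
sub co_occurrence_score {
    my ($genomes, $otu_of) = @_;
    my %seen;
    for my $genome (@$genomes) {
        my $otu = $otu_of->{$genome};
        next unless defined $otu;   # skip genomes with no OTU assignment
        $seen{$otu} = 1;
    }
    return scalar keys %seen;
}

my $score = co_occurrence_score(\@clustered_in, \%otu_of);
print "score: $score\n";
```

Four genomes show the clustering here, but two of them share an OTU, so they contribute only once to the score.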
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
to_link => 1, from_link => 1,
},
'HadResultsProducedBy' => {
to_link => 1, from_link => 1,
},
'ProjectsOnto' => {
to_link => 1, from_link => 1,
'gene_context' => 1,
'percent_identity' => 1,
'score' => 1,
},
'IsProjectedOnto' => {
to_link => 1, from_link => 1,
'gene_context' => 1,
'percent_identity' => 1,
'score' => 1,
},
'Provided' => {
to_link => 1, from_link => 1,
},
'WasProvidedBy' => {
to_link => 1, from_link => 1,
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
$return is a reference to a list where each element is a reference to a list containing 3 items:
0: a fields_ProteinSequence
1: a fields_ProjectsOnto
2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
id has a value which is a string
sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
</pre>
=end html
=begin text
$ids is a reference to a list where each element is a string
$from_fields is a reference to a list where each element is a string
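To make the documented return shape concrete, here is a minimal sketch that walks a list of three-element entries. It makes no client call: `$return` is mock data with invented values, shaped like the structure described above.

```perl
use strict;
use warnings;

# Mock data shaped like the documented return value: a list of
# [fields_ProteinSequence, fields_ProjectsOnto, fields_ProteinSequence].
# All ids and numbers are invented for illustration.
my $return = [
    [ { id => 'protA' }, { gene_context => 7, percent_identity => 82.5, score => 0.91 }, { id => 'protB' } ],
    [ { id => 'protA' }, { gene_context => 2, percent_identity => 45.0, score => 0.30 }, { id => 'protC' } ],
];

my @rows;
for my $triple (@$return) {
    # Element 0 is the "from" entity, 1 the relationship, 2 the "to" entity.
    my ($from, $rel, $to) = @$triple;
    push @rows, join("\t",
        $from->{id}, $to->{id},
        $rel->{gene_context}, $rel->{percent_identity}, $rel->{score});
}
print "$_\n" for @rows;
```

Given the separate field-list arguments, a hash probably carries only the fields that were requested, so real code would likely need to guard against missing keys.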
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
$return is a reference to a list where each element is a reference to a list containing 3 items:
0: a fields_ProteinSequence
1: a fields_ProjectsOnto
2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
id has a value which is a string
sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
=end text
=item Description
This relationship connects two protein sequences for which a clear
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
=over 4
=item gene_context
number of homologous genes in the immediate context of the
two proteins, up to a maximum of 10
=item percent_identity
percent match between the two protein sequences
=item score
score describing the strength of the projection, from 0 to 1,
where 1 is the best
=back
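The three fields described above lend themselves to simple filtering. The sketch below keeps only the stronger projections and ranks them by score; the records, the `to` key, and both cutoffs are invented for illustration, and suitable thresholds depend on the application.

```perl
use strict;
use warnings;

# Invented relationship records using the three documented fields.
my @projections = (
    { to => 'protB', gene_context => 7,  percent_identity => 82.5, score => 0.91 },
    { to => 'protC', gene_context => 2,  percent_identity => 45.0, score => 0.30 },
    { to => 'protD', gene_context => 10, percent_identity => 97.1, score => 0.99 },
);

# Hypothetical cutoffs: score ranges from 0 to 1, gene_context from 0 to 10.
my $min_score   = 0.5;
my $min_context = 5;

# Keep records passing both cutoffs, best score first.
my @kept = sort { $b->{score} <=> $a->{score} }
           grep { $_->{score} >= $min_score && $_->{gene_context} >= $min_context }
           @projections;

printf "%s\t%.2f\t%d\n", $_->{to}, $_->{score}, $_->{gene_context} for @kept;
```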
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
$return is a reference to a list where each element is a reference to a list containing 3 items:
0: a fields_ProteinSequence
1: a fields_ProjectsOnto
2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
id has a value which is a string
sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
</pre>
=end html
=begin text
$ids is a reference to a list where each element is a string
$from_fields is a reference to a list where each element is a string
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
$return is a reference to a list where each element is a reference to a list containing 3 items:
0: a fields_ProteinSequence
1: a fields_ProjectsOnto
2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
id has a value which is a string
sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
=end text
=item Description
lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm
=item Definition
=begin html
<pre>
a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
</pre>
=end html
=begin text
a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float
=end text
=back
=head2 fields_Provided
lib/Bio/KBase/CDMI/Client.pm
KBase CS. The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another, and their corresponding genes in other
genomes also occur close to one another, then we assign a score, which is the number of distinct OTUs
in which such clustering is detected. This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.
The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.
=cut
sub fids_to_co_occurring_fids
{
my($self, @args) = @_;
if ((my $n = @args) != 1)
{
Bio::KBase::Exceptions::ArgumentValidationError->throw(error =>
lib/Bio/KBase/CDMI/Client.pm
=over 4
=item gene_context
number of homologous genes in the immediate context of the
two proteins, up to a maximum of 10
=item percent_identity
percent match between the two protein sequences
=item score
score describing the strength of the projection, from 0 to 1,
where 1 is the best
=back
lib/Bio/KBase/CDMI/EntityImpl.pm
to_link => 1, from_link => 1,
},
'HadResultsProducedBy' => {
to_link => 1, from_link => 1,
},
'ProjectsOnto' => {
to_link => 1, from_link => 1,
'gene_context' => 1,
'percent_identity' => 1,
'score' => 1,
},
'IsProjectedOnto' => {
to_link => 1, from_link => 1,
'gene_context' => 1,
'percent_identity' => 1,
'score' => 1,
},
'Provided' => {
to_link => 1, from_link => 1,
},
'WasProvidedBy' => {
to_link => 1, from_link => 1,
scripts/corresponds
=back
=head2 Output Format
The standard output is a tab-delimited file. It consists of the input
file with extra columns added. If the abbreviated format is requested,
two columns get added (sc and the fid projected to). If the abbreviated
format is not requested, ten columns are added:
[percent-identity,matching-context,b1,e1,ln1,b2,e2,ln2,sc,to-fid].
Input lines that cannot be extended are written to stderr.
=cut
use SeedUtils;
my $usage = "usage: corresponds [-a cutoff] g1 g2 > output";
use Bio::KBase::CDMI::CDMIClient;
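The output format described above can be consumed with a small parser. This sketch assumes the ten extra columns are appended at the end of each line in the listed order, which the documentation implies but does not state outright. `parse_corresponds_line` is a hypothetical helper and the sample line is invented.

```perl
use strict;
use warnings;

# Column names as listed in the Output Format section.
my @added = qw(percent-identity matching-context b1 e1 ln1 b2 e2 ln2 sc to-fid);

# Hypothetical helper: split one output line into the original input
# columns and a hash of the ten appended columns.
sub parse_corresponds_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    my $n = scalar @added;
    return unless @cols >= $n;          # too short to carry the added columns
    my @tail = splice(@cols, -$n);      # remove the last ten columns
    my %rec;
    @rec{@added} = @tail;
    return (\@cols, \%rec);
}

# Invented sample line: one input column followed by the ten added columns.
my $line = join("\t", 'fig|1.1.peg.1', 95.2, 6, 1, 300, 300, 1, 298, 299, 0.87, 'fig|2.2.peg.9');
my ($input_cols, $rec) = parse_corresponds_line($line);
print "$input_cols->[0] -> $rec->{'to-fid'} (sc=$rec->{sc})\n";
```

Taking the columns from the end of the line means the parser does not need to know how many columns the input file had.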
scripts/fids_to_co_occurring_fids
KBase CS. The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another, and their corresponding genes in other
genomes also occur close to one another, then we assign a score, which is the number of distinct OTUs
in which such clustering is detected. This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.
The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.
Example:
fids_to_co_occurring_fids [arguments] < input > output
The standard input should be a tab-separated table (i.e., each line
is a tab-separated set of fields). Normally, the last field in each
line would contain the identifier. If another column contains the identifier
use