percent results from the CPAN

percent

Bio-KBase


er_scripts/get_relationship_IsProjectedOnto view on Meta::CPAN

strings. The following fields are available:

=over 4

=item from_link

=item to_link

=item gene_context

=item percent_identity

=item score

=back    

=item -to field-list

Choose a set of fields from the ProteinSequence entity to return. Field-list is a comma-separated list of 
strings. The following fields are available:

er_scripts/get_relationship_IsProjectedOnto view on Meta::CPAN

be extended are written to stderr.  

=cut
use Bio::KBase::Utilities::ScriptThing;
use Bio::KBase::CDMI::CDMIClient;
use Getopt::Long;

#Default fields
 
my @all_from_fields = ( 'id', 'sequence' );
my @all_rel_fields = ( 'from_link', 'to_link', 'gene_context', 'percent_identity', 'score' );
my @all_to_fields = ( 'id', 'sequence' );

my %all_from_fields = map { $_ => 1 } @all_from_fields;
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;
my %all_to_fields = map { $_ => 1 } @all_to_fields;

my @default_fields = ('from-link', 'to-link');

my @from_fields;
my @rel_fields;
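
As a rough illustration of how a comma-separated field list could be checked against a lookup hash like `%all_rel_fields`, consider the sketch below. The `parse_field_list` helper is hypothetical and is not taken from the script; it only shows the general technique of splitting the list and testing each name for membership.

```perl
use strict;
use warnings;

# Relationship fields, copied from the script excerpt.
my @all_rel_fields = ( 'from_link', 'to_link', 'gene_context', 'percent_identity', 'score' );
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;

# Hypothetical helper (not part of the script): split a comma-separated
# field list and separate supported names from unsupported ones.
sub parse_field_list {
    my ($list, $allowed) = @_;
    my (@ok, @bad);
    for my $field (split /\s*,\s*/, $list) {
        next if $field eq '';
        if ($allowed->{$field}) { push @ok, $field } else { push @bad, $field }
    }
    return (\@ok, \@bad);
}

my ($ok, $bad) = parse_field_list('percent_identity,score,bogus', \%all_rel_fields);
print "accepted: @$ok\n";
print "rejected: @$bad\n";
```

Returning the rejected names separately lets a caller decide whether to warn or abort, rather than silently dropping unknown fields.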

er_scripts/get_relationship_ProjectsOnto view on Meta::CPAN

strings. The following fields are available:

=over 4

=item from_link

=item to_link

=item gene_context

=item percent_identity

=item score

=back    

=item -to field-list

Choose a set of fields from the ProteinSequence entity to return. Field-list is a comma-separated list of 
strings. The following fields are available:

er_scripts/get_relationship_ProjectsOnto view on Meta::CPAN

be extended are written to stderr.  

=cut
use Bio::KBase::Utilities::ScriptThing;
use Bio::KBase::CDMI::CDMIClient;
use Getopt::Long;

#Default fields
 
my @all_from_fields = ( 'id', 'sequence' );
my @all_rel_fields = ( 'from_link', 'to_link', 'gene_context', 'percent_identity', 'score' );
my @all_to_fields = ( 'id', 'sequence' );

my %all_from_fields = map { $_ => 1 } @all_from_fields;
my %all_rel_fields = map { $_ => 1 } @all_rel_fields;
my %all_to_fields = map { $_ => 1 } @all_to_fields;

my @default_fields = ('from-link', 'to-link');

my @from_fields;
my @rel_fields;

lib/Bio/KBase/CDMI/CDMI_APIImpl.pm view on Meta::CPAN

KBase CS.  The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another (i.e., corresponding genes occur close
to one another), then we assign a score, which is the number of distinct OTUs
in which such clustering is detected.  This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.

The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.
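
The scoring idea described here, counting distinct OTUs rather than individual genomes, can be sketched as follows. This is only an illustration under assumed inputs: the genome-to-OTU mapping, the genome identifiers, and the `co_occurrence_score` function are hypothetical, and the actual server-side computation may differ.

```perl
use strict;
use warnings;

# Hypothetical mapping from genome ID to OTU ID (invented values).
my %otu_of = (
    'g.1' => 'otu.A',
    'g.2' => 'otu.A',   # closely related to g.1, so it shares an OTU
    'g.3' => 'otu.A',
    'g.4' => 'otu.B',
);

# Score = number of distinct OTUs among the genomes in which the
# corresponding genes were found close together.
sub co_occurrence_score {
    my ($genomes, $otu_of) = @_;
    my %seen;
    for my $genome (@$genomes) {
        my $otu = $otu_of->{$genome};
        next unless defined $otu;    # skip genomes with no OTU assignment
        $seen{$otu} = 1;
    }
    return scalar keys %seen;
}

# Clustering observed in four genomes, but only two distinct OTUs.
my $score = co_occurrence_score([ 'g.1', 'g.2', 'g.3', 'g.4' ], \%otu_of);
print "score = $score\n";
```

Three of the four genomes fall in the same OTU, so they contribute only once to the score, which is the normalization the text describes.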

=back

=cut

sub fids_to_co_occurring_fids
{
    my $self = shift;
    my($fids) = @_;

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN

	to_link => 1, from_link => 1,
	
    },
    'HadResultsProducedBy' => {
	to_link => 1, from_link => 1,
	
    },
    'ProjectsOnto' => {
	to_link => 1, from_link => 1,
		    'gene_context' => 1,
		    'percent_identity' => 1,
		    'score' => 1,
	
    },
    'IsProjectedOnto' => {
	to_link => 1, from_link => 1,
		    'gene_context' => 1,
		    'percent_identity' => 1,
		    'score' => 1,
	
    },
    'Provided' => {
	to_link => 1, from_link => 1,
	
    },
    'WasProvidedBy' => {
	to_link => 1, from_link => 1,

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN

$return is a reference to a list where each element is a reference to a list containing 3 items:
	0: a fields_ProteinSequence
	1: a fields_ProjectsOnto
	2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
	id has a value which is a string
	sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
	id has a value which is a string
	gene_context has a value which is an int
	percent_identity has a value which is a float
	score has a value which is a float

</pre>

=end html

=begin text

$ids is a reference to a list where each element is a string
$from_fields is a reference to a list where each element is a string

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN

$return is a reference to a list where each element is a reference to a list containing 3 items:
	0: a fields_ProteinSequence
	1: a fields_ProjectsOnto
	2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
	id has a value which is a string
	sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
	id has a value which is a string
	gene_context has a value which is an int
	percent_identity has a value which is a float
	score has a value which is a float


=end text
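
A minimal sketch of walking the documented return structure is shown below. The `$return` value is built by hand with invented data rather than fetched from a server, and `format_projection` is a hypothetical helper; only the shape (a list of three-element lists of hashes) comes from the description above.

```perl
use strict;
use warnings;

# Hand-built stand-in for $return: a list of [from, relationship, to] triples.
# All values are invented for illustration.
my $return = [
    [
        { id => 'protA', sequence => 'MKT' },
        { gene_context => 7, percent_identity => 92.5, score => 0.98 },
        { id => 'protB', sequence => 'MKS' },
    ],
];

# Hypothetical helper: flatten one triple into a tab-separated line.
sub format_projection {
    my ($triple) = @_;
    my ($from, $rel, $to) = @$triple;
    return join("\t", $from->{id}, $to->{id},
                $rel->{gene_context}, $rel->{percent_identity}, $rel->{score});
}

print format_projection($_), "\n" for @$return;
```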



=item Description

This relationship connects two protein sequences for which a clear

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN


=over 4


=item gene_context

number of homologous genes in the immediate context of the
two proteins, up to a maximum of 10


=item percent_identity

percent match between the two protein sequences


=item score

score describing the strength of the projection, from 0 to 1,
where 1 is the best



=back
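
Given the documented ranges (gene_context up to 10, score from 0 to 1), one plausible use is to keep only the stronger projections. The sketch below assumes hand-made records and arbitrary cutoffs; `strong_projections` is a hypothetical helper, and the thresholds are not recommendations from the source.

```perl
use strict;
use warnings;

# Invented relationship records, for illustration only.
my @projections = (
    { gene_context => 10, percent_identity => 95.0, score => 0.99 },
    { gene_context => 2,  percent_identity => 41.0, score => 0.35 },
    { gene_context => 6,  percent_identity => 78.0, score => 0.80 },
);

# Keep records whose score and gene context both reach the given minimums.
sub strong_projections {
    my ($records, $min_score, $min_context) = @_;
    return grep {
        $_->{score} >= $min_score && $_->{gene_context} >= $min_context
    } @$records;
}

my @kept = strong_projections(\@projections, 0.75, 5);
printf "%d of %d projections kept\n", scalar @kept, scalar @projections;
```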

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN

$return is a reference to a list where each element is a reference to a list containing 3 items:
	0: a fields_ProteinSequence
	1: a fields_ProjectsOnto
	2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
	id has a value which is a string
	sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
	id has a value which is a string
	gene_context has a value which is an int
	percent_identity has a value which is a float
	score has a value which is a float

</pre>

=end html

=begin text

$ids is a reference to a list where each element is a string
$from_fields is a reference to a list where each element is a string

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN

$return is a reference to a list where each element is a reference to a list containing 3 items:
	0: a fields_ProteinSequence
	1: a fields_ProjectsOnto
	2: a fields_ProteinSequence
fields_ProteinSequence is a reference to a hash where the following keys are defined:
	id has a value which is a string
	sequence has a value which is a string
fields_ProjectsOnto is a reference to a hash where the following keys are defined:
	id has a value which is a string
	gene_context has a value which is an int
	percent_identity has a value which is a float
	score has a value which is a float


=end text



=item Description

lib/Bio/KBase/CDMI/CDMI_EntityAPIImpl.pm view on Meta::CPAN



=item Definition

=begin html

<pre>
a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float

</pre>

=end html

=begin text

a reference to a hash where the following keys are defined:
id has a value which is a string
gene_context has a value which is an int
percent_identity has a value which is a float
score has a value which is a float


=end text

=back



=head2 fields_Provided

lib/Bio/KBase/CDMI/Client.pm view on Meta::CPAN

KBase CS.  The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another (i.e., corresponding genes occur close
to one another), then we assign a score, which is the number of distinct OTUs
in which such clustering is detected.  This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.

The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.

=cut

sub fids_to_co_occurring_fids
{
    my($self, @args) = @_;

    if ((my $n = @args) != 1)
    {
	Bio::KBase::Exceptions::ArgumentValidationError->throw(error =>

lib/Bio/KBase/CDMI/Client.pm view on Meta::CPAN


=over 4


=item gene_context

number of homologous genes in the immediate context of the
two proteins, up to a maximum of 10


=item percent_identity

percent match between the two protein sequences


=item score

score describing the strength of the projection, from 0 to 1,
where 1 is the best



=back

lib/Bio/KBase/CDMI/EntityImpl.pm view on Meta::CPAN

	to_link => 1, from_link => 1,
	
    },
    'HadResultsProducedBy' => {
	to_link => 1, from_link => 1,
	
    },
    'ProjectsOnto' => {
	to_link => 1, from_link => 1,
		    'gene_context' => 1,
		    'percent_identity' => 1,
		    'score' => 1,
	
    },
    'IsProjectedOnto' => {
	to_link => 1, from_link => 1,
		    'gene_context' => 1,
		    'percent_identity' => 1,
		    'score' => 1,
	
    },
    'Provided' => {
	to_link => 1, from_link => 1,
	
    },
    'WasProvidedBy' => {
	to_link => 1, from_link => 1,

scripts/corresponds view on Meta::CPAN


=back

=head2 Output Format

The standard output is a tab-delimited file. It consists of the input
file with extra columns added.  If the abbreviated format is requested,
two columns get added (sc and the fid projected to).  If the abbreviated
format is not requested, ten columns are added:

[percent-identity,matching-context,b1,e1,ln1,b2,e2,ln2,sc,to-fid]

Input lines that cannot be extended are written to stderr.
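
As a sketch of how the non-abbreviated output might be consumed, the code below peels the ten appended columns off the end of a line and names them in the order listed above. The `split_output_line` helper and the sample line are hypothetical; the values are invented and do not come from a real run.

```perl
use strict;
use warnings;

# Names of the ten appended columns, in the documented order.
my @added = qw(percent-identity matching-context b1 e1 ln1 b2 e2 ln2 sc to-fid);

# Hypothetical helper: split one non-abbreviated output line into the
# original input columns and a hash of the ten appended columns.
sub split_output_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my $n = scalar @added;
    return unless @cols >= $n;           # too short to be an extended line
    my @extra = splice(@cols, -$n);      # remove the last ten columns
    my %rec;
    @rec{@added} = @extra;
    return (\@cols, \%rec);
}

# Invented sample: one input column followed by the ten added columns.
my $line = join("\t", 'fig|1.1.peg.1',
    '98.2', '8', '10', '300', '291', '15', '305', '291', '0.97', 'fig|2.2.peg.7');
my ($input, $rec) = split_output_line($line);
print "input columns: ", scalar(@$input), "\n";
print "sc = $rec->{sc}, to-fid = $rec->{'to-fid'}\n";
```

Taking the columns from the end of the line keeps the parser independent of how many columns the input file had.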

=cut

use SeedUtils;

my $usage = "usage: corresponds [-a cutoff] g1 g2 > output";

use Bio::KBase::CDMI::CDMIClient;

scripts/fids_to_co_occurring_fids view on Meta::CPAN

KBase CS.  The prokaryotic genomes in the CS are grouped into OTUs by ribosomal
RNA (genomes within a single OTU have SSU rRNA that is greater than 97% identical).
If two genes occur close to one another (i.e., corresponding genes occur close
to one another), then we assign a score, which is the number of distinct OTUs
in which such clustering is detected.  This allows one to normalize for situations
in which hundreds of corresponding genes are detected, but they all come from
very closely related genomes.

The significance of the score relates to the number of genomes in the database.
We recommend that you take the time to look at a set of scored pairs and determine
approximately what percentage appear to be actually related for a few cutoff values.


Example:

    fids_to_co_occurring_fids [arguments] < input > output

The standard input should be a tab-separated table (i.e., each line
is a tab-separated set of fields).  Normally, the last field in each
line would contain the identifier. If another column contains the identifier
use
