Bio-Graphics-Glyph-decorated_gene

lib/Bio/Graphics/Glyph/decorated_transcript.pm  view on Meta::CPAN

package Bio::Graphics::Glyph::decorated_transcript;

use strict;
use warnings;

use Bio::Graphics::Panel;
use Bio::Graphics::Feature;    # decorations are wrapped in Bio::Graphics::Feature objects below
use List::Util qw[min max];
use vars qw($VERSION);

$VERSION = '0.02';

use constant DECORATION_TAG_NAME => 'protein_decorations';
use constant DEBUG              => 0;

my @color_names = Bio::Graphics::Panel::color_names;

use base
  qw(Bio::Graphics::Glyph::processed_transcript);

sub my_description {
  return <<END;
This glyph extends the functionality of the Bio::Graphics::Glyph::processed_transcript glyph
and allows protein decorations (e.g., signal peptides, transmembrane domains, protein domains)
to be drawn on top of gene models. Currently, the glyph can draw decorations as colored or
outlined boxes inside or around CDS segments. Protein decorations are specified at the 'mRNA'
transcript level in protein coordinates, which the glyph maps to nucleotide coordinates
automatically. Decorations may span exon-exon junctions, in which case they are split between
exons. By default, the glyph assigns a distinct color to each type of protein decoration;
decorations of the same type always share the same color.

Protein decorations are provided either with mRNA features inside GFF files (see example below) or
dynamically via a callback function using the B<additional_decorations> option (see glyph options).
The following line is an example of an mRNA feature in a GFF file that contains two protein decorations,
a signal peptide predicted by SignalP and a transmembrane domain predicted by TMHMM:

chr1   my_source   mRNA  74796  75599   .  +  .  ID=rna_gene-1;protein_decorations=SignalP40:SP:1:23:0:my_comment,TMHMM:TM:187:209:0

Each protein decoration consists of up to six fields separated by colons:

1) Type. Decoration type, typically the source of the decoration (e.g. 'SignalP40').
2) Name. Decoration name, used as the decoration label by default (e.g. 'SP' for signal peptide).
3) Start. Start coordinate at the protein level (1-based).
4) End. End coordinate at the protein level.
5) Score. Optional. Score associated with a decoration (e.g. Pfam E-value). This score can be used
   to dynamically filter or color decorations via callbacks (see glyph options).
6) Description. Optional. User-defined description of the decoration. The glyph ignores this description,
   but makes it available to callback functions for inspection. Special characters
   like ':' or ',' that might interfere with the GFF tag parser should be avoided.

If callback functions are used as glyph parameters (see below), the callback is called for each
decoration separately. That is, the callback may be called multiple times for the same CDS feature,
each time with a different active decoration. The currently drawn (active) decoration is made available
to the callback via the glyph method 'active_decoration'. The active decoration is returned as
a Bio::Graphics::Feature object, with decoration data fields mapped to the corresponding feature
attributes as follows:

  type --> \$glyph->active_decoration->type
  name --> \$glyph->active_decoration->name
  nucleotide start coordinate --> \$glyph->active_decoration->start
  nucleotide end coordinate --> \$glyph->active_decoration->end
  protein start coordinate --> \$glyph->active_decoration->get_tag_values('p_start')
  protein end coordinate --> \$glyph->active_decoration->get_tag_values('p_end')
  score --> \$glyph->active_decoration->score
  description --> \$glyph->active_decoration->desc

In addition, the glyph passed to the callback allows access to the parent glyph and
parent feature if required (use \$glyph->parent or \$glyph->parent->feature). 

NOTE: This glyph works only with GFF3-compliant features. In particular, make sure that
every feature has a valid unique ID and that every child feature has a valid Parent ID.

END
}

sub my_options {
    return {
	decoration_visible => [
	    'boolean',
	    'false',
	    'Specifies whether decorations should be visible or not. For selective display of individual',
	    'decorations, specify a callback function and return 1 or 0 after inspecting the active',
	    'decoration of the glyph.'],
	decoration_color => [
	    'color',
	    undef,
	    'Decoration background color. If no color is specified, colors are assigned automatically',
	    'by decoration type and name, whereas decorations of identical type and name are assigned',
	    'the same color. A special color \'transparent\' can be used here in combination with',
	    'the option \'decoration_border\' to draw decorations as outlines.'],
	decoration_border => [
	    ['none', 'solid', 'dashed'],
	    'none',
	    'Decoration border style. By default, decorations are drawn without border (\'none\' or',
	    '0). Other valid options here include \'solid\' or \'dashed\'.'],
	decoration_border_color => [
	    'color',
	    'black',
	    'Color of decoration border.'],
	decoration_label => [
	    'string',
	    undef,
	    'Decoration label. If not specified, the second data field of the decoration is used',
	    'as label. Set this option to 0 to get unlabeled decorations. If the label text',
	    'extends beyond the size of the decorated segment, the label will be clipped. Clipping',
	    'does not occur for SVG output.'],

lib/Bio/Graphics/Glyph/decorated_transcript.pm  view on Meta::CPAN

			print STDERR "\n";
		}
	}

	$self->{'sorted_decorations'} = $sorted_decorations;
	
	return $sorted_decorations;
}

# returns decorations of feature as Bio::Graphics::Feature array, with coordinates mapped to nucleotide space
sub get_decorations_as_features
{
	my $feature = shift;
	my $additional_decorations = shift;  # optional
	my $cds_tag_name = shift;  # optional; default: "CDS"
	
	my @features;
	my $map = _get_coordinate_map($feature, $cds_tag_name);
	
	my @decorations = get_feature_decorations($feature);
	push(@decorations, @$additional_decorations) if ($additional_decorations);
	
	# map coordinates and encapsulate in Bio::Graphics::Feature objects
	foreach my $decoration (@decorations)
	{
		my ( $type, $name, $p_start, $p_end, $score, $desc ) = split( ":", $decoration );

		if (!defined $p_end)
		{
			warn "get_decorations_as_features(): WARNING: invalid decoration data for feature $feature(".$feature->primary_tag."):\n$decoration\n";
			next;
		}

		my $nt_start = $map->{$p_start}->{'codon_start'};
		if (!$nt_start)
		{
			warn "get_decorations_as_features(): WARNING: could not map decoration start coordinate on feature $feature(".$feature->primary_tag."):\n$decoration\n";
			next;
		}
		my $nt_end = $map->{$p_end}->{'codon_end'};
		if (!$nt_end)
		{
			warn "get_decorations_as_features(): WARNING: could not map decoration end coordinate on feature $feature(".$feature->primary_tag."):\n$decoration\n";
			next;
		}

		( $nt_start, $nt_end ) = ( $nt_end, $nt_start )
		  if ( $nt_start > $nt_end );

		my $f = Bio::Graphics::Feature->new
		(
			-type => $type,
			-name => $name,
			-display_name => $name,
			-start => $nt_start,
			-end => $nt_end,
			-score => $score,
			-desc => $desc,
			-seq_id => $feature->seq_id,
			-strand => $feature->strand,
			-attributes => {   # remember protein coordinates for callbacks  
				'p_start' => $p_start, 
				'p_end' => $p_end 
			}
		);
		
		warn "DECORATION=$decoration --> $nt_start:$nt_end\n" if (DEBUG);

		push(@features, $f);
	}	
	
	return wantarray ? @features : \@features; 
}

# map protein to nucleotide coordinate
sub _get_coordinate_map {
	my $feature = shift;
	my $cds_tag_name = shift || 'CDS';
	my %map;
	
	# sort all CDS features by coordinates
	# NOTE: filtering for CDS features by passing the feature type to get_SeqFeatures()
	# does not work for some reason, probably when no feature store is attached
	my @cds = grep { $_->primary_tag eq $cds_tag_name } $feature->get_SeqFeatures();
	if ( $feature->strand > 0 ) {
		my ( $ppos, $residue ) = ( 1, 0 );
		my @sorted_cds = sort { $a->start <=> $b->start } (@cds);
		foreach my $c (@sorted_cds) {
			$map{ $ppos - 1 }{'codon_end'} = $c->start + $residue - 1
			  if ($residue);
			for (
				my $ntpos = $c->start + $residue ;
				$ntpos <= $c->end ;
				$ntpos += 3
			  )
			{
				$map{$ppos}{'codon_start'} = $ntpos;
				$map{$ppos}{'codon_end'}   = $ntpos + 2;
				$ppos++;
				$residue = $ntpos + 2 - $c->end;
			}
		}
	}
	else {
		my ( $ppos, $residue ) = ( 1, 0 );
		my @sorted_cds = reverse sort { $a->start <=> $b->start } (@cds);
		foreach my $c (@sorted_cds) {
			$map{ $ppos - 1 }{'codon_end'} = $c->end - $residue + 1
			  if ($residue);
			for (
				my $ntpos = $c->end - $residue ;
				$ntpos >= $c->start ;
				$ntpos -= 3
			  )
			{
				$map{$ppos}{'codon_start'} = $ntpos;
				$map{$ppos}{'codon_end'}   = $ntpos - 2;
#				print $self->feature->name."\t$ppos\t".$self->{'p2n'}{$ppos}{'codon_start'}."\t".$self->{'p2n'}{$ppos}{'codon_end'}."\n" if ($self->feature->name eq "DAF19-b");
				$ppos++;
				$residue = $c->start - ( $ntpos - 2 );
			}

lib/Bio/Graphics/Glyph/decorated_transcript.pm  view on Meta::CPAN

	$gd->string( $font, $h_left + 2, $gd->isa("GD::SVG::Image") ? $label_top-1 : $label_top, $label, $self->factory->translate_color($label_color) );

	$gd->clip( 0, 0, $gd->width, $gd->height )
	  if ( !$gd->isa("GD::SVG::Image") );
}

1;

__END__

=head1 NAME

Bio::Graphics::Glyph::decorated_transcript - draws processed transcript with protein decorations

=head1 SYNOPSIS

  See L<Bio::Graphics::Panel> and L<Bio::Graphics::Glyph>.

=head1 DESCRIPTION

This glyph extends the functionality of the L<Bio::Graphics::Glyph::processed_transcript> glyph
and allows protein decorations (e.g., signal peptides, transmembrane domains, protein domains)
to be drawn on top of gene models. Currently, the glyph can draw decorations as colored or
outlined boxes inside or around CDS segments. Protein decorations are specified at the 'mRNA'
transcript level in protein coordinates, which the glyph maps to nucleotide coordinates
automatically. Decorations may span exon-exon junctions, in which case they are split between
exons. By default, the glyph assigns a distinct color to each type of protein decoration;
decorations of the same type always share the same color.

Protein decorations are provided either with mRNA features inside GFF files (see example below) or
dynamically via a callback function using the B<additional_decorations> option (see glyph options).
The following line is an example of an mRNA feature in a GFF file that contains two protein decorations,
a signal peptide predicted by SignalP and a transmembrane domain predicted by TMHMM:
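
The mapping from protein to nucleotide coordinates, including the split at an exon-exon
junction, can be sketched as follows. This is only an illustration for plus-strand transcripts:
the helper name C<protein_to_nt_segments> and the plain array-of-pairs CDS representation are
assumptions made for this example and are not part of the glyph's API. It also computes offsets
arithmetically rather than building the per-codon lookup table that the glyph itself uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the glyph's API): maps a protein range to
# nucleotide segments on a plus-strand transcript.
# $cds is an array ref of [start, end] CDS segments (1-based, inclusive).
sub protein_to_nt_segments {
    my ($cds, $p_start, $p_end) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @$cds;

    # Offsets into the spliced coding sequence (0-based, inclusive).
    my $first = ($p_start - 1) * 3;
    my $last  = $p_end * 3 - 1;

    my @segments;
    my $consumed = 0;    # coding bases in the CDS segments already visited
    for my $c (@sorted) {
        my ($s, $e) = @$c;
        my $len       = $e - $s + 1;
        my $seg_first = $consumed;
        my $seg_last  = $consumed + $len - 1;

        # Intersect the requested range with this CDS segment.
        my $from = $first > $seg_first ? $first : $seg_first;
        my $to   = $last  < $seg_last  ? $last  : $seg_last;
        push @segments, [ $s + $from - $consumed, $s + $to - $consumed ]
            if $from <= $to;

        $consumed += $len;
    }
    return @segments;
}

# Two CDS segments of 10 and 20 bases; codon 4 spans the junction.
my @cds = ( [ 101, 110 ], [ 201, 220 ] );
printf "%d..%d\n", @$_ for protein_to_nt_segments( \@cds, 3, 5 );
```

For residues 3 to 5 the sketch yields two nucleotide segments, one per exon, which is the
behaviour described above for decorations that span a junction.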

C<chr1   my_source   mRNA  74796  75599   .  +  .  ID=rna_gene-1;protein_decorations=SignalP40:SP:1:23:0:my_comment,TMHMM:TM:187:209:0>

Each protein decoration consists of up to six fields separated by colons:


=over

=item 1. type

Decoration type, typically the source of the decoration (e.g. 'SignalP40').

=item 2. name

Decoration name. Used as decoration label by default (e.g. 'SP' for signal peptide)

=item 3. start

Start coordinate at the protein-level (1-based coordinate)

=item 4. end

End coordinate at the protein-level

=item 5. score

Optional. Score associated with a decoration (e.g. Pfam E-value). This score can be used 
to dynamically filter or color decorations via callbacks (see glyph options).

=item 6. description

Optional. User-defined description of decoration. The glyph ignores this description, 
but it will be made available to callback functions for inspection. Special characters 
like ':' or ',' that might interfere with the GFF tag parser should be avoided. 

=back 
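
As an illustration of the field layout, the following sketch splits the value of the
C<protein_decorations> tag from the example line above into its fields. The helper name
C<parse_decorations> and the hash keys are assumptions made for this example; the glyph does
its own parsing internally.

```perl
use strict;
use warnings;

# Hypothetical helper: splits the value of the protein_decorations tag
# into one hash per decoration.
sub parse_decorations {
    my ($tag_value) = @_;
    my @parsed;
    for my $decoration ( split /,/, $tag_value ) {
        my ( $type, $name, $p_start, $p_end, $score, $desc ) =
            split /:/, $decoration;

        # Type, name, start and end are mandatory; skip malformed entries.
        next unless defined $p_end;

        push @parsed, {
            type    => $type,
            name    => $name,
            p_start => $p_start,
            p_end   => $p_end,
            score   => $score,    # optional, may be undef
            desc    => $desc,     # optional, may be undef
        };
    }
    return @parsed;
}

my @decorations = parse_decorations(
    'SignalP40:SP:1:23:0:my_comment,TMHMM:TM:187:209:0');
for my $d (@decorations) {
    printf "%s/%s: residues %d-%d\n", @{$d}{qw(type name p_start p_end)};
}
```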

If callback functions are used as glyph parameters (see below), the callback is called for each
decoration separately. That is, the callback may be called multiple times for the same CDS feature,
each time with a different active decoration. The currently drawn (active) decoration is made available
to the callback via the glyph method 'active_decoration'. The active decoration is returned as
a Bio::Graphics::Feature object, with decoration data fields mapped to the corresponding feature
attributes as follows:

=over

=item * type --> $glyph->active_decoration->type

=item * name --> $glyph->active_decoration->name

=item * nucleotide start coordinate --> $glyph->active_decoration->start

=item * nucleotide end coordinate --> $glyph->active_decoration->end

=item * protein start coordinate --> $glyph->active_decoration->get_tag_values('p_start')

=item * protein end coordinate --> $glyph->active_decoration->get_tag_values('p_end')

=item * score --> $glyph->active_decoration->score

=item * description --> $glyph->active_decoration->desc

=back 

In addition, the glyph passed to the callback allows access to the parent glyph and
parent feature if required (use $glyph->parent or $glyph->parent->feature). 
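
A callback that inspects the active decoration might look like the sketch below. It assumes
the usual Bio::Graphics callback arguments (feature, option name, part number, total parts,
glyph). The sub name C<decoration_is_visible>, the length threshold, and the rule itself are
hypothetical and only serve to show how the accessors listed above are used.

```perl
use strict;
use warnings;

# Hypothetical threshold chosen for this example only.
use constant MIN_DECORATION_LENGTH => 10;

# Sketch of a callback for the decoration_visible option: always show
# signal peptides, hide all other decorations shorter than the threshold.
sub decoration_is_visible {
    my ( $feature, $option_name, $part_no, $total_parts, $glyph ) = @_;

    my $decoration = $glyph->active_decoration;

    # Signal peptides are named 'SP' in the GFF example of this document.
    return 1 if $decoration->name eq 'SP';

    # Protein coordinates are stored as tags on the decoration feature.
    my ($p_start) = $decoration->get_tag_values('p_start');
    my ($p_end)   = $decoration->get_tag_values('p_end');
    my $length    = $p_end - $p_start + 1;

    return $length >= MIN_DECORATION_LENGTH ? 1 : 0;
}

# Assumed usage, given an existing Bio::Graphics::Panel in $panel and
# transcript features in $transcripts:
#   $panel->add_track( $transcripts,
#       -glyph              => 'decorated_transcript',
#       -decoration_visible => \&decoration_is_visible );
```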

NOTE: This glyph works only with GFF3-compliant features. In particular, make sure that
every feature has a valid unique ID and that every child feature has a valid Parent ID.
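
One way to check these two requirements before loading a file is sketched below. The helper
C<check_gff3_ids> is hypothetical and not part of this distribution. It applies the note above
literally, so it also reports IDs that are shared between several lines.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of the glyph): returns a list of
# problems found in the ID and Parent attributes of GFF3 feature lines.
sub check_gff3_ids {
    my @lines = @_;
    my ( %seen_id, @parent_refs, @problems );

    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments and blanks
        my @cols = split /\t/, $line;
        next unless @cols == 9;                    # not a feature line

        # Column 9 holds semicolon-separated key=value attributes.
        my %attr;
        for my $pair ( split /;/, $cols[8] ) {
            my ( $key, $value ) = split /=/, $pair, 2;
            $attr{$key} = $value if defined $value;
        }

        if ( !defined $attr{ID} ) {
            push @problems, "feature without ID: $cols[2] at $cols[3]-$cols[4]";
        }
        elsif ( $seen_id{ $attr{ID} }++ ) {
            push @problems, "duplicate ID: $attr{ID}";
        }
        push @parent_refs, split /,/, $attr{Parent} if defined $attr{Parent};
    }

    # Every Parent reference must point at an ID seen somewhere in the input.
    for my $parent (@parent_refs) {
        push @problems, "unknown Parent: $parent" unless $seen_id{$parent};
    }
    return @problems;
}

my @gff = (
    join( "\t", 'chr1', 'my_source', 'mRNA', 74796, 75599, '.', '+', '.', 'ID=rna_gene-1' ),
    join( "\t", 'chr1', 'my_source', 'CDS',  74796, 75599, '.', '+', '0', 'ID=cds-1;Parent=rna_gene-2' ),
);
print "$_\n" for check_gff3_ids(@gff);
```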

=head2 OPTIONS

This glyph inherits all options from the L<Bio::Graphics::Glyph::processed_transcript> glyph. 
In addition, it recognizes the following glyph-specific options:

  Option          Description                                              Default
  ------          -----------                                              -------

  -decoration_visible                                                      false
  
                  Specifies whether decorations should be visible
                  or not. For selective display of individual 
                  decorations, specify a callback function and 
                  return 1 or 0 after inspecting the active decoration
                  of the glyph. 

  -decoration_color                                                        <auto>
  


