FAST

bin/fastr

  if ($IN) { 
    while (my $seq = $IN->next_seq()) {
      my $data = $seq->$selector();
      $moltype ||= $seq->alphabet;
      if ($iupac_or_strict) { ## this needs to be here because moltype is set at runtime, most often automatically
	if ($strict) {
	  $search = $strict_symbols{$moltype};	  
	}
	else { # elsif ($iupac) 
	  $search = $iupac_symbols{$moltype};
	}
	$replace = $ambig_char || $ambig{$moltype};
      }
      elsif ($degap) {
	$delete = 1;
	$search = '-';
	$replace = "";
      }

      my $count;
      my @options = ();
      push @options, 'c' if ($complement);
      push @options, 'd' if ($delete);
      push @options, 's' if ($squash);
      my $options = join '',@options;
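      ## tr/// does not interpolate variables, so the operator is assembled
      ## as a string at runtime and compiled with eval
      eval sprintf "\$count = \$data =~ tr/%s/%s/%s", $search, $replace, $options; 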
      die "$NAME: $@\n" if $@;
      $seq->$selector($data);
      # if ($append_info) {
      # 	$seq->desc(join ' ',$seq->desc(),"$NAME: $search|$replace|$options:$count");
      # }
      print $OUT $seq;
    }
    undef $IN;
  }
}



__END__

=head1 NAME

B<fastr> - character-based transformations on sequence records

=head1 SYNOPSIS

B<fastr>     [OPTION]... [SEARCHLIST] [REPLACELIST] [MULTIFASTA-FILE...]

B<fastr -n>  [OPTION]... [SEARCHLIST]               [MULTIFASTA-FILE...]

B<fastr --strict> [B<--ambig>=<char>]                  [MULTIFASTA-FILE...]

B<fastr --iupac>  [B<--ambig>=<char>]                  [MULTIFASTA-FILE...]

B<fastr --degap>                                       [MULTIFASTA-FILE...]

=head1 DESCRIPTION

B<fastr> takes multifasta format sequence or alignment data as input,
and facilitates character transliterations on identifiers (by default),
sequences or descriptions using the Perl B<tr///> character
transliteration operator. This facilitates character remapping, case
changes, character deletions, degapping, squashing of repeated
characters, and more. Special modes B<--strict>, B<--iupac>, and
B<--degap> automatically direct transliterations on sequences. They
facilitate enforcement of sequence alphabets, remapping of illegal
characters to 'N' or 'X' or a user-defined character, and removal of
gap characters.
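
As a rough illustration of the alphabet-enforcement idea, the sketch below remaps every character outside an allowed set to an ambiguity character using a complemented B<tr///>. The allowed set C<ACGTacgt-> and the variable names are assumptions made for this example only; the symbol tables that B<fastr> itself uses are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical strict DNA alphabet plus the gap character '-'.
# This is an assumption for illustration, not fastr's own table.
my $allowed = 'ACGTacgt-';
my $ambig   = 'N';

my $data = 'ACGTRYN-acgt';
my $count;

# tr/// does not interpolate variables, so build the statement as a
# string and compile it with eval. The /c modifier complements the
# search list, so every character NOT in $allowed is mapped to $ambig.
my $code = sprintf '$count = ($data =~ tr/%s/%s/c);', $allowed, $ambig;
eval $code;
die "transliteration failed: $@\n" if $@;

print "$data ($count characters remapped)\n";
```

Because the lists are pasted into code that is then evaluated, a sketch like this should only be fed character lists from a trusted source.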

Options specific to B<fastr>:
  B<-s>, B<--sequence>         transliterate sequences
  B<-d>, B<--description>       transliterate descriptions  
  B<-D>, B<--delete>            delete found characters not replaced
  B<-S>, B<--squash>            squash duplicate replaced characters 
  B<-n>, B<--no-replace>        squash/delete characters in searchlist 
  B<-c>, B<--complement>        complement searchlist as a character set
  B<--strict>                map ambiguous/invalid nongap chars in sequences
  B<--iupac>                 map invalid nongap chars (to N/X by default)
  B<-N>, B<-X>, B<--ambig>=<char>  map ambig/invalid chars to <char>
  B<--degap>                 delete gap characters '-' 
  B<-j>, B<--join>=<string>     use <string> to join data annotation

Options general to FAST:
  B<-h>, B<--help>                     print a brief help message
  B<--man>             	        print full documentation
  B<--version>                      print version
  B<-l>, B<--log>                      create/append to logfile	
  B<-L>, B<--logname>=<string>         use logfile name <string>
  B<-C>, B<--comment>=<string>         save comment <string> to log
  B<--format>=<format>              use alternative format for input  
  B<--moltype>=<[dna|rna|protein]>  specify input sequence type
  B<-q>, B<--fastq>                    fastq format for input and output

=head1 SEARCHLIST AND REPLACELIST

These arguments are strings of characters or hyphenated character
ranges (like "A-Z" or "0-9"), written one after another with no
separator. Characters are remapped in corresponding order. If
SEARCHLIST is longer than REPLACELIST, the last character of
REPLACELIST is repeated unless in "delete-mode," in which case
unmatched characters are deleted. In "squash-mode," transliteration
happens before squashing. With the "no-replace" option, only
SEARCHLIST is provided, no transliteration occurs, and SEARCHLIST
characters are directly squashed or deleted. For more information see
the documentation for B<tr///> in the B<perlop> manpage.
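
The rules above can be tried directly with Perl's own B<tr///> operator. The following is a minimal sketch in plain Perl, independent of B<fastr>; the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Plain remapping: characters map position by position; "a-c" is a range.
(my $plain = 'abcabc') =~ tr/a-c/A-C/;

# SEARCHLIST longer than REPLACELIST: the last replacement character
# ('Y') is repeated for the unmatched search characters.
(my $padded = 'abcd') =~ tr/a-d/XY/;

# Delete mode (/d): search characters without a counterpart are removed.
(my $deleted = 'abcd') =~ tr/a-d/XY/d;

# Squash mode (/s): transliterate first, then collapse runs of the
# same replaced character into one.
(my $squashed = 'aabbcc') =~ tr/a-c/X/s;

# No replacement list with /s: nothing is remapped, runs are squashed.
(my $runs = 'aaabaa') =~ tr/a//s;

# No replacement list with /d: the listed characters are deleted.
# tr/// returns the number of characters it matched.
my $degapped = 'AC--GT';
my $gaps = ($degapped =~ tr/-//d);

print join("\n", $plain, $padded, $deleted, $squashed, $runs,
           "$degapped ($gaps gaps removed)"), "\n";
```

The last two cases correspond to what the "no-replace" option describes: only a search list is given, and its characters are squashed or deleted rather than remapped.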

=head1 INPUT AND OUTPUT

B<fastr> is part of FAST, the FAST Analysis of Sequences Toolbox, based
on Bioperl. Most core FAST utilities expect input and return output in
multifasta format. Input can occur in one or more files or on
STDIN. Output occurs to STDOUT. The FAST utility B<fasconvert> can
reformat other formats to and from multifasta.

=head1 OPTIONS

=over 8

=item B<-s>
      B<--sequence>

Transliterate sequences (identifiers by default).

=item B<-d>
      B<--description>


