App-Sandy

lib/App/Sandy/Command/Genome.pm


 sandy genome [options] <fasta-file>

 Arguments:
  a fasta file

 Input/Output options:
  -h, --help                         brief help message
  -H, --man                          full documentation
  -v, --verbose                      print log messages
  -p, --prefix                       prefix output [default:"out"]
  -o, --output-dir                   output directory [default:"."]
  -O, --output-format                bam, sam, fastq.gz, fastq [default:"fastq.gz"]
  -1, --join-paired-ends             merge R1 and R2 outputs in one file
  -x, --compression-level            speed compression: "1" - compress faster,
                                     "9" - compress better [default:"6"; Integer]

 Runtime options:
  -j, --jobs                         number of jobs [default:"1"; Integer]
  -s, --seed                         set the seed of the base generator
                                     [default:"time()"; Integer]

 Sequence identifier options:
  -i, --append-id                    append to the defined template id [Format]
  -I, --id                           overlap the default template id [Format]

 Sequencing options:
  -q, --quality-profile              sequencing system profiles from quality
                                     database [default:"poisson"]
  -e, --sequencing-error             sequencing error rate for poisson
                                     [default:"0.001"; Number]
  -m, --read-mean                    read mean size for poisson
                                     [default:"100"; Integer]
  -d, --read-stdd                    read standard deviation size for poisson
                                     [default:"0"; Integer]
  -t, --sequencing-type              single-end or paired-end reads
                                     [default:"paired-end"]
  -M, --fragment-mean                the fragment mean size for paired-end reads
                                     [default:"300"; Integer]
  -D, --fragment-stdd                the fragment standard deviation size for
                                     paired-end reads [default:"50"; Integer]

 Genome-specific options:
  -c, --coverage                     genome coverage [default:"8"; Number]
  -a, --genomic-variation            a list of genomic variation entries from
                                     variation database. This option may be passed
                                     multiple times [default:"none"]
  -A, --genomic-variation-regex      a list of perl-like regex to match genomic
                                     variation entries in variation database.
                                     This option may be passed multiple times
                                     [default:"none"]

=head1 DESCRIPTION

This subcommand simulates genome sequencing reads, taking into account the
quality profile and the genomic-variation patterns, along with: random
seed; coverage (depth); fragment mean and standard deviation; single-end
(long and short fragments) and paired-end sequencing types; bam, sam,
fastq.gz and fastq output formats; and more.
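
As a rough guide to how coverage relates to the amount of data produced, the
usual depth relation (reads = coverage x genome length / read length) can be
sketched in Python. This is only an estimate under stated assumptions: the
function name C<estimate_read_count> is hypothetical, and whether sandy
computes its read counts exactly this way is not confirmed by this page.

```python
import math


def estimate_read_count(genome_length, coverage, read_mean):
    """Estimate how many reads give the requested mean depth.

    Assumption: depth = reads * read_mean / genome_length, the standard
    coverage relation. This is not taken from sandy's source code.
    """
    if read_mean <= 0:
        raise ValueError("read_mean must be positive")
    return math.ceil(coverage * genome_length / read_mean)


# A 1 Mb genome with the documented defaults (coverage 8, read mean 100)
print(estimate_read_count(1_000_000, 8, 100))
```

The example uses the defaults listed in the synopsis for B<--coverage> and
B<--read-mean>; the genome length is an arbitrary illustrative value.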

=head2 INPUT

I<sandy genome> expects as argument a fasta file with chromosome sequences.
For example, L<the GENCODE human genome|https://www.gencodegenes.org/human/>
GRCh38.p13 fasta file.

=head2 OUTPUT

The output file generated will depend on the I<output-format> (fastq, bam),
on the I<join-paired-ends> option (mate read pairs into a single file) and
on the I<sequencing-type> (single-end, paired-end). A file with the simulated
coverage (${prefix}_coverage.tsv) for each chromosome (read counts) also
accompanies the output file.

=head1 OPTIONS

=over 8

=item B<--help>

Prints a brief help message and exits.

=item B<--man>

Prints the manual page and exits.

=item B<--verbose>

Prints log information to standard error.

=item B<--prefix>

Concatenates the prefix to the output-file name.

=item B<--output-dir>

Creates output-file inside output-dir. If output-dir
does not exist, it is created recursively.

=item B<--output-format>

Choose the output format. Available options are:
I<bam>, I<sam>, I<fastq.gz>, I<fastq>.
For the I<bam> option, B<--append-id> is ignored, because the
sequence identifier is split on blank characters and only the
first field is included in the query name column
(first column).

=item B<--join-paired-ends>

By default, paired-end reads are written to two separate files,
I<prefix_R[12]_001.fastq(\.gz)?>. Pass this option to write both
reads to a single file.
If the B<--id> template does not contain the escape character %R, it is
automatically inserted right after the first field (blank-separated values),
as in I<id/%R>, which resolves to I<id/1> or I<id/2>.
This is necessary to distinguish R1 from R2 reads.
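
The %R rule above can be illustrated with a small Python sketch. It is not
sandy's own (Perl) implementation; the function names C<ensure_read_escape>
and C<resolve_read_number> are hypothetical, and leaving a template that
starts with whitespace unchanged is an assumption.

```python
import re


def ensure_read_escape(id_template):
    """Append '/%R' to the first blank-separated field if %R is absent."""
    if "%R" in id_template:
        return id_template
    # Only the first run of non-blank characters receives the suffix
    return re.sub(r"^(\S+)", r"\g<1>/%R", id_template, count=1)


def resolve_read_number(id_template, read_number):
    """Replace the %R escape with the read number (1 or 2)."""
    return id_template.replace("%R", str(read_number))


template = ensure_read_escape("read1 sample=test")
print(template)                          # read1/%R sample=test
print(resolve_read_number(template, 1))  # read1/1 sample=test
print(resolve_read_number(template, 2))  # read1/2 sample=test
```

The identifier text C<read1 sample=test> is a made-up example; real templates
may use other escape characters that are not described in this section.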

=item B<--compression-level>

Regulates the speed of compression using the specified digit (between 1 and 9),
where "1" indicates the fastest compression method (less compression) and "9"
indicates the slowest compression method (best compression). The default
compression level is "6".
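
The speed/size trade-off can be seen with Python's standard C<gzip> module,
which uses the same 1 to 9 scale. This sketch only illustrates the general
behaviour of gzip levels on FASTQ-like text; the synthetic reads are invented
for the example, and it is an assumption that sandy's compressor behaves the
same way.

```python
import gzip
import random

# Build synthetic FASTQ-like records (made-up data, fixed seed)
rng = random.Random(42)
records = []
for n in range(2000):
    seq = "".join(rng.choice("ACGT") for _ in range(100))
    records.append(f"@read{n}\n{seq}\n+\n{'I' * 100}\n")
data = "".join(records).encode("ascii")

# Level 1 favours speed, level 9 favours size
fast = gzip.compress(data, compresslevel=1)
best = gzip.compress(data, compresslevel=9)

print("raw bytes:    ", len(data))
print("level 1 bytes:", len(fast))
print("level 9 bytes:", len(best))
```

Both outputs decompress to the same bytes; only the time spent and the
resulting file size differ.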

=item B<--append-id>


