App-Sandy
lib/App/Sandy/Command/Genome.pm
 sandy genome [options] <fasta-file>

  Arguments:
   a fasta file

  Input/Output options:
   -h, --help                     brief help message
   -H, --man                      full documentation
   -v, --verbose                  print log messages
   -p, --prefix                   prefix output [default:"out"]
   -o, --output-dir               output directory [default:"."]
   -O, --output-format            bam, sam, fastq.gz, fastq [default:"fastq.gz"]
   -1, --join-paired-ends         merge R1 and R2 outputs in one file
   -x, --compression-level        speed compression: "1" - compress faster,
                                  "9" - compress better [default:"6"; Integer]

  Runtime options:
   -j, --jobs                     number of jobs [default:"1"; Integer]
   -s, --seed                     set the seed of the base generator
                                  [default:"time()"; Integer]

  Sequence identifier options:
   -i, --append-id                append to the defined template id [Format]
   -I, --id                       overlap the default template id [Format]

  Sequencing options:
   -q, --quality-profile          sequencing system profiles from quality
                                  database [default:"poisson"]
   -e, --sequencing-error         sequencing error rate for poisson
                                  [default:"0.001"; Number]
   -m, --read-mean                read mean size for poisson
                                  [default:"100"; Integer]
   -d, --read-stdd                read standard deviation size for poisson
                                  [default:"0"; Integer]
   -t, --sequencing-type          single-end or paired-end reads
                                  [default:"paired-end"]
   -M, --fragment-mean            the fragment mean size for paired-end reads
                                  [default:"300"; Integer]
   -D, --fragment-stdd            the fragment standard deviation size for
                                  paired-end reads [default:"50"; Integer]

  Genome-specific options:
   -c, --coverage                 genome coverage [default:"8"; Number]
   -a, --genomic-variation        a list of genomic variation entries from
                                  variation database. This option may be passed
                                  multiple times [default:"none"]
   -A, --genomic-variation-regex  a list of perl-like regex to match genomic
                                  variation entries in variation database.
                                  This option may be passed multiple times
                                  [default:"none"]
=head1 DESCRIPTION
This subcommand simulates genome sequencing reads, taking into account the
quality profile and the genomic variation patterns, along with: random
seed; coverage (depth); fragment mean and standard deviation; single-end
(long and short fragments) and paired-end sequencing types; bam, sam,
fastq.gz and fastq output formats; and more.
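As a rough guide to what the coverage (depth) setting implies, the arithmetic below uses the common definition coverage = (number of reads x read length) / genome length. It is a sketch under that assumption, not a statement of how sandy derives its read counts. The genome size is a made-up value; the coverage and read length are the documented defaults of B<--coverage> and B<--read-mean>.

```shell
# Hypothetical genome size in bases; not taken from any real assembly.
genome_size=1000000
# Documented defaults for --coverage and --read-mean.
coverage=8
read_mean=100

# Common definition: coverage = reads * read_length / genome_size,
# so reads = coverage * genome_size / read_length.
reads=$(( coverage * genome_size / read_mean ))
echo "approximate number of reads: $reads"
```

Under these assumptions, doubling the coverage or halving the read length doubles the number of reads needed.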
=head2 INPUT
I<sandy genome> expects as argument a fasta file with chromosome sequences.
For example, L<the GENCODE human genome|https://www.gencodegenes.org/human/>
GRCh38.p13 fasta file.
=head2 OUTPUT
The output files depend on the I<output-format> (bam, sam, fastq.gz, fastq),
on the I<join-paired-ends> option (both reads of each pair written to a single
file) and on the I<sequencing-type> (single-end, paired-end). A file with the
simulated coverage for each chromosome (read counts), named
${prefix}_coverage.tsv, accompanies the output.
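The following sketch shows one possible invocation and the file names it would be expected to produce. The input path, prefix, directory and seed are hypothetical values. The file names follow the patterns quoted in this document (I<prefix_R[12]_001.fastq(\.gz)?> and ${prefix}_coverage.tsv); that all of them land inside the output directory is an assumption.

```shell
# Hypothetical input file and output settings, chosen only for illustration.
fasta="genome.fa"
prefix="sim"
outdir="results"

# File names implied by the documented patterns for paired-end fastq.gz output.
r1="${prefix}_R1_001.fastq.gz"
r2="${prefix}_R2_001.fastq.gz"
cov="${prefix}_coverage.tsv"

# Run only when sandy and the input file are actually available.
if command -v sandy >/dev/null 2>&1 && [ -f "$fasta" ]; then
  sandy genome --prefix "$prefix" --output-dir "$outdir" \
    --output-format fastq.gz --sequencing-type paired-end \
    --coverage 8 --seed 1234 "$fasta"
else
  echo "sandy or $fasta not found; skipping the run"
fi
echo "expected files in $outdir: $r1 $r2 $cov"
```

Fixing B<--seed> is a deliberate choice here, since the default seed is taken from time().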
=head1 OPTIONS
=over 8
=item B<--help>
Prints a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=item B<--verbose>
Prints log information to standard error.
=item B<--prefix>
Concatenates the prefix to the output-file name.
=item B<--output-dir>
Creates the output file inside output-dir. If output-dir
does not exist, it is created recursively.
=item B<--output-format>
Choose the output format. Available options are:
I<bam>, I<sam>, I<fastq.gz>, I<fastq>.
For the I<bam> option, B<--append-id> is ignored, because
the sequence identifier is split on blank characters and
only the first field is written to the query name column
(first column).
=item B<--join-paired-ends>
By default, paired-end reads are put into two different files,
I<prefix_R[12]_001.fastq(\.gz)?>. If the user wants both outputs
together, she can pass this option.
If the B<--id> format does not contain the escape character %R, it is
automatically inserted right after the first field (blank-separated values),
as in I<id/%R>, which resolves to I<id/1> or I<id/2>.
This is necessary to distinguish which read is R1 and which is R2.
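The %R rule described above can be sketched as a small shell function. This only mimics the documented behaviour and is not sandy's own code: C<add_read_escape> is a hypothetical helper, the format strings are made up, and only spaces (not tabs) are treated as blanks.

```shell
# add_read_escape FORMAT
# Hypothetical helper: returns FORMAT with "/%R" appended to its first
# space-separated field, unless FORMAT already contains %R.
add_read_escape() {
  fmt=$1
  case $fmt in
    *%R*)  printf '%s\n' "$fmt" ;;                               # already present: unchanged
    *" "*) printf '%s/%%R %s\n' "${fmt%% *}" "${fmt#* }" ;;      # insert after the first field
    *)     printf '%s/%%R\n' "$fmt" ;;                           # single field
  esac
}

single=$(add_read_escape 'id')
multi=$(add_read_escape 'id sample')
kept=$(add_read_escape 'id_%R sample')
echo "$single | $multi | $kept"
```

Replacing %R with 1 or 2 in the result then gives the per-read identifiers, such as I<id/1> and I<id/2>.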
=item B<--compression-level>
Regulates the speed of compression using the specified digit (between 1 and 9),
where "1" indicates the fastest compression method (less compression) and "9"
indicates the slowest compression method (best compression). The default
compression level is "6".
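To get a feel for the speed versus size trade-off, the sketch below compresses a small, made-up FASTQ-like sample with the standalone gzip tool at levels 1 and 9. It is an analogy only: it assumes gzip is installed and that its levels behave like the ones accepted by B<--compression-level>, which this document does not confirm.

```shell
# Build a small, repetitive FASTQ-like sample (made-up records).
sample=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
  printf '@read%s\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' "$i"
  i=$((i + 1))
done > "$sample"

# Compressed size in bytes at the fastest and at the best level.
fast=$(gzip -1 -c "$sample" | wc -c | tr -d ' ')
best=$(gzip -9 -c "$sample" | wc -c | tr -d ' ')
echo "level 1: $fast bytes, level 9: $best bytes"
rm -f "$sample"
```

Timing the two gzip calls on a realistically sized file gives a better idea of whether the extra compression is worth the wait.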
=item B<--append-id>