doc/bacteria_2_3.md view on Meta::CPAN
- [lambda: 3GS](#lambda-3gs)
- [Shigella flexneri NCTC0001, 福氏志贺氏菌](#shigella-flexneri-nctc0001-福氏志贺氏菌)
- [Sfle: download](#sfle-download)
- [Sfle: combinations of different quality values and read lengths](#sfle-combinations-of-different-quality-values-and-read-lengths)
- [Sfle: down sampling](#sfle-down-sampling)
- [Sfle: generate super-reads](#sfle-generate-super-reads)
- [Sfle: create anchors](#sfle-create-anchors)
- [Sfle: results](#sfle-results)
- [Sfle: merge anchors](#sfle-merge-anchors)
- [Sfle: 3GS](#sfle-3gs)
- [Sfle: expand anchors](#sfle-expand-anchors)
- [Vibrio parahaemolyticus ATCC BAA-239, 副溶血弧菌](#vibrio-parahaemolyticus-atcc-baa-239-副溶血弧菌)
- [Vpar: download](#vpar-download)
- [Vpar: combinations of different quality values and read lengths](#vpar-combinations-of-different-quality-values-and-read-lengths)
- [Vpar: down sampling](#vpar-down-sampling)
- [Vpar: generate super-reads](#vpar-generate-super-reads)
- [Vpar: create anchors](#vpar-create-anchors)
- [Vpar: results](#vpar-results)
- [Vpar: merge anchors](#vpar-merge-anchors)
- [Vpar: 3GS](#vpar-3gs)
- [Vpar: expand anchors](#vpar-expand-anchors)
- [Legionella pneumophila subsp. pneumophila ATCC 33152D-5; Philadelphia-1, 嗜肺军团菌](#legionella-pneumophila-subsp-pneumophila-atcc-33152d-5-philadelphia-1-嗜肺军团菌)
- [Lpne: download](#lpne-download)
- [Lpne: combinations of different quality values and read lengths](#lpne-combinations-of-different-quality-values-and-read-lengths)
- [Lpne: down sampling](#lpne-down-sampling)
- [Lpne: generate super-reads](#lpne-generate-super-reads)
- [Lpne: create anchors](#lpne-create-anchors)
- [Lpne: results](#lpne-results)
- [Lpne: merge anchors](#lpne-merge-anchors)
- [Lpne: 3GS](#lpne-3gs)
- [Lpne: expand anchors](#lpne-expand-anchors)
- [Neisseria gonorrhoeae FDAARGOS_207, 淋病奈瑟氏菌](#neisseria-gonorrhoeae-fdaargos-207-淋病奈瑟氏菌)
- [Ngon: download](#ngon-download)
- [Ngon: combinations of different quality values and read lengths](#ngon-combinations-of-different-quality-values-and-read-lengths)
- [Ngon: down sampling](#ngon-down-sampling)
- [Ngon: generate super-reads](#ngon-generate-super-reads)
- [Ngon: create anchors](#ngon-create-anchors)
- [Ngon: results](#ngon-results)
- [Ngon: merge anchors](#ngon-merge-anchors)
- [Ngon: 3GS](#ngon-3gs)
- [Ngon: expand anchors](#ngon-expand-anchors)
- [Neisseria meningitidis FDAARGOS_209, 脑膜炎奈瑟氏菌](#neisseria-meningitidis-fdaargos-209-脑膜炎奈瑟氏菌)
- [Nmen: download](#nmen-download)
- [Nmen: combinations of different quality values and read lengths](#nmen-combinations-of-different-quality-values-and-read-lengths)
- [Nmen: quorum](#nmen-quorum)
- [Nmen: down sampling](#nmen-down-sampling)
- [Nmen: k-unitigs and anchors (sampled)](#nmen-k-unitigs-and-anchors-sampled)
- [Nmen: merge anchors](#nmen-merge-anchors)
- [Nmen: 3GS](#nmen-3gs)
- [Nmen: expand anchors](#nmen-expand-anchors)
- [Bordetella pertussis FDAARGOS_195, 百日咳博德特氏杆菌](#bordetella-pertussis-fdaargos-195-百日咳博德特氏杆菌)
- [Bper: download](#bper-download)
- [Bper: combinations of different quality values and read lengths](#bper-combinations-of-different-quality-values-and-read-lengths)
- [Bper: down sampling](#bper-down-sampling)
- [Bper: generate super-reads](#bper-generate-super-reads)
- [Bper: create anchors](#bper-create-anchors)
- [Bper: results](#bper-results)
- [Bper: merge anchors](#bper-merge-anchors)
- [Corynebacterium diphtheriae FDAARGOS_197, 白喉杆菌](#corynebacterium-diphtheriae-fdaargos-197-白喉杆菌)
- [Cdip: download](#cdip-download)
- [Cdip: combinations of different quality values and read lengths](#cdip-combinations-of-different-quality-values-and-read-lengths)
- [Cdip: quorum](#cdip-quorum)
- [Cdip: down sampling](#cdip-down-sampling)
- [Cdip: k-unitigs and anchors (sampled)](#cdip-k-unitigs-and-anchors-sampled)
- [Cdip: merge anchors](#cdip-merge-anchors)
- [Cdip: 3GS](#cdip-3gs)
- [Cdip: expand anchors](#cdip-expand-anchors)
- [Francisella tularensis FDAARGOS_247, 土拉热弗朗西斯氏菌](#francisella-tularensis-fdaargos-247-土拉热弗朗西斯氏菌)
- [Ftul: download](#ftul-download)
- [Ftul: combinations of different quality values and read lengths](#ftul-combinations-of-different-quality-values-and-read-lengths)
- [Ftul: quorum](#ftul-quorum)
- [Ftul: down sampling](#ftul-down-sampling)
- [Ftul: k-unitigs and anchors (sampled)](#ftul-k-unitigs-and-anchors-sampled)
- [Ftul: merge anchors](#ftul-merge-anchors)
- [Ftul: 3GS](#ftul-3gs)
- [Ftul: expand anchors](#ftul-expand-anchors)
- [Haemophilus influenzae FDAARGOS_199, 流感嗜血杆菌](#haemophilus-influenzae-fdaargos-199-流感嗜血杆菌)
- [Hinf: download](#hinf-download)
- [Listeria monocytogenes FDAARGOS_351, 单核细胞增生李斯特氏菌](#listeria-monocytogenes-fdaargos-351-单核细胞增生李斯特氏菌)
- [Lmon: download](#lmon-download)
- [Clostridioides difficile 630](#clostridioides-difficile-630)
- [Cdif: download](#cdif-download)
- [Campylobacter jejuni subsp. jejuni ATCC 700819, 空肠弯曲杆菌](#campylobacter-jejuni-subsp-jejuni-atcc-700819-空肠弯曲杆菌)
- [Cjej: download](#cjej-download)
doc/bacteria_2_3.md view on Meta::CPAN
-p Sfle -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=4.8m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Sfle.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Sfle.trimmedReads.fasta.gz
```
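The `gnuplot=` argument above builds a Homebrew Cellar path by turning the `name version` line printed by `brew list --versions` into `name/version`. A minimal sketch of just that string step follows; the function name `to_cellar_subdir` and the version number are hypothetical and used only for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a "name version" line (the form printed by
# `brew list --versions <formula>`) into the "name/version" Cellar subdirectory.
to_cellar_subdir() {
    # sed replaces only the FIRST space with a slash
    echo "$1" | sed 's/ /\//'
}

# Illustrative input; the real version depends on the local installation
to_cellar_subdir "gnuplot 5.2.8"
```

Because only the first space is replaced, the resulting path is probably valid only when a single version of the formula is installed; with several versions listed on one line, the remainder of the line would stay space-separated.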
## Sfle: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p Vpar -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=5.2m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Vpar.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Vpar.trimmedReads.fasta.gz
```
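The `faops n50` calls above summarize the read length distribution of the trimmed reads. As a rough illustration of the statistic itself (not of how `faops` computes it), the sketch below derives N50 from a plain list of lengths, one per line; the function name `n50_from_lengths` and the sample lengths are made up for this example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: N50 of a list of sequence lengths read from stdin.
# N50 is the length L such that sequences of length >= L cover at least
# half of the total number of bases.
n50_from_lengths() {
    sort -nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                # first length at which the running sum reaches half the total
                if (cum >= half) { print len[i]; exit }
            }
        }'
}

# Made-up lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it
printf '%s\n' 2 3 4 5 6 7 8 9 10 | n50_from_lengths
```

Feeding real data would need a length-per-line input, for example the second column of a two-column name/size table, if such a file is available.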
## Vpar: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p Lpne -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=3.4m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Lpne.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Lpne.trimmedReads.fasta.gz
```
## Lpne: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p Ngon -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Ngon.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Ngon.trimmedReads.fasta.gz
```
## Ngon: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p Nmen -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Nmen.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Nmen.trimmedReads.fasta.gz
```
## Nmen: expand anchors
* anchorLong
```bash
BASE_NAME=Nmen
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p ${BASE_NAME} -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${REAL_G} \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
```
## Cdip: expand anchors
* anchorLong
```bash
BASE_NAME=Cdip
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/bacteria_2_3.md view on Meta::CPAN
-p ${BASE_NAME} -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${GENOME_SIZE} \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
```
## Ftul: expand anchors
* anchorLong
```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/e_coli.md view on Meta::CPAN
- [Quorum](#quorum)
- [Down sampling](#down-sampling)
- [Generate k-unitigs (sampled)](#generate-k-unitigs-sampled)
- [Create anchors (sampled)](#create-anchors-sampled)
- [Merge anchors with Qxx, Lxx and QxxLxx](#merge-anchors-with-qxx-lxx-and-qxxlxx)
- [Merge anchors](#merge-anchors)
- [Scaffolding with PE](#scaffolding-with-pe)
- [Different K values](#different-k-values)
- [3GS](#3gs)
- [Local corrections](#local-corrections)
- [Expand anchors](#expand-anchors)
- [Final stats](#final-stats)
# More tools on downloading and preprocessing data
## Extra external executables
```bash
brew install aria2 curl # downloading tools
doc/masurca.md view on Meta::CPAN
âââ createFastaSuperReadSequences
âââ createKUnitigMaxOverlaps
âââ create_k_unitigs_large_k
âââ create_k_unitigs_large_k2
âââ create_sr_frg
âââ create_sr_frg.pl
âââ createSuperReadSequenceAndPlacementFileFromCombined.perl
âââ createSuperReadsForDirectory.perl
âââ eliminateBadSuperReadsUsingList
âââ error_corrected2frg
âââ expand_fastq
âââ extendSuperReadsBasedOnUniqueExtensions
âââ extendSuperReadsForUniqueKmerNeighbors
âââ extractJoinableAndNextPassReadsFromJoinKUnitigs.perl
âââ extractreads_not.pl
âââ extractreads.pl
âââ extract_unjoined_pairs.pl
âââ fasta2frg_m.pl
âââ fasta2frg.pl
âââ filter_alt.pl
âââ filter_library.sh
doc/model_organisms.md view on Meta::CPAN
- [s288c: reads stats](#s288c-reads-stats)
- [s288c: spades](#s288c-spades)
- [s288c: platanus](#s288c-platanus)
- [s288c: quorum](#s288c-quorum)
- [s288c: down sampling](#s288c-down-sampling)
- [s288c: k-unitigs and anchors (sampled)](#s288c-k-unitigs-and-anchors-sampled)
- [s288c: merge anchors with Qxx and QxxL60Xxx](#s288c-merge-anchors-with-qxx-and-qxxl60xxx)
- [s288c: merge anchors](#s288c-merge-anchors)
- [s288c: 3GS](#s288c-3gs)
- [s288c: local corrections](#s288c-local-corrections)
- [s288c: expand anchors](#s288c-expand-anchors)
- [s288c: final stats](#s288c-final-stats)
- [*Drosophila melanogaster* iso-1](#drosophila-melanogaster-iso-1)
- [iso_1: download](#iso-1-download)
- [iso_1: preprocess Illumina reads](#iso-1-preprocess-illumina-reads)
- [iso_1: preprocess PacBio reads](#iso-1-preprocess-pacbio-reads)
- [iso_1: reads stats](#iso-1-reads-stats)
- [iso_1: spades](#iso-1-spades)
- [iso_1: platanus](#iso-1-platanus)
- [iso_1: quorum](#iso-1-quorum)
- [iso_1: down sampling](#iso-1-down-sampling)
- [iso_1: k-unitigs and anchors (sampled)](#iso-1-k-unitigs-and-anchors-sampled)
- [iso_1: merge anchors](#iso-1-merge-anchors)
- [iso_1: 3GS](#iso-1-3gs)
- [iso_1: expand anchors](#iso-1-expand-anchors)
- [iso_1: final stats](#iso-1-final-stats)
- [*Caenorhabditis elegans* N2](#caenorhabditis-elegans-n2)
- [n2: download](#n2-download)
- [n2: preprocess Illumina reads](#n2-preprocess-illumina-reads)
- [n2: preprocess PacBio reads](#n2-preprocess-pacbio-reads)
- [n2: reads stats](#n2-reads-stats)
- [n2: spades](#n2-spades)
- [n2: platanus](#n2-platanus)
- [n2: quorum](#n2-quorum)
- [n2: down sampling](#n2-down-sampling)
- [n2: k-unitigs and anchors (sampled)](#n2-k-unitigs-and-anchors-sampled)
- [n2: merge anchors](#n2-merge-anchors)
- [n2: 3GS](#n2-3gs)
- [n2: expand anchors](#n2-expand-anchors)
- [n2: final stats](#n2-final-stats)
- [*Arabidopsis thaliana* Col-0](#arabidopsis-thaliana-col-0)
- [col_0: download](#col-0-download)
- [col_0: preprocess Illumina reads](#col-0-preprocess-illumina-reads)
- [col_0: preprocess PacBio reads](#col-0-preprocess-pacbio-reads)
- [col_0: reads stats](#col-0-reads-stats)
- [col_0: spades](#col-0-spades)
- [col_0: platanus](#col-0-platanus)
- [col_0: quorum](#col-0-quorum)
- [col_0: down sampling](#col-0-down-sampling)
- [col_0: k-unitigs and anchors (sampled)](#col-0-k-unitigs-and-anchors-sampled)
- [col_0: merge anchors](#col-0-merge-anchors)
- [col_0: 3GS](#col-0-3gs)
- [col_0: expand anchors](#col-0-expand-anchors)
- [col_0: final stats](#col-0-final-stats)
# *Saccharomyces cerevisiae* S288c
* Genome: [Ensembl 82](http://sep2015.archive.ensembl.org/Saccharomyces_cerevisiae/Info/Index)
* Proportion of paralogs (> 1000 bp): 0.058
## s288c: download
doc/model_organisms.md view on Meta::CPAN
localCor/localCorRaw/${BASE_NAME}.contigs.fasta \
localCor/localCorIndep/${BASE_NAME}.contigs.fasta \
1_genome/paralogs.fas \
--label "anchor,localCor,localCorRaw,localCorIndep,paralogs" \
-o 9_qa_localCor
find . -type d -name "correction" -print0 | xargs -0 rm -fr
```
## s288c: expand anchors
In *Saccharomyces cerevisiae*, the following groups of sequences are completely identical; they are all recent segmental duplications:
* I:216563-218385, VIII:537165-538987
* I:223713-224783, VIII:550350-551420
* IV:528442-530427, IV:532327-534312, IV:536212-538197
* IV:530324-531519, IV:534209-535404
* IV:5645-7725, X:738076-740156
* IV:7810-9432, X:736368-737990
* IX:9683-11043, X:9666-11026
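
A quick sanity check on the regions listed above is that members of the same group span the same number of bases. The sketch below computes the length of a `chr:start-end` region in plain bash, assuming 1-based inclusive coordinates (an assumption, since the list does not state the convention); `region_length` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: length in bp of a region written as chr:start-end,
# ASSUMING 1-based, inclusive coordinates.
region_length() {
    local range=${1#*:}      # drop the chromosome name and colon
    local start=${range%-*}  # text before the dash
    local end=${range#*-}    # text after the dash
    echo $(( end - start + 1 ))
}

# Two of the groups from the list; each line prints region and length
for group in "I:216563-218385 VIII:537165-538987" "IV:5645-7725 X:738076-740156"; do
    for region in ${group}; do
        printf '%s\t%s\n' "${region}" "$(region_length "${region}")"
    done
done
```

Under that assumption both members of the first group come out at 1823 bp and both members of the second at 2081 bp, which is consistent with the regions being identical copies.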
doc/model_organisms.md view on Meta::CPAN
| minidot - > canu-raw-40x/minidot.eps
minimap canu-trim-40x/${BASE_NAME}.contigs.fasta 1_genome/genome.fa \
| minidot - > canu-trim-40x/minidot.eps
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-40x/${BASE_NAME}.trimmedReads.fasta.gz
```
## iso_1: expand anchors
* anchorLong
```bash
BASE_NAME=iso_1
cd ${HOME}/data/anchr/${BASE_NAME}
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
doc/model_organisms.md view on Meta::CPAN
minimap canu-trim-40x/${BASE_NAME}.contigs.fasta 1_genome/genome.fa \
| minidot - > canu-trim-40x/minidot.eps
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-80x/${BASE_NAME}.trimmedReads.fasta.gz
```
## n2: expand anchors
* anchorLong
```bash
BASE_NAME=n2
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
doc/model_organisms.md view on Meta::CPAN
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-80x/${BASE_NAME}.trimmedReads.fasta.gz
rm -fr canu-raw-40x/correction
rm -fr canu-raw-80x/correction
rm -fr canu-trim-80x/correction
```
## col_0: expand anchors
* anchorLong
```bash
BASE_NAME=col_0
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
lib/App/Anchr/Command/quorum.pm view on Meta::CPAN
#----------------------------#
# Renaming reads
#----------------------------#
log_info 'Processing pe and/or se library reads'
rm -rf meanAndStdevByPrefix.pe.txt
echo 'pe [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt
if [ ! -e pe.renamed.fastq ]; then
rename_filter_fastq \
'pe' \
<(exec expand_fastq '[% args.0 %]' ) \
<(exec expand_fastq '[% args.1 %]' ) \
> 'pe.renamed.fastq'
fi
[% IF args.2 -%]
echo 'se [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt
if [ ! -e se.renamed.fastq ]; then
rename_filter_fastq \
'se' \
<(exec expand_fastq '[% args.2 %]' ) \
'' \
> 'se.renamed.fastq'
fi
[% END -%]
#----------------------------#
# Stats of PE reads
#----------------------------#
head -n 80000 pe.renamed.fastq > pe_data.tmp
export PE_AVG_READ_LENGTH=$(