App-Anchr

 view release on metacpan or  search on metacpan

doc/bacteria_2_3.md  view on Meta::CPAN

    - [lambda: 3GS](#lambda-3gs)
- [Shigella flexneri NCTC0001, 福氏志贺氏菌](#shigella-flexneri-nctc0001-福氏志贺氏菌)
    - [Sfle: download](#sfle-download)
    - [Sfle: combinations of different quality values and read lengths](#sfle-combinations-of-different-quality-values-and-read-lengths)
    - [Sfle: down sampling](#sfle-down-sampling)
    - [Sfle: generate super-reads](#sfle-generate-super-reads)
    - [Sfle: create anchors](#sfle-create-anchors)
    - [Sfle: results](#sfle-results)
    - [Sfle: merge anchors](#sfle-merge-anchors)
    - [Sfle: 3GS](#sfle-3gs)
    - [Sfle: expand anchors](#sfle-expand-anchors)
- [Vibrio parahaemolyticus ATCC BAA-239, 副溶血弧菌](#vibrio-parahaemolyticus-atcc-baa-239-副溶血弧菌)
    - [Vpar: download](#vpar-download)
    - [Vpar: combinations of different quality values and read lengths](#vpar-combinations-of-different-quality-values-and-read-lengths)
    - [Vpar: down sampling](#vpar-down-sampling)
    - [Vpar: generate super-reads](#vpar-generate-super-reads)
    - [Vpar: create anchors](#vpar-create-anchors)
    - [Vpar: results](#vpar-results)
    - [Vpar: merge anchors](#vpar-merge-anchors)
    - [Vpar: 3GS](#vpar-3gs)
    - [Vpar: expand anchors](#vpar-expand-anchors)
- [Legionella pneumophila subsp. pneumophila ATCC 33152D-5; Philadelphia-1, 嗜肺军团菌](#legionella-pneumophila-subsp-pneumophila-atcc-33152d-5-philadelphia-1-嗜肺军团菌)
    - [Lpne: download](#lpne-download)
    - [Lpne: combinations of different quality values and read lengths](#lpne-combinations-of-different-quality-values-and-read-lengths)
    - [Lpne: down sampling](#lpne-down-sampling)
    - [Lpne: generate super-reads](#lpne-generate-super-reads)
    - [Lpne: create anchors](#lpne-create-anchors)
    - [Lpne: results](#lpne-results)
    - [Lpne: merge anchors](#lpne-merge-anchors)
    - [Lpne: 3GS](#lpne-3gs)
    - [Lpne: expand anchors](#lpne-expand-anchors)
- [Neisseria gonorrhoeae FDAARGOS_207, 淋病奈瑟氏菌](#neisseria-gonorrhoeae-fdaargos-207-淋病奈瑟氏菌)
    - [Ngon: download](#ngon-download)
    - [Ngon: combinations of different quality values and read lengths](#ngon-combinations-of-different-quality-values-and-read-lengths)
    - [Ngon: down sampling](#ngon-down-sampling)
    - [Ngon: generate super-reads](#ngon-generate-super-reads)
    - [Ngon: create anchors](#ngon-create-anchors)
    - [Ngon: results](#ngon-results)
    - [Ngon: merge anchors](#ngon-merge-anchors)
    - [Ngon: 3GS](#ngon-3gs)
    - [Ngon: expand anchors](#ngon-expand-anchors)
- [Neisseria meningitidis FDAARGOS_209, 脑膜炎奈瑟氏菌](#neisseria-meningitidis-fdaargos-209-脑膜炎奈瑟氏菌)
    - [Nmen: download](#nmen-download)
    - [Nmen: combinations of different quality values and read lengths](#nmen-combinations-of-different-quality-values-and-read-lengths)
    - [Nmen: quorum](#nmen-quorum)
    - [Nmen: down sampling](#nmen-down-sampling)
    - [Nmen: k-unitigs and anchors (sampled)](#nmen-k-unitigs-and-anchors-sampled)
    - [Nmen: merge anchors](#nmen-merge-anchors)
    - [Nmen: 3GS](#nmen-3gs)
    - [Nmen: expand anchors](#nmen-expand-anchors)
- [Bordetella pertussis FDAARGOS_195, 百日咳博德特氏杆菌](#bordetella-pertussis-fdaargos-195-百日咳博德特氏杆菌)
    - [Bper: download](#bper-download)
    - [Bper: combinations of different quality values and read lengths](#bper-combinations-of-different-quality-values-and-read-lengths)
    - [Bper: down sampling](#bper-down-sampling)
    - [Bper: generate super-reads](#bper-generate-super-reads)
    - [Bper: create anchors](#bper-create-anchors)
    - [Bper: results](#bper-results)
    - [Bper: merge anchors](#bper-merge-anchors)
- [Corynebacterium diphtheriae FDAARGOS_197, 白喉杆菌](#corynebacterium-diphtheriae-fdaargos-197-白喉杆菌)
    - [Cdip: download](#cdip-download)
    - [Cdip: combinations of different quality values and read lengths](#cdip-combinations-of-different-quality-values-and-read-lengths)
    - [Cdip: quorum](#cdip-quorum)
    - [Cdip: down sampling](#cdip-down-sampling)
    - [Cdip: k-unitigs and anchors (sampled)](#cdip-k-unitigs-and-anchors-sampled)
    - [Cdip: merge anchors](#cdip-merge-anchors)
    - [Cdip: 3GS](#cdip-3gs)
    - [Cdip: expand anchors](#cdip-expand-anchors)
- [Francisella tularensis FDAARGOS_247, 土拉热弗朗西斯氏菌](#francisella-tularensis-fdaargos-247-土拉热弗朗西斯氏菌)
    - [Ftul: download](#ftul-download)
    - [Ftul: combinations of different quality values and read lengths](#ftul-combinations-of-different-quality-values-and-read-lengths)
    - [Ftul: quorum](#ftul-quorum)
    - [Ftul: down sampling](#ftul-down-sampling)
    - [Ftul: k-unitigs and anchors (sampled)](#ftul-k-unitigs-and-anchors-sampled)
    - [Ftul: merge anchors](#ftul-merge-anchors)
    - [Ftul: 3GS](#ftul-3gs)
    - [Ftul: expand anchors](#ftul-expand-anchors)
- [Haemophilus influenzae FDAARGOS_199, 流感嗜血杆菌](#haemophilus-influenzae-fdaargos-199-流感嗜血杆菌)
    - [Hinf: download](#hinf-download)
- [Listeria monocytogenes FDAARGOS_351, 单核细胞增生李斯特氏菌](#listeria-monocytogenes-fdaargos-351-单核细胞增生李斯特氏菌)
    - [Lmon: download](#lmon-download)
- [Clostridioides difficile 630](#clostridioides-difficile-630)
    - [Cdif: download](#cdif-download)
- [Campylobacter jejuni subsp. jejuni ATCC 700819, 空肠弯曲杆菌](#campylobacter-jejuni-subsp-jejuni-atcc-700819-空肠弯曲杆菌)
    - [Cjej: download](#cjej-download)


doc/bacteria_2_3.md  view on Meta::CPAN

    -p Sfle -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=4.8m \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/Sfle.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Sfle.trimmedReads.fasta.gz

```

## Sfle: expand anchors

* anchorLong

```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p Vpar -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=5.2m \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/Vpar.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Vpar.trimmedReads.fasta.gz

```

## Vpar: expand anchors

* anchorLong

```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p Lpne -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=3.4m \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/Lpne.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Lpne.trimmedReads.fasta.gz

```

## Lpne: expand anchors

* anchorLong

```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p Ngon -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=2.3m \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/Ngon.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Ngon.trimmedReads.fasta.gz

```

## Ngon: expand anchors

* anchorLong

```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p Nmen -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=2.3m \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/Nmen.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Nmen.trimmedReads.fasta.gz

```

## Nmen: expand anchors

* anchorLong

```bash
BASE_NAME=Nmen
cd ${HOME}/data/anchr/${BASE_NAME}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p ${BASE_NAME} -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=${REAL_G} \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz

```

## Cdip: expand anchors

* anchorLong

```bash
BASE_NAME=Cdip
cd ${HOME}/data/anchr/${BASE_NAME}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/bacteria_2_3.md  view on Meta::CPAN

    -p ${BASE_NAME} -d canu-raw-80x \
    gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
    genomeSize=${GENOME_SIZE} \
    -pacbio-raw 3_pacbio/pacbio.80x.fasta

faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz

```

## Ftul: expand anchors

* anchorLong

```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/e_coli.md  view on Meta::CPAN

    - [Quorum](#quorum)
    - [Down sampling](#down-sampling)
    - [Generate k-unitigs (sampled)](#generate-k-unitigs-sampled)
    - [Create anchors (sampled)](#create-anchors-sampled)
    - [Merge anchors with Qxx, Lxx and QxxLxx](#merge-anchors-with-qxx-lxx-and-qxxlxx)
    - [Merge anchors](#merge-anchors)
    - [Scaffolding with PE](#scaffolding-with-pe)
    - [Different K values](#different-k-values)
    - [3GS](#3gs)
    - [Local corrections](#local-corrections)
    - [Expand anchors](#expand-anchors)
    - [Final stats](#final-stats)


# More tools on downloading and preprocessing data

## Extra external executables

```bash
brew install aria2 curl                     # downloading tools

doc/masurca.md  view on Meta::CPAN

├── createFastaSuperReadSequences
├── createKUnitigMaxOverlaps
├── create_k_unitigs_large_k
├── create_k_unitigs_large_k2
├── create_sr_frg
├── create_sr_frg.pl
├── createSuperReadSequenceAndPlacementFileFromCombined.perl
├── createSuperReadsForDirectory.perl
├── eliminateBadSuperReadsUsingList
├── error_corrected2frg
├── expand_fastq
├── extendSuperReadsBasedOnUniqueExtensions
├── extendSuperReadsForUniqueKmerNeighbors
├── extractJoinableAndNextPassReadsFromJoinKUnitigs.perl
├── extractreads_not.pl
├── extractreads.pl
├── extract_unjoined_pairs.pl
├── fasta2frg_m.pl
├── fasta2frg.pl
├── filter_alt.pl
├── filter_library.sh

doc/model_organisms.md  view on Meta::CPAN

    - [s288c: reads stats](#s288c-reads-stats)
    - [s288c: spades](#s288c-spades)
    - [s288c: platanus](#s288c-platanus)
    - [s288c: quorum](#s288c-quorum)
    - [s288c: down sampling](#s288c-down-sampling)
    - [s288c: k-unitigs and anchors (sampled)](#s288c-k-unitigs-and-anchors-sampled)
    - [s288c: merge anchors with Qxx and QxxL60Xxx](#s288c-merge-anchors-with-qxx-and-qxxl60xxx)
    - [s288c: merge anchors](#s288c-merge-anchors)
    - [s288c: 3GS](#s288c-3gs)
    - [s288c: local corrections](#s288c-local-corrections)
    - [s288c: expand anchors](#s288c-expand-anchors)
    - [s288c: final stats](#s288c-final-stats)
- [*Drosophila melanogaster* iso-1](#drosophila-melanogaster-iso-1)
    - [iso_1: download](#iso-1-download)
    - [iso_1: preprocess Illumina reads](#iso-1-preprocess-illumina-reads)
    - [iso_1: preprocess PacBio reads](#iso-1-preprocess-pacbio-reads)
    - [iso_1: reads stats](#iso-1-reads-stats)
    - [iso_1: spades](#iso-1-spades)
    - [iso_1: platanus](#iso-1-platanus)
    - [iso_1: quorum](#iso-1-quorum)
    - [iso_1: down sampling](#iso-1-down-sampling)
    - [iso_1: k-unitigs and anchors (sampled)](#iso-1-k-unitigs-and-anchors-sampled)
    - [iso_1: merge anchors](#iso-1-merge-anchors)
    - [iso_1: 3GS](#iso-1-3gs)
    - [iso_1: expand anchors](#iso-1-expand-anchors)
    - [iso_1: final stats](#iso-1-final-stats)
- [*Caenorhabditis elegans* N2](#caenorhabditis-elegans-n2)
    - [n2: download](#n2-download)
    - [n2: preprocess Illumina reads](#n2-preprocess-illumina-reads)
    - [n2: preprocess PacBio reads](#n2-preprocess-pacbio-reads)
    - [n2: reads stats](#n2-reads-stats)
    - [n2: spades](#n2-spades)
    - [n2: platanus](#n2-platanus)
    - [n2: quorum](#n2-quorum)
    - [n2: down sampling](#n2-down-sampling)
    - [n2: k-unitigs and anchors (sampled)](#n2-k-unitigs-and-anchors-sampled)
    - [n2: merge anchors](#n2-merge-anchors)
    - [n2: 3GS](#n2-3gs)
    - [n2: expand anchors](#n2-expand-anchors)
    - [n2: final stats](#n2-final-stats)
- [*Arabidopsis thaliana* Col-0](#arabidopsis-thaliana-col-0)
    - [col_0: download](#col-0-download)
    - [col_0: preprocess Illumina reads](#col-0-preprocess-illumina-reads)
    - [col_0: preprocess PacBio reads](#col-0-preprocess-pacbio-reads)
    - [col_0: reads stats](#col-0-reads-stats)
    - [col_0: spades](#col-0-spades)
    - [col_0: platanus](#col-0-platanus)
    - [col_0: quorum](#col-0-quorum)
    - [col_0: down sampling](#col-0-down-sampling)
    - [col_0: k-unitigs and anchors (sampled)](#col-0-k-unitigs-and-anchors-sampled)
    - [col_0: merge anchors](#col-0-merge-anchors)
    - [col_0: 3GS](#col-0-3gs)
    - [col_0: expand anchors](#col-0-expand-anchors)
    - [col_0: final stats](#col-0-final-stats)


# *Saccharomyces cerevisiae* S288c

* Genome: [Ensembl 82](http://sep2015.archive.ensembl.org/Saccharomyces_cerevisiae/Info/Index)
* Proportion of paralogs (> 1000 bp): 0.058

## s288c: download

doc/model_organisms.md  view on Meta::CPAN

    localCor/localCorRaw/${BASE_NAME}.contigs.fasta \
    localCor/localCorIndep/${BASE_NAME}.contigs.fasta \
    1_genome/paralogs.fas \
    --label "anchor,localCor,localCorRaw,localCorIndep,paralogs" \
    -o 9_qa_localCor

find . -type d -name "correction" | xargs rm -fr

```

## s288c: expand anchors

在酿酒酵母中, 有下列几组完全相同的序列, 它们都是新近发生的片段重复:

* I:216563-218385, VIII:537165-538987
* I:223713-224783, VIII:550350-551420
* IV:528442-530427, IV:532327-534312, IV:536212-538197
* IV:530324-531519, IV:534209-535404
* IV:5645-7725, X:738076-740156
* IV:7810-9432, X:736368-737990
* IX:9683-11043, X:9666-11026

doc/model_organisms.md  view on Meta::CPAN

    | minidot - > canu-raw-40x/minidot.eps

minimap canu-trim-40x/${BASE_NAME}.contigs.fasta 1_genome/genome.fa \
    | minidot - > canu-trim-40x/minidot.eps

faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-40x/${BASE_NAME}.trimmedReads.fasta.gz

```

## iso_1: expand anchors

* anchorLong

```bash
BASE_NAME=iso_1
cd ${HOME}/data/anchr/${BASE_NAME}

rm -fr anchorLong
anchr overlap2 \
    --parallel 16 \

doc/model_organisms.md  view on Meta::CPAN

minimap canu-trim-40x/${BASE_NAME}.contigs.fasta 1_genome/genome.fa \
    | minidot - > canu-trim-40x/minidot.eps

faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-80x/${BASE_NAME}.trimmedReads.fasta.gz

```

## n2: expand anchors

* anchorLong

```bash
BASE_NAME=n2
cd ${HOME}/data/anchr/${BASE_NAME}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

doc/model_organisms.md  view on Meta::CPAN

faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-trim-80x/${BASE_NAME}.trimmedReads.fasta.gz

rm -fr canu-raw-40x/correction
rm -fr canu-raw-80x/correction
rm -fr canu-trim-80x/correction

```

## col_0: expand anchors

* anchorLong

```bash
BASE_NAME=col_0
cd ${HOME}/data/anchr/${BASE_NAME}

anchr cover \
    --parallel 16 \
    -c 2 -m 40 \

lib/App/Anchr/Command/quorum.pm  view on Meta::CPAN

#----------------------------#
# Renaming reads
#----------------------------#
log_info 'Processing pe and/or se library reads'
rm -rf meanAndStdevByPrefix.pe.txt
echo 'pe [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt

if [ ! -e pe.renamed.fastq ]; then
    rename_filter_fastq \
        'pe' \
        <(exec expand_fastq '[% args.0 %]' ) \
        <(exec expand_fastq '[% args.1 %]' ) \
        > 'pe.renamed.fastq'
fi

[% IF args.2 -%]
echo 'se [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt

if [ ! -e se.renamed.fastq ]; then
    rename_filter_fastq \
        'se' \
        <(exec expand_fastq '[% args.2 %]' ) \
        '' \
        > 'se.renamed.fastq'
fi
[% END -%]

#----------------------------#
# Stats of PE reads
#----------------------------#
head -n 80000 pe.renamed.fastq > pe_data.tmp
export PE_AVG_READ_LENGTH=$(



( run in 0.621 second using v1.01-cache-2.11-cpan-97f6503c9c8 )