App-Anchr


doc/bacteria_2_3.md  view on Meta::CPAN


cat stat.md
```

| Name     |     N50 |        Sum |        # |
|:---------|--------:|-----------:|---------:|
| Genome   | 3288558 |    5165770 |        2 |
| Paralogs |    3333 |     155714 |       62 |
| Illumina |     101 | 1368727962 | 13551762 |
| PacBio   |   11771 | 1228497092 |   143537 |
| uniq     |     101 | 1361783404 | 13483004 |
| scythe   |     101 | 1346787728 | 13483004 |
| Q20L60   |     101 | 1264469138 | 12611522 |
| Q25L60   |     101 | 1200269501 | 12011552 |
| Q30L60   |     101 | 1080002384 | 10917028 |
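
As a reading aid for the table above, fold coverage can be estimated by dividing a row's `Sum` by the `Sum` of the `Genome` row. The sketch below is not part of `anchr`; the function name `coverage` is made up for illustration.

```bash
# Fold coverage = total bases / genome size.
# Both numbers come from the Sum column of the table above.
coverage() {
    awk -v s="$1" -v g="$2" 'BEGIN { printf "%.1f\n", s / g }'
}

coverage 1368727962 5165770   # Illumina, raw reads
coverage 1080002384 5165770   # Illumina, after Q30L60 trimming
coverage 1228497092 5165770   # PacBio
```

Comparing the raw and trimmed rows this way shows how much depth each quality filter costs.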

## Vpar: down sampling

```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}

doc/bacteria_2_3.md  view on Meta::CPAN

```

| Name           |  SumCor | CovCor | N50SR |   Sum |  # | N50Anchor |   Sum |  # | N50Others |    Sum | # |                Kmer | RunTimeKU | RunTimeAN |
|:---------------|--------:|-------:|------:|------:|---:|----------:|------:|---:|----------:|-------:|--:|--------------------:|----------:|:----------|
| Q25L60X40P000  |  75.71M |   40.0 | 35248 |  1.8M | 72 |     35248 |  1.8M | 71 |       865 |    865 | 1 | "31,41,51,61,71,81" | 0:01'13'' | 0:00'52'' |
| Q25L60X40P001  |  75.71M |   40.0 | 32751 |  1.8M | 75 |     32751 | 1.79M | 72 |      4293 |  9.43K | 3 | "31,41,51,61,71,81" | 0:01'15'' | 0:00'52'' |
| Q25L60X40P002  |  75.71M |   40.0 | 32751 |  1.8M | 76 |     32751 | 1.79M | 73 |      4293 |  9.45K | 3 | "31,41,51,61,71,81" | 0:01'13'' | 0:00'54'' |
| Q25L60X40P003  |  75.71M |   40.0 | 32751 | 1.82M | 75 |     32803 | 1.77M | 72 |     23232 | 47.32K | 3 | "31,41,51,61,71,81" | 0:01'14'' | 0:00'47'' |
| Q25L60X80P000  | 151.42M |   80.0 | 32751 |  1.8M | 78 |     32751 |  1.8M | 74 |       645 |  2.58K | 4 | "31,41,51,61,71,81" | 0:01'48'' | 0:01'09'' |
| Q25L60X80P001  | 151.42M |   80.0 | 31667 |  1.8M | 79 |     31667 |  1.8M | 77 |       865 |  1.44K | 2 | "31,41,51,61,71,81" | 0:01'49'' | 0:01'12'' |
| Q25L60X120P000 | 227.13M |  120.0 | 32404 |  1.8M | 83 |     32404 |  1.8M | 78 |       650 |  3.55K | 5 | "31,41,51,61,71,81" | 0:02'27'' | 0:01'21'' |
| Q25L60X160P000 | 302.84M |  160.0 | 31667 |  1.8M | 84 |     31667 |  1.8M | 83 |       865 |    865 | 1 | "31,41,51,61,71,81" | 0:03'05'' | 0:01'42'' |
| Q30L60X40P000  |  75.71M |   40.0 | 35248 |  1.8M | 72 |     35248 |  1.8M | 71 |       855 |    855 | 1 | "31,41,51,61,71,81" | 0:01'14'' | 0:00'56'' |
| Q30L60X40P001  |  75.71M |   40.0 | 32751 | 1.84M | 76 |     32813 | 1.76M | 71 |     32374 | 74.19K | 5 | "31,41,51,61,71,81" | 0:01'13'' | 0:00'45'' |
| Q30L60X40P002  |  75.71M |   40.0 | 32751 |  1.8M | 73 |     32751 |  1.8M | 72 |       855 |    855 | 1 | "31,41,51,61,71,81" | 0:01'13'' | 0:00'44'' |
| Q30L60X40P003  |  75.71M |   40.0 | 32741 |  1.8M | 75 |     32741 |  1.8M | 74 |       865 |    865 | 1 | "31,41,51,61,71,81" | 0:01'13'' | 0:00'45'' |
| Q30L60X80P000  | 151.42M |   80.0 | 32751 |  1.8M | 74 |     32751 |  1.8M | 73 |       865 |    865 | 1 | "31,41,51,61,71,81" | 0:01'49'' | 0:01'08'' |
| Q30L60X80P001  | 151.42M |   80.0 | 32751 |  1.8M | 74 |     32751 |  1.8M | 73 |       865 |    865 | 1 | "31,41,51,61,71,81" | 0:01'50'' | 0:01'12'' |
| Q30L60X120P000 | 227.13M |  120.0 | 32751 |  1.8M | 77 |     32751 |  1.8M | 75 |       865 |  1.49K | 2 | "31,41,51,61,71,81" | 0:02'26'' | 0:01'32'' |
| Q30L60X160P000 | 302.84M |  160.0 | 32404 |  1.8M | 79 |     32404 |  1.8M | 77 |       865 |  1.49K | 2 | "31,41,51,61,71,81" | 0:03'00'' | 0:01'37'' |
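
The `SumCor` values above are consistent with genome size multiplied by the target coverage in the `X` part of each name (for example, 1892775 × 40 ≈ 75.71M). A minimal sketch of that arithmetic follows. It assumes the Ftul genome size of 1892775 from the statistics table, and `target_bases` is an illustrative helper, not an `anchr` command.

```bash
# Number of bases to keep when down sampling to a given fold coverage.
target_bases() {
    # $1 = genome size in bases, $2 = target coverage (integer)
    echo $(( $1 * $2 ))
}

for cov in 40 80 120 160; do
    echo "X${cov} $(target_bases 1892775 "${cov}")"
done
```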

## Ftul: merge anchors

```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}

# merge anchors
mkdir -p merge
anchr contained \

doc/bacteria_2_3.md  view on Meta::CPAN

    $(echo "contigTrim"; faops n50 -H -S -C contigTrim/contig.fasta;) >> stat3.md

cat stat3.md
```

| Name         |     N50 |     Sum |  # |
|:-------------|--------:|--------:|---:|
| Genome       | 1892775 | 1892775 |  1 |
| Paralogs     |   33912 |   93531 | 10 |
| anchor.merge |   32813 | 1801122 | 73 |
| others.merge |   32404 |   64274 |  3 |
| anchor.cover |   32813 | 1796007 | 71 |
| anchorLong   |   35248 | 1795927 | 70 |
| contigTrim   | 1027458 | 1856949 |  4 |
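
For readers without `faops` at hand, the three quantities tabulated above (N50, Sum, #) can be computed with standard tools. The following is a rough sketch, not the `faops` implementation; `n50_stats` is a hypothetical helper name.

```bash
# Print "N50 Sum Count" for a FASTA file.
n50_stats() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: flush previous record
             { len += length($0) }                      # sequence line: accumulate length
        END  { if (len > 0) print len }
    ' "$1" |
        sort -rn |
        awk '
            { lens[NR] = $1; sum += $1 }
            END {
                # N50: length of the sequence at which the running total,
                # taken from longest to shortest, reaches half of the sum.
                for (i = 1; i <= NR; i++) {
                    acc += lens[i]
                    if (acc >= sum / 2) {
                        printf "%d %.0f %d\n", lens[i], sum, NR
                        exit
                    }
                }
            }
        '
}

# Usage (path taken from the command above):
# n50_stats contigTrim/contig.fasta
```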

* Clear QxxLxxXxx.

```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}

doc/e_coli.md  view on Meta::CPAN

```bash
mkdir -p ~/share/SMRTAnalysis_2.3.0
cd ~/share/SMRTAnalysis_2.3.0

aria2c -x 9 -s 3 -c http://files.pacb.com/software/smrtanalysis/2.3.0/smrtanalysis_2.3.0.140936.run
aria2c -x 9 -s 3 -c http://files.pacb.com/software/smrtanalysis/2.3.0/smrtanalysis-patch_2.3.0.140936.p5.run

aria2c -x 9 -s 3 -c https://atlas.hashicorp.com/ubuntu/boxes/trusty64/versions/20170313.0.7/providers/virtualbox.box
vagrant box add ubuntu/trusty64 trusty-server-cloudimg-amd64-vagrant-disk1.box --force

curl -O https://raw.githubusercontent.com/mhsieh/SMRTAnalysis_2.3.0_install/master/vagrant-u1404/Vagrantfile

vagrant destroy -f
rm -fr .vagrant/
vagrant up --provider virtualbox

```

# *Escherichia coli* str. K-12 substr. MG1655

* Genome: INSDC

doc/e_coli.md  view on Meta::CPAN

| Q25L30          |     151 | 1382782641 | 10841386 |
| Q25L60          |     151 | 1317617346 |  9994728 |
| Q25L90          |     151 | 1177142378 |  8586574 |
| Q25L120         |     151 |  837111446 |  5805874 |
| Q30L30          |     125 | 1192536117 | 10716954 |
| Q30L60          |     127 | 1149107745 |  9783292 |
| Q30L90          |     130 | 1021609911 |  8105773 |
| Q30L120         |     139 |  693661043 |  5002158 |
| Q35L30          |      64 |  588252718 |  9588363 |
| Q35L60          |      72 |  366922898 |  5062192 |
| Q35L90          |      95 |   35259773 |   364046 |
| Q35L120         |     124 |     647353 |     5169 |
| PacBio          |   13982 |  748508361 |    87225 |
| PacBio.trim     |   13630 |  688575670 |    77687 |
| PacBio.20x      |   13962 |   99252919 |    11500 |
| PacBio.20x.trim |   13541 |   88697009 |     9980 |
| PacBio.40x      |   13948 |  198650072 |    23000 |
| PacBio.40x.trim |   13565 |  179462005 |    20137 |
| PacBio.80x      |   13996 |  395094712 |    46000 |
| PacBio.80x.trim |   13608 |  360190363 |    40682 |

doc/masurca.md  view on Meta::CPAN

            | perl -n -e '/ESTIMATED_GENOME_SIZE=\"(\d+)\"/ and print $1' )
done >> stat.md

cat stat.md
```

| name          | N50SR |  #SR | N50Contig | #Contig | N50Scaffold | #Scaffold |    EstG |
|:--------------|------:|-----:|----------:|--------:|------------:|----------:|--------:|
| PE_SJ_Sanger4 |  4586 | 4187 |    205225 |      69 |     3196849 |        35 | 4602968 |
| PE_SJ_Sanger  |  4586 | 4187 |     63274 |     141 |     3070846 |        28 | 4602968 |
| PE_SJ         |  4586 | 4187 |     43125 |     219 |     3058404 |        59 | 4602968 |
| PE_Sanger4    |  4705 | 4042 |    125228 |      67 |      534852 |        30 | 4595684 |
| PE_Sanger     |  4705 | 4042 |     19435 |     412 |       21957 |       359 | 4595684 |
| PE            |  4705 | 4043 |     20826 |     407 |       34421 |       278 | 4595684 |
| superreads    |  4705 | 4043 |           |         |             |           | 4595684 |

With enough long-read support, short jump libraries are not needed.

# SuperReads 3.1.3

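In February 2017, a new program appeared on the UMD FTP site:
[SuperReads_RNA](ftp://ftp.genome.umd.edu/pub/MaSuRCA/beta/SuperReads_RNA-1.0.1.tar.gz), a simplified version of MaSuRCA
3.2.1. It was most likely made at the request of many users, because `StringTie` uses super-reads to process RNA-seq data.

Based on this release, I simplified MaSuRCA 3.1.3: all dependencies and the parts that work with `Celera Assembler` were removed, leaving only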

doc/model_organisms.md  view on Meta::CPAN


```

## s288c: expand anchors

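In *Saccharomyces cerevisiae*, the following groups of sequences are completely identical; all of them are recent segmental duplications: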

* I:216563-218385, VIII:537165-538987
* I:223713-224783, VIII:550350-551420
* IV:528442-530427, IV:532327-534312, IV:536212-538197
* IV:530324-531519, IV:534209-535404
* IV:5645-7725, X:738076-740156
* IV:7810-9432, X:736368-737990
* IX:9683-11043, X:9666-11026
* IV:1244112-1245373, XV:575980-577241
* VIII:212266-214124, VIII:214264-216122
* IX:11366-14953, X:11349-14936
* XII:468935-470576, XII:472587-474228, XII:482167-483808, XII:485819-487460
* XII:483798-485798, XII:487450-489450
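
A quick sanity check on the list above: members of a group of identical sequences must at least have equal lengths. The sketch below only does that coordinate arithmetic. It assumes the coordinates are 1-based and inclusive, and the helper names are illustrative. Equal length is necessary but not sufficient for identity, so the sequences themselves still need to be compared.

```bash
# Length of a region such as "I:216563-218385" (assumed 1-based, inclusive).
region_length() {
    echo "$1" | awk -F'[:-]' '{ print $3 - $2 + 1 }'
}

# Print "region length" for every region in a comma-separated group.
group_lengths() {
    echo "$1" | tr ',' '\n' | while read -r region; do
        if [ -n "$region" ]; then
            echo "$region $(region_length "$region")"
        fi
    done
}

group_lengths "I:216563-218385, VIII:537165-538987"
group_lengths "IV:528442-530427, IV:532327-534312, IV:536212-538197"
```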

* anchorLong

doc/model_organisms.md  view on Meta::CPAN

| Genome            | 25286936 | 137567477 |      8 |
| Paralogs          |     4031 |  13665900 |   4492 |
| anchor.merge      |    26860 | 117041459 |   9566 |
| others.merge      |     8732 |   3092289 |   1004 |
| anchor.cover      |    26199 | 116199529 |   9576 |
| anchorLong        |    69814 | 115806088 |   4924 |
| contigTrim        |  1238480 | 123572499 |    603 |
| spades.contig     |   108756 | 132705321 |  61620 |
| spades.scaffold   |   142273 | 132725706 |  61182 |
| platanus.contig   |    11503 | 156820565 | 359399 |
| platanus.scaffold |   146404 | 129134232 |  71416 |

* quast

```bash
BASE_NAME=iso_1
cd ${HOME}/data/anchr/${BASE_NAME}

rm -fr 9_qa_contig
quast --no-check --threads 16 \
    --eukaryote \

doc/pacbio_consensus.md  view on Meta::CPAN

    ```

## The [*E. coli* example](https://github.com/PacificBiosciences/FALCON/wiki/Setup:-Complete-example) in `falcon/example`

* Download the following three files through a proxy

```bash
mkdir -p $HOME/data/pacbio/rawdata/ecoli_test
cd $HOME/data/pacbio/rawdata/ecoli_test

proxychains4 wget -c https://www.dropbox.com/s/tb78i5i3nrvm6rg/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.1.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/v6wwpn40gedj470/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.2.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/j61j2cvdxn4dx4g/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.3.subreads.fasta

# N50 14124
# C   105451
faops n50 -C *.subreads.fasta
```

* Configuration file and run

```bash
source ~/share/pitchfork/deployment/setup-env.sh


