App-Anchr
doc/pacbio_consensus.md
Test run:
```bash
source ~/share/pitchfork/deployment/setup-env.sh
quiver --help
```
## Direct installation
falcon-integrate, no longer recommended. See the
[wiki page](https://github.com/PacificBiosciences/FALCON-integrate/wiki/Installation).
```bash
mkdir -p $HOME/share
cd $HOME/share
git clone https://github.com/PacificBiosciences/FALCON-integrate.git
cd FALCON-integrate
git checkout master # or whatever version you want
make init
source env.sh
make config-edit-user
make -j all
# Test data are stored on Dropbox, which is blocked by the GFW
# make test
```
After compilation, an `fc_env` directory is created that holds the executables. `tree -L 2 fc_env` reports `6 directories, 79 files`.
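A quick way to confirm the build is usable is to check that the pipeline commands resolve on `PATH` after `source env.sh`. This is a minimal sketch; the function name `check_tools` is hypothetical, and the tool names in the usage line are an assumption about what `fc_env` contains, so adjust them to what `tree -L 2 fc_env` actually shows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing tools (0 means all were found).
    return "$missing"
}
```

Usage, with an assumed tool list: `source env.sh && check_tools fc_run.py fasta2DB DBsplit daligner LA4Falcon`.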
# falcon example data
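The data in falcon-examples are downloaded from Dropbox by `git-sym`, a little-known tool, so behind the GFW the instructions in its README cannot be followed.
In addition, many of its settings are hard-coded cluster paths and SGE configuration, which adds a lot of complexity and makes them hard to understand.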
Notes:
* fasta files **must** use the `.fasta` extension
* Sequence names in the fasta files must meet the requirements of falcon (fasta2DB of dazz_db); the default SRA names **do not**,
and fail with the error `Pacbio header line format error`
* [This issue](https://github.com/PacificBiosciences/FALCON/issues/251) has a script that helps with this problem. A local copy is kept as
`falcon_name_fasta.pl`
* Clear intermediate dirs
```bash
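# -prune stops find from descending into directories that are about to be removed
find "$HOME/data/pacbio" -type d -name 'm_*' -prune -exec rm -rf {} +
find "$HOME/data/pacbio" -type d -name 'job_*' -prune -exec rm -rf {} +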
```
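For the sequence-name and extension notes above, here is a minimal sketch of the renaming idea. It is a hypothetical helper (`rename_to_pacbio`), not `falcon_name_fasta.pl`, and it assumes fasta2DB accepts headers laid out like PacBio subread names, `>movie/well/start_end`; every read gets a running index as its "well" and `0_<length>` as its range.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not falcon_name_fasta.pl: rewrite every FASTA header
# as ><prefix>/<n>/0_<length>, assuming fasta2DB accepts that layout.
# Multi-line sequences are joined onto a single line.
rename_to_pacbio() {
    local prefix="$1" infile="$2" outfile="$3"
    # The output must carry the .fasta extension falcon requires.
    case "$outfile" in
        *.fasta) ;;
        *) echo "output must end in .fasta: $outfile" >&2; return 1 ;;
    esac
    awk -v prefix="$prefix" '
        function flush() {
            # Records with an empty sequence are dropped.
            if (seq != "") {
                n++
                printf(">%s/%d/0_%d\n%s\n", prefix, n, length(seq), seq)
            }
            seq = ""
        }
        /^>/ { flush(); next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { flush() }
    ' "$infile" > "$outfile"
}
```

Usage: `rename_to_pacbio SRR_reads reads.raw.fa reads.fasta`. Check the result against the real fasta2DB before running falcon, since the accepted header format is assumed here.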
## The [*E. coli* example](https://github.com/PacificBiosciences/FALCON/wiki/Setup:-Complete-example) in `falcon/example`
* Download the following three files through a proxy, since Dropbox is blocked by the GFW
```bash
mkdir -p $HOME/data/pacbio/rawdata/ecoli_test
cd $HOME/data/pacbio/rawdata/ecoli_test
proxychains4 wget -c https://www.dropbox.com/s/tb78i5i3nrvm6rg/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.1.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/v6wwpn40gedj470/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.2.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/j61j2cvdxn4dx4g/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.3.subreads.fasta
# N50 14124
# C 105451
faops n50 -C *.subreads.fasta
```
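Dropbox share links may return an HTML page instead of the file itself, and a proxy failure can leave an empty file behind, so it may be worth confirming that each download really is FASTA before running `faops`. A minimal sketch; the function name `check_fasta` is hypothetical, and it only checks that each file is non-empty and starts with `>`.

```bash
#!/usr/bin/env bash
# Hypothetical check: every argument must be a non-empty file whose first
# character is '>', as a FASTA file's should be.
check_fasta() {
    local f bad=0
    for f in "$@"; do
        if [ -s "$f" ] && [ "$(head -c 1 "$f")" = ">" ]; then
            echo "ok:  $f"
        else
            echo "bad: $f (missing, empty, or not FASTA)" >&2
            bad=1
        fi
    done
    # Non-zero if any file failed the check.
    return "$bad"
}
```

Usage: `check_fasta *.subreads.fasta`, run in the `ecoli_test` download directory.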
* Write the configuration file and run
```bash
source ~/share/pitchfork/deployment/setup-env.sh
if [ -d $HOME/data/pacbio/ecoli_test ];
then
rm -fr $HOME/data/pacbio/ecoli_test
fi
mkdir -p $HOME/data/pacbio/ecoli_test
cd $HOME/data/pacbio/ecoli_test
find $HOME/data/pacbio/rawdata/ecoli_test -name "*.fasta" > input.fofn
# https://github.com/PacificBiosciences/FALCON/blob/master/examples/fc_run_ecoli.cfg
cat <<EOF > fc_run.cfg
[General]
job_type = local
# file listing the input subread fasta files, one path per line
input_fofn = input.fofn
input_type = raw
#input_type = preads
# The length cutoff used for seed reads used for initial mapping
length_cutoff = 12000
# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 12000
# Cluster queue setting
sge_option_da =
sge_option_la =
sge_option_pda =
sge_option_pla =
sge_option_fc =
sge_option_cns =
pa_concurrent_jobs = 4
ovlp_concurrent_jobs = 4
pa_HPCdaligner_option = -v -B4 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -B4 -t32 -h60 -e.96 -l500 -s1000
pa_DBsplit_option = -x500 -s50
ovlp_DBsplit_option = -x500 -s50
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 200 --n_core 2
overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 2
EOF
# macOS, i7-6700k, 32G RAM, SSD