App-Anchr
- [Version](#version)
- [Dependencies](#dependencies)
- [Installation](#installation)
- [Sample data](#sample-data)
- [Rhodobacter sphaeroides](#rhodobacter-sphaeroides)
- [Illumina PE, Short Jump and Sanger (1x or 4x)](#illumina-pe-short-jump-and-sanger-1x-or-4x)
- [Rhodobacter sphaeroides with `anchr superreads`](#rhodobacter-sphaeroides-with-anchr-superreads)
- [Comparison of results](#comparison-of-results)
- [SuperReads 3.1.3](#superreads-313)
- [Super-reads and anchors](#super-reads-and-anchors)
- [E. coli sampling](#e-coli-sampling)
- [E. coli: link anchors](#e-coli-link-anchors)
# Features
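De novo genome assembly follows one of several mainstream strategies:
1. Overlap–layout–consensus (OLC) assembly
    * Mainly used for long reads. It was already essentially mature in the Sanger sequencing era and has been revived in the third-generation (long-read) era.
    * Representatives: Celera Assembler, PCAP, Canu
2. de Bruijn graph
    * The mainstream approach for second-generation (short-read) sequencing
    * Representatives: Velvet, SOAPdenovo, Allpaths-LG
3. String graph
    * An extension of OLC proposed by Myers; the main difference lies in the layout algorithm. SGA uses an FM-index/Burrows-Wheeler transform to find overlaps, which is relatively memory-efficient.
    * Representative: SGA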
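To make the k-mer terminology concrete, here is a toy bash sketch of the de Bruijn construction: a sequence is cut into overlapping k-mers, and each distinct k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names `kmers` and `debruijn_edges` are hypothetical; real assemblers such as Velvet or SOAPdenovo additionally handle reverse complements, sequencing errors and graph simplification.

```bash
#!/usr/bin/env bash
# Toy de Bruijn sketch (hypothetical helpers, not from any assembler).
# Nodes are (k-1)-mers; each distinct k-mer is an edge prefix -> suffix.

# kmers SEQ K: print every k-mer of SEQ, left to right
kmers() {
    local seq=$1 k=$2 i
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        printf '%s\n' "${seq:i:k}"
    done
}

# debruijn_edges SEQ K: print one edge per distinct k-mer
debruijn_edges() {
    local seq=$1 k=$2 kmer
    kmers "$seq" "$k" | LC_ALL=C sort -u | while read -r kmer; do
        # first k-1 bases -> last k-1 bases
        printf '%s -> %s\n' "${kmer:0:k-1}" "${kmer:1}"
    done
}
```

For example, `debruijn_edges ACGTAC 3` prints the four edges `AC -> CG`, `CG -> GT`, `GT -> TA` and `TA -> AC`. The last edge closes a cycle because the 2-mer `AC` occurs twice in the sequence, which is how repeats show up in the graph.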
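MaSuRCA introduced a new strategy: super-reads. The main idea is to extend short reads one base at a time (in practice, by unique k-mers) and merge them into a much smaller number of long reads. For a haploid genome, whatever the input coverage (50x, 100x), the coverage of the final super-reads tends toward 2x; for highly heterozygous genomes it tends toward 4x.
The merged super-reads have an N50 of about 2-4 kbp.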
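The extension idea can be illustrated with a deliberately simplified bash sketch. It assumes error-free reads and the forward strand only, and extends a read one base at a time only while exactly one k-mer from the read set continues it. MaSuRCA's own SuperReads code works on k-unitigs and both strands, so the functions below (`load_kmers`, `extend_left`, `extend_right`, `super_read`) are hypothetical and only demonstrate the principle.

```bash
#!/usr/bin/env bash
# Simplified super-read sketch (NOT MaSuRCA's implementation).
# Assumptions: error-free reads, forward strand only, bash >= 4.

declare -A KMERS   # set of all k-mers occurring in the reads

# load_kmers K READ...: record every k-mer of every read
load_kmers() {
    local k=$1 r i kmer
    shift
    for r in "$@"; do
        for (( i = 0; i + k <= ${#r}; i++ )); do
            kmer=${r:i:k}
            KMERS[$kmer]=1
        done
    done
}

# extend_right K SEQ: append bases while exactly one k-mer continues SEQ
extend_right() {
    local k=$1 seq=$2 suffix base next count steps=0
    while (( ++steps <= 10000 )); do            # guard against cycles
        suffix=${seq:$(( ${#seq} - k + 1 ))}    # last k-1 bases
        count=0
        for base in A C G T; do
            if [[ -n ${KMERS[$suffix$base]-} ]]; then
                next=$base
                count=$(( count + 1 ))
            fi
        done
        (( count == 1 )) || break               # dead end or branch: stop
        seq+=$next
    done
    printf '%s\n' "$seq"
}

# extend_left K SEQ: prepend bases while exactly one k-mer precedes SEQ
extend_left() {
    local k=$1 seq=$2 prefix base next count steps=0
    while (( ++steps <= 10000 )); do
        prefix=${seq:0:k-1}                     # first k-1 bases
        count=0
        for base in A C G T; do
            if [[ -n ${KMERS[$base$prefix]-} ]]; then
                next=$base
                count=$(( count + 1 ))
            fi
        done
        (( count == 1 )) || break
        seq=$next$seq
    done
    printf '%s\n' "$seq"
}

# super_read K READ: extend READ as far as it is unambiguous on both sides
super_read() {
    extend_right "$1" "$(extend_left "$1" "$2")"
}

# Usage:
#   load_kmers 4 ATGCGTA CGTACCT ACCTGA
#   super_read 4 CGTACCT
```

With k = 4 and the three overlapping reads `ATGCGTA`, `CGTACCT` and `ACCTGA`, each read extends to the same sequence `ATGCGTACCTGA`, so three reads collapse into a single super-read. The walk stops at a branch as well as at a dead end, which is what keeps each extension unique.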
# Version
Version 3.1.3.
The version in homebrew-science is 2.3.2b; a
[PR](https://github.com/Homebrew/homebrew-science/pull/3802) for 3.1.3 exists but has not been merged.
UMD's FTP server also has version 3.2.1, which adds three directories (CA8, MUMmer and PacBio); these have not been studied in detail yet.
http://ccb.jhu.edu/software.shtml
> New modules coming soon include methods to create hybrid assemblies using both Illumina and PacBio
> data.
# Dependencies
External
* gcc-4: MaSuRCA cannot be compiled with clang on macOS
* m4: a macro language invoked by `autoreconf -fi`; part of the `GNU autotools` toolchain, nothing to configure
* swig: for the Perl bindings of jellyfish
Bundled
* Celera Assembler
* [jellyfish](https://github.com/gmarcais/Jellyfish): k-mer counting
* prepare: undocumented; appears to be for preprocessing data.
* [Quorum](https://github.com/gmarcais/Quorum): Error correction for Illumina reads.
* samtools
* SOAPdenovo2
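* SuperReads: the main MaSuRCA program. This is the part we need most; the read-merging functionality lives here. The source code runs to tens of thousands of lines.
* ufasta: UMD's tool for manipulating FASTA files; no information about it turned up anywhere else. Its tests are well written and worth learning from.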
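Before building, a small bash helper can report which external tools are missing from `PATH`. This is a sketch: `check_deps` is a hypothetical name, and the exact GCC binary name depends on how GCC was installed (Homebrew typically uses version-suffixed names such as `gcc-4.9`, an assumption here), so pass the name you actually have.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which commands are (not) on PATH.
# Returns non-zero if at least one command is missing.
check_deps() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$cmd"
        else
            printf 'missing: %s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example (the GCC binary name is an assumption, adjust to your system):
#   check_deps gcc-4.9 m4 swig
```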
# Installation
```bash
echo "==> MaSuRCA"
cd /prepare/resource/
wget -N ftp://ftp.genome.umd.edu/pub/MaSuRCA/MaSuRCA-3.1.3.tar.gz
if [ -d $HOME/share/MaSuRCA ]; then
rm -fr $HOME/share/MaSuRCA
fi
cd $HOME/share/
tar xvfz /prepare/resource/MaSuRCA-3.1.3.tar.gz
mv MaSuRCA-* MaSuRCA
cd MaSuRCA
sh install.sh
```
Once the build finishes, a `bin` directory containing the executables is created; `tree bin` shows:
```text
bin
├── add_missing_mates.pl
├── addSurrogatesToFrgCtgFile
├── addSurrogatesToFrgctg.perl
├── bloom_query
├── closeGapsInScaffFastaFile.perl
├── closeGapsLocally.perl
├── closeGaps.oneDirectory.fromMinKmerLen.perl
├── closeGaps.oneDirectory.perl
├── closeGaps.perl
├── close_gaps.sh
├── collectReadSequencesForLocalGapClosing
├── compute_sr_cov.pl
├── compute_sr_cov.revisedForGCContig.pl
├── create_end_pairs.perl
├── create_end_pairs.pl
├── createFastaSuperReadSequences
├── createKUnitigMaxOverlaps
├── create_k_unitigs_large_k
├── create_k_unitigs_large_k2
├── create_sr_frg
├── create_sr_frg.pl
├── createSuperReadSequenceAndPlacementFileFromCombined.perl
├── createSuperReadsForDirectory.perl
├── eliminateBadSuperReadsUsingList
├── error_corrected2frg