BLASTN parameters for searching very short sequences

July 21, 2014

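Today I ran into a small problem: I needed to find the source of several very short DNA sequences (under 20 bp). BLAST was the first thing that came to mind, but the default parameters clearly would not work. I later found, on a university website, a way to adjust the BLAST parameters for very short queries: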

BLAST Parameters for short query sequences

For searching sequence similarities within very short fragments, BLAST may not be the best choice. If you want to tackle this anyhow, the word size should be reduced to the minimum, and the expectation value should be adjusted as well. Minimal settings for word size are -W 7 for blastn, and -W 2 for blastx in conjunction with reducing the neighborhood word threshold score to -f 8 or below (this is only necessary for blastx). Expectation value should be -E 100. Yes, that's no joke. When comparing against large databases like NT or NR, such high amounts of expected random hits have to be accepted. A lower eValue threshold could be used when only nearly exact matches are desired.
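The flags quoted above appear to follow the legacy `blastall` style. As a rough sketch only, the same idea can be expressed for the newer BLAST+ `blastn` program, where the word size is `-word_size`, the expectation value is `-evalue`, and `-task blastn-short` selects a preset meant for short queries. The helper name `short_query_blastn_cmd`, the file name `query.fa`, and the database name `nt` are illustrative assumptions, not part of the quoted text. The code only builds and prints the command; it does not run BLAST.

```python
def short_query_blastn_cmd(query_fasta, db, word_size=7, evalue=100):
    """Build a BLAST+ blastn command line tuned for very short queries.

    Hypothetical helper for illustration: it only assembles the argument
    list and does not execute anything.
    """
    return [
        "blastn",
        "-task", "blastn-short",       # BLAST+ preset intended for short queries
        "-query", query_fasta,         # assumed FASTA file holding the short sequences
        "-db", db,                     # assumed name of a local BLAST database
        "-word_size", str(word_size),  # minimal word size suggested in the quote
        "-evalue", str(evalue),        # accept many expected random hits
        "-outfmt", "6",                # tabular output, easy to filter afterwards
    ]


cmd = short_query_blastn_cmd("query.fa", "nt")
print(" ".join(cmd))
```

If only near-exact matches are wanted, passing a smaller `evalue` to the helper matches the last sentence of the quote.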

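In practice the key change is setting -W to 7. I tried it: a single 8 bp sequence, even when it occurs exactly in the database, returns nothing with the default parameters, presumably because the default blastn word size (11) is longer than the query itself. With -W 7 added, it is found.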
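Why the word size matters can be illustrated with a very simplified model of BLAST seeding, in which an alignment can only start from an exact shared word of length W. This is a sketch under that assumption, not BLAST's actual implementation: it looks at one strand only, and the names `words` and `has_seed` and the example sequences are made up for illustration.

```python
def words(seq, w):
    """Return the set of all substrings of length w in seq (empty if seq is shorter than w)."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def has_seed(query, subject, w):
    """True if query and subject share at least one exact word of length w."""
    subject_words = words(subject, w)
    return any(word in subject_words for word in words(query, w))


query = "ACGTTGCA"                        # an 8 bp query
subject = "GGGCCC" + query + "TTTAAA"     # the query occurs exactly in the subject

print(has_seed(query, subject, 11))  # False: an 8 bp query contains no 11 bp word
print(has_seed(query, subject, 7))   # True: two 7 bp words are available as seeds
```

With a word size of 11 the query yields no words at all, so even a perfect match in the database can never be seeded; with a word size of 7 it can.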

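But the first sentence of the quote already gives a warning: for searches like this, BLAST is not really the best choice. So what would be more suitable?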

posted in Biology Science by billzt
