Using search terms with Biopython to return accession numbers

Question

I am trying to use Biopython (Entrez) with search terms that will return the accession number (and not the GI*).

Here is a tiny excerpt of my code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']
print(gi_numbers)

['745369752', '910228862', '187736741', '802098270', '802098269', '802098267', '387610477', '544579032', '544574430', '215485161', '749295052', '387823261', '387605479', '641687520', '641682562', '594009615', '557270520', '313848522', '309700213', '284919779', '215263233', '544345556', '544340954', '144661', '51773702', '202957457', '202957451', '172051323']

I am sure I can convert from GI to accession, but it would be nice to avoid the additional step. What slice of magic am I missing?

Thanks.

*especially since NCBI is phasing out GI numbers

Answer

Looking through the docs for esearch on NCBI's website, there are only two rettypes available: uilist, which is the default XML format that you're getting currently (it's parsed into a dict by Entrez.read()), and count, which returns only the Count value (it's also in the complete contents of result, so look there). Count is the total number of records matching the query, while IdList holds at most retmax of them (starting at retstart), so the two only agree when every match fits into a single page.

At any rate, Entrez.esearch() will take any value of rettype and retmode you like, but it only returns the uilist or count in xml or json mode - no accession IDs, no nothin'.
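
Because Count is the total number of matches while IdList is capped at retmax, collecting every UID for a large query means calling esearch several times with an increasing retstart. The following is a minimal sketch, assuming ESearch's retstart/retmax paging parameters, which Entrez.esearch() passes straight through to NCBI. The helper names page_offsets and search_all_uids are hypothetical, and the page size of 100 is arbitrary.

```python
def page_offsets(count, page_size):
    """Return the retstart values that cover `count` records in pages of `page_size`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, count, page_size))


def search_all_uids(term, db="nuccore", page_size=100):
    """Hypothetical helper: collect every UID matching `term` by paging esearch.

    Needs network access; set Entrez.email before calling it.
    """
    # Imported here so page_offsets() can be used without Biopython installed.
    from Bio import Entrez

    # retmax=0 asks only for the total Count, with an empty IdList.
    handle = Entrez.esearch(db=db, term=term, retmax=0)
    count = int(Entrez.read(handle)["Count"])
    handle.close()

    uids = []
    for start in page_offsets(count, page_size):
        # Each call returns the next slice of at most page_size UIDs.
        handle = Entrez.esearch(db=db, term=term, retstart=start, retmax=page_size)
        uids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return uids
```

For example, page_offsets(250, 100) gives [0, 100, 200], i.e. three esearch calls. These are still UIDs (GI numbers), not accessions.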

Entrez.efetch() will pass you back all sorts of cool stuff, depending on which DB you're querying. The downside, of course, is that you need to query by one or more IDs, not by a search string, so in order to get your accession IDs you'd need to run two queries:

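from Bio import Entrez

Entrez.email = "myemailaddress"  # NCBI asks for a contact address
search_phrase = "(Escherichia coli[organism]) AND (complete genome[keyword])"

# Query 1: esearch only gives back UIDs (GI numbers) in IdList
handle = Entrez.esearch(db="nuccore", term=search_phrase, retmax=100)
result = Entrez.read(handle)
handle.close()

# Query 2: efetch with rettype="acc" returns one accession.version per line
fetch_handle = Entrez.efetch(db="nuccore", id=result["IdList"], rettype="acc", retmode="text")
acc_ids = [line.strip() for line in fetch_handle if line.strip()]
fetch_handle.close()
print(acc_ids)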

gives

['HF572917.2', 'NZ_HF572917.1', 'NC_010558.1', 'NZ_HG941720.1', 'NZ_HG941719.1', 'NZ_HG941718.1', 'NC_017633.1', 'NC_022371.1', 'NC_022370.1', 'NC_011601.1', 'NZ_HG738867.1', 'NC_012892.2', 'NC_017626.1', 'HG941719.1', 'HG941718.1', 'HG941720.1', 'HG738867.1', 'AM946981.2', 'FN649414.1', 'FN554766.1', 'FM180568.1', 'HG428756.1', 'HG428755.1', 'M37402.1', 'AJ304858.2', 'FM206294.1', 'FM206293.1', 'AM886293.1']
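
If IdList gets long, sending every GI back to efetch becomes unwieldy. One variant, still two queries, keeps the search result on NCBI's history server: esearch with usehistory="y" returns WebEnv and QueryKey, and efetch can read slices of that stored result by position instead of receiving the ID list. This is a minimal sketch assuming those E-utilities history parameters (they are used this way in the Biopython tutorial). The names fetch_accessions and parse_acc_lines are hypothetical, and the batch size of 100 is arbitrary.

```python
def parse_acc_lines(text_lines):
    """Turn efetch rettype="acc" output (one accession.version per line) into a list."""
    return [line.strip() for line in text_lines if line.strip()]


def fetch_accessions(term, db="nuccore", batch_size=100):
    """Hypothetical helper: esearch into the history server, then efetch accessions in batches.

    Needs network access; set Entrez.email before calling it.
    """
    # Imported here so parse_acc_lines() can be used without Biopython installed.
    from Bio import Entrez

    handle = Entrez.esearch(db=db, term=term, usehistory="y")
    result = Entrez.read(handle)
    handle.close()
    count = int(result["Count"])
    webenv, query_key = result["WebEnv"], result["QueryKey"]

    acc_ids = []
    for start in range(0, count, batch_size):
        # Each efetch reads one slice of the stored result set.
        fetch_handle = Entrez.efetch(
            db=db, rettype="acc", retmode="text",
            retstart=start, retmax=batch_size,
            webenv=webenv, query_key=query_key,
        )
        acc_ids.extend(parse_acc_lines(fetch_handle))
        fetch_handle.close()
    return acc_ids
```

This does not remove the second query; it only avoids passing the GI list around, so the conclusion below still holds.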

So, I'm not terribly sure if I answered your question satisfactorily, but unfortunately I think the answer is "There is no magic."