Searching on PubMed using Biopython


Question

I am trying to run over 200 entries through PubMed to record the number of articles published by each author, and to refine the search by including the author's mentor and institution. I have tried to do this using Biopython and xlrd (the code is below), but I consistently get 0 results for all three query formats: (1) by name, (2) by name and institution name, and (3) by name and mentor's name. Are there troubleshooting steps I can take, or should I use a different format for the keywords shown below when searching PubMed?

Example output of the input queries; search_term is a list containing lists of the input queries.

print(*search_term[8:15], sep='\n')


[text:'Andrew Bland', 'Weill Cornell Medical College', text:'David Cutler MD']
[text:'Andy Price', 'University of Alabama at Birmingham School of Medicine', text:'Jason Warem, PhD']
[text:'Bah Chamin', 'University of Texas Southwestern Medical School', text:'Dr. Timothy Hillar']
[text:'Eduo Cera', 'University of Colorado School of Medicine', text:'Dr. Tim']

Code used to generate the input queries above and to search on PubMed:

from Bio import Entrez

Entrez.email = "mollyzhaoe@college.harvard.edu"
for search_term in search_terms[8:55]:
    handle = Entrez.egquery(term="{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))

    handle_1 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))

    handle_2 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))

    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)
    pubmed_count = ['','','']
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[0] = row["Count"]

    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[1] = row["Count"]

    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[2] = row["Count"]

Recommended answer

Check your indentation; it is hard to tell which part belongs to which loop.

If you want to troubleshoot, try printing the term you pass to egquery, e.g.

print("{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))

and paste the output into PubMed to see what you get. Then modify it a little to find out which search term causes the problem.
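To check all three query formats at once, the strings can be built and printed without contacting PubMed at all. The following is a minimal sketch, assuming the name, institution, and mentor are already plain strings; `build_queries` and `DATE_FILTER` are hypothetical names introduced here, not part of Biopython.

```python
# Date filter copied from the question's queries.
DATE_FILTER = "(2010[Date - Publication] : 2017[Date - Publication])"


def build_queries(name, institution, mentor):
    """Return the three query strings: name only, name + mentor, name + institution."""
    base = "{0} AND ({1})".format(name, DATE_FILTER)
    return [
        base,
        "{0} AND {1}".format(base, mentor),
        "{0} AND {1}".format(base, institution),
    ]


# Print each query so it can be pasted into the PubMed search box by hand.
for query in build_queries(
    "Andrew Bland", "Weill Cornell Medical College", "David Cutler MD"
):
    print(query)
```

If a printed query returns results on the PubMed website but not through the script, the problem is more likely in how the term is assembled than in the search itself.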

Your input format is a little hard to guess. Print the query and make sure you are getting the right search values.
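One thing worth checking, as an assumption based on the `text:'...'` prefixes in the sample output: the entries may be spreadsheet cell objects instead of plain strings, in which case formatting them into the query would include the `text:` prefix and the quotes. xlrd cells expose their content through a `.value` attribute. The sketch below uses a hypothetical `FakeCell` stand-in so it runs without a spreadsheet; `cell_text` is also a made-up helper name.

```python
def cell_text(cell):
    """Return the plain text of a cell-like object, or of a plain string."""
    # Objects with a .value attribute are unwrapped; anything else is used as-is.
    value = getattr(cell, "value", cell)
    return str(value).strip()


class FakeCell:
    """Hypothetical stand-in for a spreadsheet cell with a .value attribute."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "text:{0!r}".format(self.value)


row = [
    FakeCell("Andrew Bland"),
    "Weill Cornell Medical College",
    FakeCell("David Cutler MD"),
]

# Formatting the object directly uses its repr, prefix and quotes included.
print("{0} AND ...".format(row[0]))
# Unwrapping first gives the bare name that PubMed expects.
print("{0} AND ...".format(cell_text(row[0])))
```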

For the author names, try removing the academic titles. PubMed might confuse them with initials, e.g. "House MD" might be read as Mark David House.
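A small helper can strip the most common titles before the name goes into the query. This is only a sketch: the title lists are assumptions chosen to cover the sample entries (`Dr.`, `MD`, `PhD`) and would need extending for real data, and `strip_titles` is a hypothetical name.

```python
import re

# Assumed title lists; extend them to match the titles in the actual data.
PREFIXES = re.compile(r"^(?:dr|prof)\.?\s+", re.IGNORECASE)
SUFFIXES = re.compile(r",?\s+(?:m\.?d\.?|ph\.?d\.?)$", re.IGNORECASE)


def strip_titles(name):
    """Remove leading and trailing academic titles from a name."""
    name = name.strip()
    previous = None
    # Repeat until nothing changes, so stacked titles are all removed.
    while name != previous:
        previous = name
        name = PREFIXES.sub("", name)
        name = SUFFIXES.sub("", name)
    return name


for raw in ["David Cutler MD", "Jason Warem, PhD", "Dr. Timothy Hillar"]:
    print(strip_titles(raw))
```

Very short leftovers such as a bare first name will still match far too many authors, so those entries probably need to be completed by hand.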
