Using Python to ask a web page to run a search

Question

I have a list of protein names in the "Uniprot" format, and I'd like to convert them all to the MGI format. If you go to www.uniprot.org and type the uniprot protein name into the "Query" bar, it will generate a page with a bunch of information about that protein, including its MGI name (albeit much further down the page).

For example, one Uniprot name is "Q9D880", and by scrolling down, you can see that its corresponding MGI name is "1913775".

I already know how to use Python's urllib to extract the MGI name from a page once I get to that page. What I don't know how to do is write Python code to get the main page to run a query of "Q9D880". My list contains 270 protein names, so it would be nice to avoid copying and pasting each protein name into the Query bar.

I saw the "Google Search from a Python App" post, and I have a firmer understanding of this concept, but I suspect that running a Google search is different from running the search function on some other website, like uniprot.org.

I'm running Python 2.7.2, but I'm open to implementing solutions that use other versions of Python. Thanks for the help!

Recommended answer

An easier way to do this is with the requests library. My solution also grabs the information itself from the page using BeautifulSoup4.

All you'd have to do, given a list of your protein names, is:

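import requests
from bs4 import BeautifulSoup as BS

for protein in my_protein_list:
    # Fetch the entry page for this UniProt accession.
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text, 'html.parser')
    # Find the MGI cross-reference link by its onclick handler.
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    # Drop the first four characters of the link text, leaving only the ID.
    MGI = MGI[4:]
    print(protein + ' - ' + MGI)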
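
The key idea in the loop above is that you don't need to drive the search box at all: each entry page has a predictable URL, so the "query" is just string concatenation. Here is a minimal sketch of the two pure steps (building the URL and trimming the link text) pulled out into small helpers. The helper names entry_url and strip_mgi_prefix are made up for illustration, the code assumes Python 3, and the "MGI:" prefix is an assumption inferred from the MGI[4:] slice rather than something confirmed against the live page.

```python
from urllib.parse import quote

# Base URL taken from the answer above; the site may have changed since.
BASE_URL = "http://www.uniprot.org/uniprot/"


def entry_url(accession):
    """Return the entry-page URL for one UniProt accession (hypothetical helper)."""
    # strip() removes stray whitespace or newlines, e.g. from a names file;
    # quote() escapes any characters that are not safe in a URL path.
    return BASE_URL + quote(accession.strip(), safe="")


def strip_mgi_prefix(link_text):
    """Turn link text such as 'MGI:1913775' into '1913775' (hypothetical helper).

    Assumes the link text starts with 'MGI:'; text without that prefix
    is returned unchanged instead of being sliced blindly.
    """
    text = link_text.strip()
    prefix = "MGI:"
    return text[len(prefix):] if text.startswith(prefix) else text


if __name__ == "__main__":
    for accession in ["Q9D880"]:
        print(accession, "->", entry_url(accession))
    print(strip_mgi_prefix("MGI:1913775"))
```

Checking for the prefix instead of always slicing four characters means an unexpected link text comes through unchanged rather than silently losing its first characters.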
