An attempt at web scraping with Python, review


Question

I am learning web scraping and wrote this code to scrape search results from ChemSpider, but it is slow. How can I improve it?

from urllib.parse import quote_plus
from urllib.request import urlopen

from bs4 import BeautifulSoup as soup

BASE_URL = "http://www.chemspider.com/Chemical-Structure."


def fetch_soup(url):
    # Download a page and parse it with Python's built-in HTML parser.
    with urlopen(url) as uClient:
        page_html = uClient.read()
    return soup(page_html, "html.parser")


def scrape_search(search):
    # quote_plus escapes spaces and other special characters in the query.
    my_url = "http://www.chemspider.com/Search.aspx?q=" + quote_plus(str(search))
    page_soup = fetch_soup(my_url)
    target = page_soup.find_all("div", {"class": "results-wrapper table"})[0]
    results = target.div.table.tbody.find_all("tr")
    scraped_data = []
    for row in results:
        result = row.find_all("td")
        cs_id = result[0].a.text.strip()
        url = BASE_URL + cs_id + ".html"
        scraped_data.append({
            "ID": cs_id,
            "URL": url,
            "img_url": "http://www.chemspider.com/ImagesHandler.ashx?id=" + cs_id + "&w=250&h=250",
            # .text already includes the digits inside <sub> tags, so the
            # original findAll("<sub>") loop (which never matched) is dropped.
            "Molecular Formula": result[2].text.strip(),
            "Molecular Weight": result[3].text.strip(),
            # This triggers one extra HTTP request per search result.
            "Name": scrape_id_page(url),
        })
    return scraped_data


def scrape_id_page(url):
    # The compound name sits in a span with a fixed ASP.NET-generated id.
    page_soup = fetch_soup(url)
    target = page_soup.find_all("span", {"id": "ctl00_ctl00_ContentSection_ContentPlaceHolder1_RecordViewDetails_rptDetailsView_ctl00_WrapTitle"})
    return target[0].text.strip()


if __name__ == "__main__":
    search = input()
    print(scrape_search(search))





What I have tried:

How can I improve my code?

Is Scrapy faster than BeautifulSoup?

Recommended answer

Quote:

How can I improve my code?



You need to understand how your code is spending its time.
The tool for that is a profiler.
26.4. The Python Profilers - Python 2.7.15 documentation
python - How can you profile a script? - Stack Overflow
Profiling Python Like a Boss - The Zapier Engineering Blog | Zapier
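
As a rough illustration of what profiling looks like in practice, here is a minimal sketch using the standard-library cProfile and pstats modules. The functions fake_fetch, fake_parse, scrape_all and profile_call are hypothetical stand-ins invented for this example, not part of the question's code: fake_fetch sleeps instead of contacting ChemSpider, so the sketch runs offline. The same pattern should apply to scrape_search, but that is an assumption you would need to check against the real site.

```python
import cProfile
import io
import pstats
import time


def fake_fetch(url):
    # Hypothetical stand-in for urlopen(url).read(): sleeps instead of using the network.
    time.sleep(0.02)
    return "<html><span>" + url + "</span></html>"


def fake_parse(page_html):
    # Hypothetical stand-in for the BeautifulSoup parsing step.
    return page_html.split("<span>")[1].split("</span>")[0]


def scrape_all(urls):
    # Same shape as scrape_search: one fetch plus one parse per result.
    return [fake_parse(fake_fetch(url)) for url in urls]


def profile_call(func, *args):
    # Run func under cProfile; return its result and a text report
    # sorted by cumulative time spent in each function.
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
    return result, buffer.getvalue()


if __name__ == "__main__":
    names, report = profile_call(scrape_all, ["a", "b", "c"])
    print(report)
```

In the report, the rows with the largest cumulative time show where the program actually waits. A whole script can also be profiled without changing it by running python -m cProfile -s cumulative yourscript.py, where yourscript.py is a placeholder for the file name.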

