An attempt at web scraping with Python, review
Problem description
I am learning web scraping and wrote this code to scrape ChemSpider, but it is slow. How can I improve it?
from urllib.parse import quote
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup


def scrape_search(search):
    # URL-encode the query so spaces and special characters survive
    my_url = "http://www.chemspider.com/Search.aspx?q=" + quote(str(search))
    uClient = urlopen(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    target = page_soup.findAll("div", {"class": "results-wrapper table"})
    target = target[0]
    base_url = "http://www.chemspider.com/Chemical-Structure."
    results = target.div.table.tbody.findAll("tr")
    scraped_data = []
    for row in results:
        result = row.findAll("td")
        record_id = result[0].a.text.strip()
        url = base_url + record_id + ".html"
        scraped_data.append({
            "ID": record_id,
            "URL": url,
            "img_url": "http://www.chemspider.com/ImagesHandler.ashx?id=" + record_id + "&w=250&h=250",
            # .text already includes the text of nested <sub> tags; the original
            # findAll("<sub>") loop never matched anything, so it is dropped
            "Molecular Formula": result[2].text.strip(),
            "Molecular Weight": result[3].text.strip(),
            # One extra HTTP request per result row
            "Name": scrape_id_page(url),
        })
    return scraped_data


def scrape_id_page(url):
    uClient = urlopen(url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    target = page_soup.findAll("span", {"id": "ctl00_ctl00_ContentSection_ContentPlaceHolder1_RecordViewDetails_rptDetailsView_ctl00_WrapTitle"})
    return target[0].text.strip()


search = input()
print(scrape_search(search))
What I have tried:
How can I improve my code?
Is Scrapy faster than BeautifulSoup?
Recommended answer
How can I improve my code?
You need to understand how your code is spending its time.
The tool for that is a profiler.
26.4. The Python Profilers - Python 2.7.15 documentation
python - How can you profile a script? - Stack Overflow
Profiling Python Like a Boss - The Zapier Engineering Blog | Zapier
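As a minimal sketch of what profiling looks like in practice, the snippet below runs a function under the standard-library cProfile module and prints the calls sorted by cumulative time. The functions fetch_page, parse_page, scrape and profile_call are hypothetical stand-ins invented for this illustration (fetch_page only sleeps to imitate a network request); they are not the code from the question. In a scraper that makes one HTTP request per result row, a report like this would likely show most of the time spent waiting on the network rather than in the HTML parser, but only measuring the real script can confirm that.

```python
import cProfile
import io
import pstats
import time


def fetch_page(delay):
    # Hypothetical stand-in for a network call such as urlopen(); it only sleeps.
    time.sleep(delay)
    return "<html></html>"


def parse_page(html):
    # Hypothetical stand-in for the parsing step.
    return len(html)


def scrape(n_pages):
    # Fetch and parse pages one after another, like the sequential loop in the question.
    return [parse_page(fetch_page(0.01)) for _ in range(n_pages)]


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(scrape, 5)
    print(report)
```

To profile an existing script without editing it, the same module can be run from the command line as python -m cProfile -s cumulative followed by the script name.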