An attempt at web scraping with Python: review

Views: 116 | Published: 2019/6/7 19:32:09 | Tags: Python, webscraping, scrape

Problem description

I am learning web scraping, and I wrote this code to scrape ChemSpider, but it is slow. How can I improve it?

from urllib.parse import quote_plus
from urllib.request import urlopen

from bs4 import BeautifulSoup as soup

base_url = "http://www.chemspider.com/Chemical-Structure."


def scrape_search(search):
    # quote_plus escapes spaces and special characters in the search term.
    my_url = "http://www.chemspider.com/Search.aspx?q=" + quote_plus(str(search))
    # The with-block closes the connection even if read() fails.
    with urlopen(my_url) as uClient:
        page_html = uClient.read()
    page_soup = soup(page_html, "html.parser")
    target = page_soup.find_all("div", {"class": "results-wrapper table"})[0]
    results = target.div.table.tbody.find_all("tr")
    scraped_data = []
    for row in results:
        result = row.find_all("td")
        chem_id = result[0].a.text.strip()
        url = base_url + chem_id + ".html"
        scraped_data.append({
            "ID": chem_id,
            "URL": url,
            "img_url": "http://www.chemspider.com/ImagesHandler.ashx?id=" + chem_id + "&w=250&h=250",
            # .text already includes the digits inside <sub> tags,
            # so no separate pass over the subscripts is needed.
            "Molecular Formula": result[2].text.strip(),
            "Molecular Weight": result[3].text.strip(),
            # One more HTTP request per row, made sequentially: this is
            # likely where most of the run time goes.
            "Name": scrape_id_page(url),
        })
    return scraped_data


def scrape_id_page(url):
    with urlopen(url) as uClient:
        page_html = uClient.read()
    page_soup = soup(page_html, "html.parser")
    target = page_soup.find_all("span", {"id": "ctl00_ctl00_ContentSection_ContentPlaceHolder1_RecordViewDetails_rptDetailsView_ctl00_WrapTitle"})
    return target[0].text.strip()


if __name__ == "__main__":
    search = input()
    print(scrape_search(search))
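The per-row loop mixes parsing with network calls, which makes it hard to test or change. A minimal sketch of the extraction step written as a small pure function that builds one record from the text of a row's cells; `row_to_record` is a hypothetical helper, and it assumes the column layout the code above relies on (ID in cell 0, formula in cell 2, weight in cell 3). The "Name" field is left out because it needs the extra detail-page request.

```python
from typing import Dict, List

BASE_URL = "http://www.chemspider.com/Chemical-Structure."
IMG_URL = "http://www.chemspider.com/ImagesHandler.ashx?id={id}&w=250&h=250"


def row_to_record(cells: List[str]) -> Dict[str, str]:
    """Hypothetical helper: build one result record from a row's cell texts.

    Assumes the layout used by scrape_search: ID in cell 0,
    molecular formula in cell 2 and molecular weight in cell 3.
    """
    chem_id = cells[0].strip()
    return {
        "ID": chem_id,
        "URL": f"{BASE_URL}{chem_id}.html",
        "img_url": IMG_URL.format(id=chem_id),
        "Molecular Formula": cells[2].strip(),
        "Molecular Weight": cells[3].strip(),
    }


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    print(row_to_record([" 12345 ", "", "C6H6", "78.112"]))
```

Inside scrape_search this could be fed with `[td.text for td in row.find_all("td")]`, assuming cell 0 contains only the ID link; the name is then added to each record once it has been fetched.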

What I have tried:

How can I improve my code?
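Most of the run time is probably spent waiting on the network, because scrape_id_page is called once per result row, one request after another. A minimal sketch of issuing those per-row requests in parallel with the standard library's concurrent.futures.ThreadPoolExecutor. `fetch_all` and `fake_fetch` are hypothetical names; the fake fetcher (a sleep) stands in for scrape_id_page so the sketch runs offline, and it assumes the site tolerates a few simultaneous connections, so keep max_workers small.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_all(urls: List[str], fetch: Callable[[str], str], max_workers: int = 8) -> List[str]:
    """Hypothetical helper: call fetch(url) for every URL on a thread pool.

    Results come back in the same order as the input URLs.
    """
    # Threads suit this workload: each call mostly blocks on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for scrape_id_page: simulate about 0.2 s of latency.
        time.sleep(0.2)
        return "name for " + url

    urls = [f"http://www.chemspider.com/Chemical-Structure.{i}.html" for i in range(10)]
    start = time.perf_counter()
    names = fetch_all(urls, fake_fetch)
    print(f"{len(names)} names in {time.perf_counter() - start:.2f} s")
```

With the scraper above, you would first collect the detail-page URLs for all rows, then call `fetch_all(urls, scrape_id_page)` and attach each name to its record. Threads help here because the work is I/O-bound; for CPU-heavy parsing they would help much less because of the GIL.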

Is Scrapy faster than BeautifulSoup?
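The two tools do different jobs: BeautifulSoup only parses HTML that has already been downloaded, while Scrapy is a crawling framework that also schedules and sends the requests, many of them concurrently. Whether switching helps therefore depends on where the time goes. A small timing helper, sketched with only time.perf_counter and contextlib; `timed` and `timings` are hypothetical names, and the sleeps in the demo stand in for a download and a parse.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator

# Hypothetical accumulator: total seconds spent under each label.
timings: DefaultDict[str, float] = defaultdict(float)


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Hypothetical helper: add the time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises.
        timings[label] += time.perf_counter() - start


if __name__ == "__main__":
    for _ in range(3):
        with timed("download"):
            time.sleep(0.05)  # stand-in for urlopen(url).read()
        with timed("parse"):
            time.sleep(0.005)  # stand-in for soup(page_html, "html.parser")
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f} s")
```

Wrapping the `urlopen(...).read()` calls in `with timed("download"):` and the `soup(...)` calls in `with timed("parse"):` shows which side dominates. If downloads dominate, concurrent requests (which Scrapy provides out of the box) are what help; if parsing dominates, BeautifulSoup can use the faster lxml parser via `soup(page_html, "lxml")` when the lxml package is installed.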