After starting My Scraper I do not get an output


Problem Description


I am running a scraper to retrieve the product name, catalogue number, size and price, but when I run the script I get no output and no error message. I am using Jupyter Notebook for this and am not sure whether that is the problem. I am also not sure whether writing the results to a CSV file is causing issues. Any help would be greatly appreciated.


This is the code that I am running.

from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup

os.chdir(r'C:\Users\kevin.cragin\AppData\Local\pip\Cache\wheels\09\14\7d\1dcfcf0fa23dbb52fc459e5ce620000e7dca7aebd9300228fe') 
driver = webdriver.Chrome()
driver.get('https://www.biolegend.com/en-us/advanced-search?GroupID=&PageNum=1')
html = driver.page_source

containers = html.find_all('li', {'class': 'row list'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        f.write(','.join([product_name,catNo,size,price]))

Recommended Answer


The website you are using loads its information from a database, so the product names are not preset in the page's HTML; they are loaded dynamically based on the search constraints.
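
Because the products are injected by JavaScript, the page source grabbed immediately after driver.get() may not contain them yet. Here is a minimal sketch of waiting explicitly for the list to appear before reading page_source, using Selenium's WebDriverWait (the 'productsHolder' id is taken from the answer code below; the 15-second timeout is an assumption):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.biolegend.com/en-us/advanced-search?GroupID=&PageNum=1')

# Block until the dynamically loaded product list is present,
# so that page_source contains the rendered products.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.ID, 'productsHolder'))
)
html = driver.page_source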


So you will need to download chromedriver.exe (if you use Google Chrome) or another driver that automates your web browser (PhantomJS is another good one), and then specify the path on your machine where that .exe lives, like so:

from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup

os.chdir('Path to chromedriver or other driver') 
driver = webdriver.Chrome()
driver.get('Link to your webpage you want to extract HTML from')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')  # parse the rendered page source

containers = soup.find_all('ul',{'id':'productsHolder'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        f.write(','.join([product_name, catNo, size, price]) + '\n')  # newline so each product gets its own row
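
A side note on the path setup: os.chdir works here because Selenium searches the current working directory (and PATH) for the driver binary, but you can also hand the path to the driver constructor directly. A minimal sketch, assuming Selenium 3's executable_path keyword and a placeholder path:

from selenium import webdriver

# Point Selenium at the driver binary directly instead of changing directories
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe')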

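One more caveat, not part of the original answer: joining the fields with ','.join produces a malformed row whenever a value itself contains a comma, which prices such as "$1,234.00" often do. The csv module the script already imports handles the quoting automatically; here is a sketch of the writing loop using csv.writer (it assumes containers has been populated as in the answer code above):

import csv

with open("BioLegend_Crawl.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Product_name", "CatNo", "Size", "Price"])
    for container in containers:
        product_name = container.find('a', {'itemprop': 'name'}).text
        info = container.find_all('div', {'class': 'col-xs-2 noPadding'})
        # csv.writer quotes any field that contains a comma or a quote
        writer.writerow([product_name] + [d.text.strip() for d in info[:3]])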