Scraper clicking on the same link cyclically?

Problem description

I've written a script in Python using Selenium to scrape the names and prices of products from the redmart website. My goal is to click each of the 10 categories at the top of the main page and parse all the products on the page each one leads to. However, once a category is clicked, the browser is on the newly opened page, so I have to go back to the main page before clicking the next category link. My scraper clicks a link, goes to its target page, parses the data there, returns to the main page, and then clicks the same link again, repeating this over and over. Here is the script I'm trying:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://redmart.com/bakery")
wait = WebDriverWait(driver, 10)

while True:
    try:
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "li.image-facets-pill")))
        driver.find_element_by_css_selector('img.image-facets-pill-image').click()          
    except:
        break

    for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.productPreview"))):
        name = elems.find_element_by_css_selector('h4[title] a').text
        price = elems.find_element_by_css_selector('span[class^="ProductPrice__"]').text
        print(name, price)

    driver.back()

driver.quit()   

By the way, I think the "try" and "except" block in this script needs to be adjusted to get the desired output.

Recommended answer

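Your loop keeps clicking the same link because find_element_by_css_selector returns only the first matching element. You can add a simple counter and use it as an index into the list of category elements, so that each pass clicks the next category: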

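from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://redmart.com/bakery")
wait = WebDriverWait(driver, 10)

counter = 0  # index of the next category to click

while True:
    try:
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "li.image-facets-pill")))
        # Re-locate the category images on every pass (references found earlier
        # go stale after driver.back()) and click the one at the current index.
        driver.find_elements(By.CSS_SELECTOR, 'img.image-facets-pill-image')[counter].click()
        counter += 1
    except IndexError:
        # No category exists at this index, so every link has been visited.
        break

    for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.productPreview"))):
        name = elems.find_element(By.CSS_SELECTOR, 'h4[title] a').text
        price = elems.find_element(By.CSS_SELECTOR, 'span[class^="ProductPrice__"]').text
        print(name, price)

    driver.back()

driver.quit()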
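
The counter pattern above can also be tried without a browser. The following is only a minimal sketch: visit_each_by_index and fake_find_all are hypothetical names introduced for illustration and are not part of Selenium. It assumes the page lookup can be passed in as a function that returns a fresh list each time, which mirrors re-locating the category elements after driver.back().

```python
from typing import Callable, List, Sequence


def visit_each_by_index(find_all: Callable[[], Sequence],
                        handle: Callable[[object], None]) -> int:
    """Call handle() once per element, repeating the lookup before each step.

    Hypothetical helper: find_all stands in for a fresh element lookup,
    handle stands in for "click, scrape, go back".
    """
    index = 0
    while True:
        elements = find_all()        # fresh lookup on every pass
        if index >= len(elements):   # nothing left at this index: done
            break
        handle(elements[index])
        index += 1
    return index


def fake_find_all() -> List[str]:
    # Stand-in for the page: every call returns a new list, much as a new
    # lookup returns new element objects after navigating back.
    return ["bakery", "dairy", "fruit"]


visited: List[str] = []
count = visit_each_by_index(fake_find_all, visited.append)
print(count, visited)  # 3 ['bakery', 'dairy', 'fruit']
```

With Selenium, find_all could be a lambda wrapping driver.find_elements(By.CSS_SELECTOR, 'img.image-facets-pill-image'), and handle a function of your own that clicks the element, prints the products and calls driver.back(). Checking the index against the list length replaces the IndexError handling but stops at the same point.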
