How can I activate each item and parse their information?

Question

I ran into a different kind of problem while scraping a webpage with Python. When an image is clicked, new information about its flavors appears beneath it. My goal is to parse all the flavors connected to each image. My script can parse the flavors of the currently active image, but it breaks after clicking a new image. A small tweak to my loop should point me in the right direction.

What I have tried:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)

while True:
    items = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='colright']//ul[@class='opt2']//label")))
    for item in items.find_elements_by_xpath("//div[@class='colright']//ul[@class='opt2']//label"):
        print(item.text)

    try:
        links = driver.find_elements_by_xpath("//span[@class='img']/img")
        for link in links:
            link.click()
    except:
        break

driver.quit() 

The picture below may clarify what I could not explain:

Recommended answer

I tweaked the code so that it clicks the links properly and checks whether the text of the item being clicked matches the text of the currently active item. Once they match, you can safely go on parsing without worrying about parsing the same thing over and over. Here you go:

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

IMAGES = "//span[@class='img']/img"
SIZES = "//span[@class='size']"
ACTIVE = "//div[@class='colright']//li[@class='active']//span"
FLAVORS = "//div[@class='colright']//ul[@class='opt2']//label"

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)
links = driver.find_elements(By.XPATH, IMAGES)

for idx in range(len(links)):
    while True:
        try:
            # Re-locate the image on every attempt so a stale reference is never reused.
            link = driver.find_elements(By.XPATH, IMAGES)[idx]
            link.click()
            # Keep clicking until the active item's text matches the item we clicked.
            while (driver.find_elements(By.XPATH, SIZES)[idx].text
                   != driver.find_elements(By.XPATH, ACTIVE)[1].text):
                link.click()
            print(driver.find_elements(By.XPATH, SIZES)[idx].text)
            # Wait for the flavor labels of the active item, then print each one.
            for item in wait.until(EC.presence_of_all_elements_located((By.XPATH, FLAVORS))):
                print(item.text)
        except StaleElementReferenceException:
            continue  # the page re-rendered mid-step; retry this image
        break
driver.quit()
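
The answer relies on two loops that retry without any upper limit, so it could hang if the page never settles. As a rough sketch only, the same two ideas (retry on a stale element, click until the item becomes active) can be written as small bounded helpers. The names `retry_on`, `click_until_active`, and `StaleError` are hypothetical and not part of Selenium; `StaleError` merely stands in for `StaleElementReferenceException` so the snippet runs without a browser.

```python
import time


class StaleError(Exception):
    """Hypothetical stand-in for Selenium's StaleElementReferenceException."""


def retry_on(exc_type, action, attempts=5, delay=0.0):
    """Call action() until it succeeds or `attempts` tries have raised exc_type."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # give up instead of looping forever
            time.sleep(delay)


def click_until_active(click, target_text, active_text, max_clicks=5):
    """Click until active_text() equals target_text(), then return that text.

    All three arguments are zero-argument callables, so the caller decides
    how elements are located (and re-located) on each attempt.
    """
    for _ in range(max_clicks):
        click()
        if active_text() == target_text():
            return target_text()
    raise RuntimeError("item never became active")
```

With Selenium, one would presumably pass `StaleElementReferenceException` as `exc_type` and wrap the `driver.find_elements(...)` lookups in lambdas, so that every retry fetches fresh elements rather than reusing stale ones.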
