Trouble scraping reviews using Selenium in Python


Question

I would like to scrape the reviews from this page: https://www.sephora.com/product/the-porefessional-face-primer-P264900. Here is an example of the markup I see when I inspect a review:

<div class="css-7rv8g1 " data-comp="Ellipsis Box ">So good! This primer smooths my skin and blurs my pores so well! But, it is pretty mattifying so if you want a dewy look, this might not be for you.</div>

I have tried the following code, which returns an empty list:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('/…/chromedriver')
url = 'https://www.sephora.com/product/the-porefessional-face-primer-P264900'
driver.get(url)
reviews = driver.find_elements_by_xpath("//div[@id='ratings-reviews']//div[@data-comp='Ellipsis Box']")

I have tried calling other find_elements methods on driver without success. I have also tried the solution outlined in this answer, but running the following code raised a TimeoutException:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get(url)
driver.execute_script("arguments[0].scrollIntoView(true);", WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='tabpanel0']/div//b[contains(., 'What Else You Need to Know')]"))))
reviews = WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@data-comp='GridCell Box']//div[@data-comp='Ellipsis Box']")))

How can I use Selenium to scrape the reviews from this page on Sephora's website?

Recommended answer

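Your XPath is missing a space. In the markup you posted, the value of the data-comp attribute ends with a trailing space, and an XPath equality test compares the whole string, so you need 'Ellipsis Box ' where you currently have 'Ellipsis Box':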

//div[@id='ratings-reviews']//div[@data-comp='Ellipsis Box ']
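
To see why the trailing space matters without opening a browser, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for the browser's XPath engine. ElementTree supports only a subset of XPath, but attribute equality behaves the same way: it is an exact string comparison. The sample markup is an assumption modelled on the snippet in the question; the wrapper div and the shortened review text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question's snippet. Note the trailing
# space inside the data-comp attribute value.
SAMPLE = (
    '<div id="ratings-reviews">'
    '<div class="css-7rv8g1 " data-comp="Ellipsis Box ">So good!</div>'
    '</div>'
)

root = ET.fromstring(SAMPLE)

# Without the trailing space the predicate compares 'Ellipsis Box ' with
# 'Ellipsis Box', which are different strings, so nothing matches.
without_space = root.findall(".//div[@data-comp='Ellipsis Box']")

# With the trailing space the strings are identical and the div is found.
with_space = root.findall(".//div[@data-comp='Ellipsis Box ']")

print(len(without_space), len(with_space))  # prints: 0 1
```

The same exact-match rule explains the empty list in the question: the element is on the page, but the predicate never matches it.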

I was able to find 6 elements using the corrected xpath.
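
If you would rather not depend on the exact whitespace in the attribute, XPath's normalize-space() function strips leading and trailing spaces before the comparison. The following is a rough sketch, assuming Selenium 4 and an already-created driver; review_xpath and scrape_reviews are hypothetical helper names, and it has not been checked against the live page, which may need scrolling before the reviews load.

```python
def review_xpath(container_id="ratings-reviews", comp="Ellipsis Box"):
    """Build an XPath that ignores leading/trailing spaces in data-comp."""
    return (
        f"//div[@id='{container_id}']"
        f"//div[normalize-space(@data-comp)='{comp}']"
    )


def scrape_reviews(driver, url, timeout=20):
    """Return the text of each review element found on the page.

    Assumes `driver` is a Selenium WebDriver instance the caller created.
    """
    # Imported here so review_xpath() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Wait until at least one matching element is present in the DOM.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, review_xpath()))
    )
    return [element.text for element in elements]
```

You would call scrape_reviews(driver, url) with the driver and url from the question. The wait raises a TimeoutException if no matching element appears within the timeout.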
