How do I extract data from dynamically updating webpages


Problem description

I want to scrape the reviews from the Sephora website. The reviews are updated dynamically.

After inspecting the page, I found that the review text sits in the following HTML:

<div class="css-eq4i08 " data-comp="Ellipsis Box">Honestly I never write 
reviews but this is a must if you have frizzy after even after straightening 
it! It smells fantastic and it works wonders definitely will be restocking once 
I’m done this one !!</div>

I want to write Python Selenium code to read the reviews.

Here is the code I wrote:

from selenium import webdriver
chrome_path = (r"C:/Users/Connectm/Downloads/chromedriver.exe")

driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(20) 
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")
reviews = driver.find_element_by_xpath('//*[@id="ratings-reviews"]/div[4]/div[2]/div[2]/div[1]/div[3][@data-comp()='Elipsis Box'])
print(reviews.text)

If I locate the element by class name instead, I get blank output.

What is the best option?

I am trying to use XPath with an attribute, but the code is not working. Could someone please tell me what the best solution is?

Recommended answer

To scrape the reviews from the Sephora website, you have to induce WebDriverWait for the review elements to become visible. You can use the following solution:

  • Code Block:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 3 style constructor; Selenium 4 takes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")
# Wait for the "What Else You Need to Know" heading, then scroll it into view before looking for the reviews.
heading = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='tabpanel0']/div//b[contains(., 'What Else You Need to Know')]")))
driver.execute_script("arguments[0].scrollIntoView(true);", heading)
# Wait up to 20 seconds until all review bodies are visible, then print each one.
reviews = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@data-comp='GridCell Box']//div[@data-comp='Ellipsis Box']")))
for review in reviews:
    print(review.get_attribute("innerHTML"))

  • Console Output:

    Honestly I never write reviews but this is a must if you have frizzy after even after straightening it! It smells fantastic and it works wonders definitely will be restocking once I’m done this one !!
    I really like this product. I was looking for something to tame frizz and fly aways during the winter and this does the job. At first I was nervous it might give a greasy look but it makes my hair smooth and soft. Scent is actually a little subtle for me, but still nice.
    This oil-serum is perfect for the right level of hydration without the feel of oil residue. Great for all hair types and my new go-to product.
    I LOVE how weightless this oil feels in my hair.. takes away all of my flyaways without looking of feeling greasy.. the packaging is COOL (travel-friendly) and it smells wonderful!!
    I tried this when it first dropped on their website. I’ve been using it for about 3 weeks now. And I have to say its just OKAY. Nothing super special about it. I haven’t noticed super smooth hair that isn’t given with other products that cost less. It’s just like any other smoothing serum. I also can’t figure out what the smell is. It doesn’t really smell as pleasant as their other products.
    in love!! A tiny bit goes a long way. No more fly aways. No more frizz from touch or environment.
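For intuition about why the explicit wait matters here, the following is a minimal sketch of the idea behind it: poll a condition until it returns something truthy or a timeout expires. It is not Selenium's implementation. `wait_until` and `FakePage` are hypothetical names made up for this illustration, and `FakePage` only imitates a page whose reviews appear after a short delay.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page that fills in its reviews after a delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find_reviews(self):
        # Empty until the simulated dynamic content has "loaded".
        if time.monotonic() < self._ready_at:
            return []
        return ["review one", "review two"]


page = FakePage(ready_after=0.3)

# Asking straight away is too early, which mirrors the blank result in the question.
immediate = page.find_reviews()

# Polling gives the content time to appear.
reviews = wait_until(page.find_reviews, timeout=5, poll_interval=0.1)
print(immediate, reviews)
```

In the Selenium code above, WebDriverWait plays the role of `wait_until` and the expected condition plays the role of `page.find_reviews`.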
    

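A note on the XPath in the question: the expression there cannot work as written. `@data-comp()` is not valid attribute syntax, 'Elipsis' is misspelled, and the inner single quotes end the Python string early. The attribute predicate form is `[@name='value']`. As a rough illustration that runs without a browser, the sketch below applies that predicate to a small hand-written fragment using the standard library's ElementTree, which supports a limited XPath subset. The markup is a simplified assumption and not the real Sephora page.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the review markup in the question.
# The real page is much larger and is not valid XML.
markup = """
<div id="ratings-reviews">
  <div data-comp="GridCell Box">
    <div class="css-eq4i08 " data-comp="Ellipsis Box">It smells fantastic and it works wonders</div>
  </div>
  <div data-comp="GridCell Box">
    <div class="css-eq4i08 " data-comp="Ellipsis Box">A tiny bit goes a long way.</div>
  </div>
</div>
"""

root = ET.fromstring(markup)

# Attribute predicate: @name='value', with no parentheses after the attribute name.
# Double quotes around the Python string let the XPath use single quotes inside.
review_xpath = ".//div[@data-comp='Ellipsis Box']"

texts = [div.text for div in root.findall(review_xpath)]
for text in texts:
    print(text)
```

Matching on the `data-comp` attribute also avoids the long positional path (`div[4]/div[2]/...`) from the question, which stops matching as soon as the page layout shifts.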