How to scrape data from webpage which uses react.js with Selenium in Python?

Views: 183 Published: 2021/7/3 20:35:58 Tags: python reactjs selenium web-scraping webdriverwait


Question

I am having difficulty scraping a website which uses react.js, and I am not sure why this is happening.

This is the HTML of the website:

What I wish to do is click on the button with the class: play-pause-button btn btn-naked. However, when I load the page with the Mozilla gecko webdriver, an exception is thrown saying

Message: Unable to locate element: .play-pause-button btn btn-naked

which makes me think that maybe I should do something else to get this element. This is my code so far:

driver.get("https://drawittoknowit.com/course/neurological-system/anatomy/peripheral-nervous-system/1332/brachial-plexus---essentials")
# execute script to scroll down the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
time.sleep(10)
soup = BeautifulSoup(driver.page_source, 'lxml')
print(driver.page_source)
play_button = driver.find_element_by_class_name("play-pause-button btn btn-naked").click()
print(play_button)

Does anyone have an idea as to how I could go about solving this? Any help is much appreciated.

Recommended answer

Seems you were close. When using find_element_by_class_name() you can't pass multiple classes; you are allowed to pass only one class name, i.e. only one of the following:

play-pause-button
btn
btn-naked

On passing multiple classes through find_element_by_class_name() you will face Message: invalid selector: Compound class names not permitted
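To illustrate the difference: in CSS a space is a descendant combinator, so a selector such as `.play-pause-button btn btn-naked` (the form shown in the question's error message) looks for a `btn-naked` element nested inside a `btn` element, not for one element carrying three classes. Chaining the classes with dots expresses the intended meaning. A minimal sketch of that conversion follows; `css_selector_for_classes` is a hypothetical helper written for this example and is not part of Selenium.

```python
def css_selector_for_classes(tag, class_attr):
    """Build a CSS selector matching `tag` elements that carry every class in `class_attr`.

    `class_attr` is the space-separated value of an HTML class attribute.
    """
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class is prefixed with a dot and appended directly to the tag name,
    # with no spaces, so all classes must be present on the same element.
    return tag + "".join("." + name for name in class_names)


selector = css_selector_for_classes("button", "play-pause-button btn btn-naked")
print(selector)  # button.play-pause-button.btn.btn-naked
```

The resulting string is meant to be used with a CSS selector locator rather than with a class name locator.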

As an alternative, since the button is rendered dynamically by React, to click() on the element you have to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following Locator Strategies:

Using CSS_SELECTOR:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.play-pause-button.btn.btn-naked"))).click()

Using XPATH:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='play-pause-button btn btn-naked']"))).click()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
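For intuition about why an explicit wait helps more than a fixed time.sleep(10): it polls a condition repeatedly and returns as soon as the condition holds, failing only after a timeout. The sketch below illustrates that polling idea in plain Python. It is not Selenium's implementation; `wait_until`, `element_ready`, and the parameter names are assumptions made for this example.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return the truthy result itself, e.g. the element that was found.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage example with a stand-in condition that succeeds on the third call,
# mimicking an element that appears only after the page has finished rendering.
attempts = {"count": 0}


def element_ready():
    attempts["count"] += 1
    return "button" if attempts["count"] >= 3 else None


print(wait_until(element_ready, timeout=5, poll_interval=0.01))  # button
```

With a real driver, the stand-in condition corresponds to a check such as element_to_be_clickable(), and the wait ends as soon as the button becomes usable instead of always pausing for the full duration.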
