Python - get text from CSS property "content" on a ::before pseudo element in Selenium?


Question

I am trying to scrape a few elements and return the text displayed on the webpage. I believe I can find the elements fine through CSS selectors and XPaths, but I cannot return the desired text. Here is my program:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import time
import threading
import pandas as pd

threadLocal = threading.local()

def instantiate_chrome():
    driver = getattr(threadLocal, 'driver', None)

    if driver is None:
        options = webdriver.ChromeOptions()
        options.add_argument('log-level=3')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--ignore-ssl-errors')
        driver = webdriver.Chrome(executable_path = r'path/to/chrome', options = options)
        setattr(threadLocal, 'driver', driver)

    return driver

def search_stock(driver, stock):
    search_url = r'https://www.forbes.com/search/?q=' + stock
    driver.get(search_url)
    time.sleep(2)
    driver.find_element_by_xpath(r'/html/body/div[1]/main/div[1]/div[1]/div[4]/div/div[1]/div/div[1]/a[1]').click()

def get_q_score(stock, driver):

    df = pd.DataFrame(columns = ['stock','overall_score','quality', 'momentum','growth','technicals'])
    time.sleep(3)
    overall_score = driver.find_element_by_css_selector(r'.q-factor-total .q-score-bar__grade-label').text
    quality_score = driver.find_element_by_xpath(r'/html/body/div[1]/main/div/div[1]/div[4]/div[2]/div[2]/div[1]/div[2]/div[1]').text

    return print('overall score is '+ overall_score, ' quality score is ' + quality_score)

def main(stock):
    driver = instantiate_chrome()
    print('attempting to get q score for ' + stock)
    search_stock(driver, stock)
    print('found webpage for ' + stock)
    get_q_score(stock, driver)

main('AAPL')

I believe the issue is that I am attempting to scrape the text via Selenium's .text property, but there is no text to scrape. Any thoughts?

Recommended answer

You were on the right path, except that the text you mentioned isn't actually text. It is rendered by a CSS property called content, which is used with the pseudo-elements ::before and ::after. You can read up on how it works if you are interested.

The text is rendered as icons; organizations sometimes do this to keep sensitive values from being scraped. However, there is a (somewhat hard) way around it. Using Selenium and JavaScript, you can individually target the computed value of the CSS content property, which holds the values you are after.

Having looked into it for an hour, this is the simplest Pythonic way of getting the values you want:

overall_score = driver.execute_script("return [...document.querySelectorAll('.q-score-bar__grade-label')].map(div => window.getComputedStyle(div,':before').content)") #key line in the problem

The code simply runs a JavaScript snippet that targets the elements by class and then maps each div element to the value of its ::before content property. This returns a list

['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']

of values, which correspond, in order, to

Q-Factor Score / Quality / Momentum / Growth / Technicals
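If you need the same lookup in more than one place, the one-liner could be wrapped in a small helper. This is only a sketch: the names pseudo_content and JS_PSEUDO_CONTENT are hypothetical, and it assumes a Selenium driver whose execute_script passes extra arguments to the script as arguments[0], arguments[1], and so on. It also strips the surrounding double quotes that appear in the returned values.

```python
# Hypothetical helper, not part of Selenium: the names below are illustrative.
JS_PSEUDO_CONTENT = """
const selector = arguments[0];
const pseudo = arguments[1];
return [...document.querySelectorAll(selector)]
    .map(el => window.getComputedStyle(el, pseudo).content);
"""


def pseudo_content(driver, selector, pseudo="::before"):
    """Return the computed `content` of a pseudo-element for every element matching selector."""
    raw_values = driver.execute_script(JS_PSEUDO_CONTENT, selector, pseudo)
    # String content comes back wrapped in double quotes, e.g. '"B"'.
    return [value.strip('"') for value in raw_values]
```

With the page from the question loaded, pseudo_content(driver, '.q-score-bar__grade-label') would be expected to return the same five values as above, without the extra quotes.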

To access the values of the list, you can use a for loop and indexing to select the value you need.
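As a rough illustration of both approaches, the sketch below starts from the list shown earlier. The variable names (raw, labels, scores) are made up for this example, and the label order is taken from the ordering given above.

```python
# Values as returned by the execute_script call in the answer
raw = ['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']
labels = ['Q-Factor Score', 'Quality', 'Momentum', 'Growth', 'Technicals']

# Indexing: the first entry is the overall Q-Factor score.
# strip('"') removes the double quotes that surround each value.
overall_score = raw[0].strip('"')

# Loop: pair each label with its cleaned value
scores = {}
for label, value in zip(labels, raw):
    scores[label] = value.strip('"')

print('overall score is ' + overall_score)
print('technicals score is ' + scores['Technicals'])
```

Keeping the values in a dictionary keyed by label avoids having to remember which position holds which score.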
