Extract text from an aria-label with Selenium WebDriver (Python)


Problem description

Right now I'm working on a program that takes user-supplied questions and answers, splits them into separate lists of q's and a's, and then automatically answers when given either the question or the answer. Since the 'bot' is used on a website, I'm using the Selenium web driver, which is causing me problems when I try to read an aria-label. I don't know what I'm doing wrong, as I'm not at all advanced with Selenium, HTML, or CSS. I'm trying to find the aria-label value of each container without knowing the value in advance.

An example of the HTML I'm trying to get the text value of:

<div class="MatchModeQuestionGridBoard-tile">
  <div class="MatchModeQuestionGridTile" touch-action="auto">
    <div class="MatchModeQuestionGridTile-content">
      <div aria-label="to cloak; to conceal the truth; to offer lame excuses" class="FormattedText notranslate TermText MatchModeQuestionGridTile-text lang-en" style="font-size: 14px;">
        <div style="display: block;">to cloak; to conceal the truth; to offer lame excuses</div>
      </div>
    </div>
  </div>
</div>

Snippet of my code:

def driver():
    driver = webdriver.Chrome()
    driver.get(link)
    startMatch = driver.find_element_by_xpath("/html/body/div[5]/div/div/div/div[2]/button").click()
   
    #find text in matches
    container = driver.find_elements_by_class_name('MatchModeQuestionGridTile-content')
    containerFile = open("QuizletTerms.txt", "w+")

    for _ in list(container):
        arialabel = driver.find_elements_by_css_selector("div[aria-label='']")
        containerFile.write("\n")
        containerFile.write(str(arialabel))
        print(arialabel)

    containerFile.close()
    print("done")
    sleep(5)

Output:


[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

Solution

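The text, e.g. to cloak; to conceal the truth; to offer lame excuses, is present in the child <div> as well as in the aria-label attribute of its parent <div>. The selector div[aria-label=''] in your code matches only elements whose aria-label is an empty string, which is why every lookup returns []. To extract the values, induce WebDriverWait for visibility_of_all_elements_located() and use either of the following Locator Strategies: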

  • Using CSS_SELECTOR and get_attribute():

    print([my_elem.get_attribute("aria-label") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.MatchModeQuestionGridTile-content>div[aria-label]")))])
    

  • Using XPATH and get_attribute():

    print([my_elem.get_attribute("aria-label") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='MatchModeQuestionGridTile-content']/div[@aria-label]")))])
    

  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
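
The question's loop also wrote the string form of an empty list to the file once per container. A minimal sketch of the saving step follows, assuming the elements come from one of the WebDriverWait calls above. The helper name save_aria_labels is hypothetical and not part of Selenium. It only relies on each element having a get_attribute() method, so it can be tried without a browser.

```python
from typing import Iterable, List


def save_aria_labels(elements: Iterable, path: str = "QuizletTerms.txt") -> List[str]:
    """Write each element's aria-label to `path`, one per line, and return them.

    `elements` is any iterable of objects exposing get_attribute(name),
    for example the list of WebElements returned by WebDriverWait(...).until(...).
    This helper is an illustrative assumption, not a Selenium API.
    """
    labels = []
    for element in elements:
        label = element.get_attribute("aria-label")
        if label:  # skip elements whose attribute is missing (None) or empty
            labels.append(label)

    # The context manager closes the file even if writing fails part-way.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(labels))
    return labels
```

With the imports listed above in place, it might be called as save_aria_labels(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.MatchModeQuestionGridTile-content>div[aria-label]")))). Because the labels are read once from the located elements, there is no need to loop over the containers separately.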
    
