How to extract all the text from <a> tags using Selenium through Python


Problem description

Here is the link to the website I want to extract data from. I'm trying to get the link text of all the anchor tags. Here is the sample HTML:

<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=30AD">A/D TC-55 SEALER</a>
  </h5>

<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=P380">Carbocrylic 3356-1</a>
  </h5>

I want to extract all the text values, like ['A/D TC-55 SEALER','Carbocrylic 3356-1'].
I tried with:

target = driver.find_element_by_class_name('border')
anchorElement = target.find_element_by_tag_name('a')
anchorElement.text

but it gives an empty string ('').

Any suggestions on how this can be achieved?

PS - In Product Type

Recommended answer

To extract all the text values within the <a> tags, e.g. ['A/D TC-55 SEALER','Carbocrylic 3356-1'], you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following solutions:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.topLevel[data-types='Acrylics'] h5>a[href^='/products/product-details/?prod=']")))])

  • Using XPATH:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])
    

  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
