How to extract all the texts from <a> tags using Selenium through Python
Problem Description
Here is the link to the website I want to extract data from.
I'm trying to get the text of all the anchor (<a>) tags that carry an href attribute.
Here is the sample HTML:
<div id="borderForGrid" class="border">
<h5 class="">
<a href="/products/product-details/?prod=30AD">A/D TC-55 SEALER</a>
</h5>
<div id="borderForGrid" class="border">
<h5 class="">
<a href="/products/product-details/?prod=P380">Carbocrylic 3356-1</a>
</h5>
I want to extract all the text values, e.g. ['A/D TC-55 SEALER','Carbocrylic 3356-1'].
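As an offline sanity check of what that expected list corresponds to in the sample HTML above, here is a minimal sketch that uses Python's standard-library html.parser instead of Selenium. It collects the text of every <a> tag that has an href attribute. The names AnchorTextParser and anchor_texts are hypothetical, and the sketch assumes the real page markup looks like the sample. If it were fed Selenium's driver.page_source, it would read whatever markup is present at that moment, visible or not.

```python
from html.parser import HTMLParser

# Sample markup from the question (divs left unclosed, as in the original snippet).
SAMPLE_HTML = """
<div id="borderForGrid" class="border">
<h5 class="">
<a href="/products/product-details/?prod=30AD">A/D TC-55 SEALER</a>
</h5>
<div id="borderForGrid" class="border">
<h5 class="">
<a href="/products/product-details/?prod=P380">Carbocrylic 3356-1</a>
</h5>
"""


class AnchorTextParser(HTMLParser):
    """Collects the text of every <a> tag that has an href attribute."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_anchor = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text only for anchors that carry an href.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._in_anchor = True
            self._buffer = []

    def handle_endtag(self, tag):
        # On </a>, store the collected text with surrounding whitespace removed.
        if tag == "a" and self._in_anchor:
            self.texts.append("".join(self._buffer).strip())
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self._buffer.append(data)


def anchor_texts(html_source):
    """Return the text of all <a href=...> elements in html_source."""
    parser = AnchorTextParser()
    parser.feed(html_source)
    parser.close()
    return parser.texts


print(anchor_texts(SAMPLE_HTML))
```

Parsing the markup directly sidesteps Selenium's visibility rules entirely, at the cost of losing anything that only exists after JavaScript runs, unless the markup is taken from the live page.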
I tried:
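from selenium.webdriver.common.by import By

# Selenium 4 syntax; the find_element_by_* helpers were removed in Selenium 4.3
target = driver.find_element(By.CLASS_NAME, 'border')  # first element with class "border"
anchorElement = target.find_element(By.TAG_NAME, 'a')  # first <a> inside it
anchorElement.text  # returns '' here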
but it returns an empty string ('').
Any suggestions on how this can be achieved?
PS - Under product type
Recommended Answer
To extract all the text values within the <a> tags, e.g. ['A/D TC-55 SEALER','Carbocrylic 3356-1'], you need to use WebDriverWait to wait for visibility_of_all_elements_located(), and then you can use either of the following locator strategies:
- Using CSS_SELECTOR:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.topLevel[data-types='Acrylics'] h5>a[href^='/products/product-details/?prod=']")))])
- Using XPATH:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])
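A likely reason the attempt in the question returned '' is that WebElement.text only reports text that is rendered and visible, so a link inside a hidden or collapsed section comes back empty. Reading the DOM textContent property does not depend on visibility. Note also that find_element returns only the first match, so even when it works it yields a single text; find_elements is needed to get all of them. The sketch below prefers .text and falls back to textContent. The helper name element_text and the FakeElement stand-in (used so the example runs without a browser) are hypothetical, for illustration only.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self, visible_text, text_content):
        self.text = visible_text            # what WebElement.text would report
        self._text_content = text_content   # what the DOM textContent holds

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


def element_text(element):
    """Return .text, falling back to the DOM textContent when .text is empty."""
    text = element.text
    if not text:
        # Selenium's get_attribute() also returns DOM properties such as textContent.
        text = element.get_attribute("textContent") or ""
    return text.strip()


hidden_link = FakeElement("", "A/D TC-55 SEALER")
visible_link = FakeElement("Carbocrylic 3356-1", "Carbocrylic 3356-1")
print([element_text(e) for e in (hidden_link, visible_link)])
```

With a real driver this might be used as [element_text(a) for a in driver.find_elements(By.CSS_SELECTOR, "h5 > a")], where the "h5 > a" selector is an assumption based on the sample HTML.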
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
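The answer reads get_attribute("innerHTML"), which returns serialized markup rather than plain text: an & in a product name comes back as &amp;, and any tags nested inside the <a> would be included verbatim. If that matters, a small post-processing step can clean it up. The function inner_html_to_text below is a hypothetical sketch, and its tag stripping is a crude regex that assumes simple inline markup (no > characters inside attribute values).

```python
import html
import re


def inner_html_to_text(inner_html):
    """Convert the innerHTML of a simple inline element into plain text."""
    # Crude tag removal: assumes no '>' characters inside attribute values.
    without_tags = re.sub(r"<[^>]*>", "", inner_html)
    # Decode entities such as &amp; and &lt; back into characters.
    return html.unescape(without_tags).strip()


# Hypothetical innerHTML values, for illustration only.
samples = ["A/D TC-55 SEALER", "Primer &amp; Sealer <span>X</span>"]
print([inner_html_to_text(s) for s in samples])
```

Reading get_attribute("textContent") instead avoids both issues, since textContent is already plain text.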