How to parse several attributes of a website with the same class name in Python?


Problem description

I want to parse addresses from this website (https://www.conad.it/) with Python, after searching for a CAP (Italian postal code) in the search bar and opening the results. For many CAPs the search returns several store addresses, and I want to scrape all of them, not just the first one (which is what my code does now).

This is my current code:

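from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(service=Service('pathtoChrome/chromedriver.exe'))  # Selenium 4 style driver setup
driver.get("https://www.conad.it/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@href='javascript:void(0)']"))).click() # accept the cookies
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='location-input']"))).send_keys("11100") # type the CAP
driver.find_element(By.XPATH, "//input[@class = 'btn btn-default btn-lg btn-block']").click() # submit the search
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'col-md-8')]"))).get_attribute("innerHTML")) # only the first match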

This gives the following final output:

<h3>Conad</h3><p>Frazione Condemine 84, 11010  Sarre</p><div class="extra-services extra-services-buttons extra-services-desktop extra-services-simple"><ul class="carousel-services"></ul></div>

I want only the text inside the <p> tag from the output above, but for every element with the class 'col-md-8', so for this example CAP the second address should be included as well.
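As a side note, the <p> text can also be pulled out of an innerHTML string like the one above without touching the browser again. The following is only a minimal sketch using Python's standard html.parser module; the class name ParagraphText and the whitespace clean-up are illustrative choices, not something from the original code.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text content of every <p> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data  # text may arrive in several chunks


# The innerHTML string printed by the code above.
inner_html = (
    '<h3>Conad</h3><p>Frazione Condemine 84, 11010  Sarre</p>'
    '<div class="extra-services extra-services-buttons extra-services-desktop '
    'extra-services-simple"><ul class="carousel-services"></ul></div>'
)

parser = ParagraphText()
parser.feed(inner_html)
parser.close()

# Collapse repeated whitespace inside each address.
addresses = [" ".join(text.split()) for text in parser.paragraphs]
print(addresses)
```

This only handles one innerHTML string at a time, so it would still need to be applied to every matching 'col-md-8' element to get all addresses.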

Ideally I want to store the results in a data set that I can append to while looping over several different CAPs, so something like this (which doesn't work yet):

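from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(service=Service('pathtoChrome/chromedriver.exe'))  # Selenium 4 style driver setup
driver.get("https://www.conad.it/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@href='javascript:void(0)']"))).click() # accept the cookies
CAPS = ['11100']
for CAP in CAPS:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='location-input']"))).send_keys(CAP) # type the CAP
    driver.find_element(By.XPATH, "//input[@class = 'btn btn-default btn-lg btn-block']").click() # submit the search
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'col-md-8')]"))).get_attribute("innerHTML")) # still only the first match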

Any help is appreciated!

Recommended answer

You can use WebDriverWait() with visibility_of_all_elements_located() instead of visibility_of_element_located(), together with the following XPath, which selects every <p> tag inside the 'col-md-8' blocks. The wait then returns all matching elements, and their text can be collected into a list.

print([item.text for item in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'col-md-8')]//p")))])

The output will be a list like this:

['Frazione Condemine 84, 11010 Sarre', 'Grand Chemin C/c Centreville 3, 11020 Saint-christophe', "Localita' Arensod 27, 11010 Sarre"]
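To accumulate such lists over several CAPs, one option is to build one row per (CAP, address) pair and write the rows out at the end. The sketch below is only an illustration under stated assumptions: collect_addresses, write_rows and fake_fetch are hypothetical names, and fake_fetch stands in for the Selenium lookup so the example runs without a browser. In real use, the fetch argument would be a function that performs the search for one CAP and returns the list produced by the WebDriverWait call above. Whether the page has to be reloaded or the input cleared between searches is not covered by the original answer.

```python
import csv
import io


def collect_addresses(caps, fetch):
    """Return one {"cap", "address"} row per address that fetch(cap) yields."""
    rows = []
    for cap in caps:
        for address in fetch(cap):
            rows.append({"cap": cap, "address": address})
    return rows


def write_rows(rows, handle):
    """Write the rows as CSV with a header line to an open text file object."""
    writer = csv.DictWriter(handle, fieldnames=["cap", "address"])
    writer.writeheader()
    writer.writerows(rows)


def fake_fetch(cap):
    """Hypothetical stand-in for the Selenium search; returns canned addresses."""
    sample = {
        "11100": [
            "Frazione Condemine 84, 11010 Sarre",
            "Grand Chemin C/c Centreville 3, 11020 Saint-christophe",
            "Localita' Arensod 27, 11010 Sarre",
        ]
    }
    return sample.get(cap, [])


rows = collect_addresses(["11100"], fake_fetch)

# Write to an in-memory buffer here; a real run could pass
# open("addresses.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

Keeping the lookup behind a plain function argument means the row-building logic can be checked without a browser, and the same rows could be handed to another container, such as a DataFrame, if preferred.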
