Python Selenium: Get All "href" Attributes

Question

How do I get all the "href" attributes of the titles on this page?

<h2 class="entry-title">
<a href="http://www.allitebooks.com/deep-learning-with-python-2/" rel="bookmark">Deep Learning with Python</a>
</h2>

What I have tried, which doesn't get the href, is:

title = driver.find_elements_by_class_name('entry-title')
title[0].get_attribute('href')

This does not return the link of the "a" tag. If I instead find all elements by the "a" tag, I get every href on the page, which isn't what I want. I want only the titles shown above, together with their "href" attributes.

Recommended answer

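The href sits on the "a" element inside each h2.entry-title, not on the h2 itself, so the selector has to target ".entry-title a". The following code uses that selector to get the links of all books from all pages: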

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
baseUrl = "http://www.allitebooks.com/page/1/?s=python"
driver.get(baseUrl)

# wait = WebDriverWait(driver, 5)
# wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".search-result-list li")))

# Get last page number
lastPage = int(driver.find_element(By.CSS_SELECTOR, ".pagination a:last-child").text)

# Get all HREFs for the first page and save them in hrefs list
js = 'return [...document.querySelectorAll(".entry-title a")].map(e=>e.href)'
hrefs = driver.execute_script(js)

# Iterate through the remaining pages (2..lastPage, inclusive) and get all HREFs of books
for i in range(2, lastPage + 1):
    driver.get("http://www.allitebooks.com/page/" + str(i) + "/?s=python")
    hrefs.extend(driver.execute_script(js))

for href in hrefs:
    print(href)
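
The page loop can also be wrapped in a small helper, which makes the inclusive page range explicit and lets the logic be exercised without a browser. This is only a minimal sketch: the names `collect_hrefs`, `SEARCH_URL` and `JS_TITLE_HREFS` are hypothetical and not part of the original answer, and `driver` is assumed to be any object with Selenium's `get` and `execute_script` methods.

```python
# Hypothetical helper names; the URL pattern and selector are taken from the answer.
SEARCH_URL = "http://www.allitebooks.com/page/{page}/?s={query}"
JS_TITLE_HREFS = (
    'return [...document.querySelectorAll(".entry-title a")].map(e => e.href)'
)


def collect_hrefs(driver, last_page, query="python"):
    """Visit result pages 1..last_page (inclusive) and return all title links."""
    hrefs = []
    # range() excludes its stop value, so add 1 to include the last page.
    for page in range(1, last_page + 1):
        driver.get(SEARCH_URL.format(page=page, query=query))
        # The script returns the href of every <a> inside an .entry-title element.
        hrefs.extend(driver.execute_script(JS_TITLE_HREFS))
    return hrefs
```

With the `driver` and `lastPage` from the code above, this could be called as `hrefs = collect_hrefs(driver, lastPage)`. Unlike the original loop, the sketch loads page 1 again rather than reusing the page that is already open, which costs one extra request but keeps the function self-contained.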
