Selenium Python - Access next pages of search results


Problem description

I have to click on each search result, one by one, from this URL:

Search Guidance

I first extract the total number of results from the displayed text so that I can set the upper limit for the iteration:

upperlimit=driver.find_element_by_id("total_results")
number = int(upperlimit.text.split(' ')[0])
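The split(' ')[0] call above assumes the element's text begins with the number followed by a space. A slightly more forgiving sketch, assuming only that the count is the first run of digits in the text (the helper name parse_total and the sample strings are made up for illustration, since the page's exact wording is not shown here):

```python
import re


def parse_total(text):
    """Return the first integer found in text, ignoring thousands separators."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError("no number found in %r" % text)
    return int(match.group(0).replace(",", ""))


print(parse_total("44 results"))
print(parse_total("Showing 1,204 results"))
```

With Selenium this would be called as parse_total(upperlimit.text) in place of the split expression.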

The loop is then defined as for i in range(1, number):

However, after going through the first 10 results on the first page, the list index goes out of range (probably because there are no more links to click). I need to click on "Next" to get the next 10 results, and so on until I'm done with all the search results. How can I go about doing that?

Any help would be greatly appreciated!

Recommended answer

The problem is that the value of the element with id total_results changes after the page is loaded: at first it contains 117, then it changes to 44.
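If the count were still needed, one option would be to poll the value until it stops changing before trusting it. The following is only a rough sketch of that idea: wait_until_stable is a hypothetical helper, not part of Selenium, and "two equal reads in a row" is a heuristic that can still be fooled if the page updates slowly.

```python
import time


def wait_until_stable(read, interval=0.5, timeout=10.0, sleep=time.sleep):
    """Call read() repeatedly until two consecutive reads agree.

    Returns the stable value, or raises TimeoutError if it keeps changing.
    """
    deadline = time.monotonic() + timeout
    previous = read()
    while time.monotonic() < deadline:
        sleep(interval)
        current = read()
        if current == previous:
            return current
        previous = current
    raise TimeoutError("value did not settle within %.1f seconds" % timeout)


# Simulated readings standing in for the element's text over time
readings = iter(["117", "44", "44"])
print(wait_until_stable(lambda: next(readings), interval=0))
```

With a real driver, read could be something like lambda: driver.find_element(By.ID, "total_results").text.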

Instead, here is a more robust approach. It processes the results page by page until there are no more pages left:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
driver.get(url)

page_number = 1
while True:
    try:
        # Look for the pagination link labelled with the next page number
        link = driver.find_element(By.LINK_TEXT, str(page_number))
    except NoSuchElementException:
        # No such link: the last page has already been processed
        break
    link.click()
    print(driver.current_url)
    page_number += 1

Basically, the idea is to keep fetching the link for the next page until there is no such link (at which point NoSuchElementException is thrown). Note that this works for any number of pages and results.

It prints:

http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
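
To get back to the original goal of handling every result on every page, the page loop can be wrapped in a generator so that the per-page work lives in an ordinary for loop. This is a minimal sketch, assuming the pagination links are labelled with plain page numbers as in the answer. It swaps the try/except for find_elements, which returns an empty list when nothing matches. The names iterate_pages, FakeLink and FakeDriver are hypothetical; FakeDriver only stands in for a real browser so that the snippet runs without Selenium installed.

```python
LINK_TEXT = "link text"  # same string value as selenium's By.LINK_TEXT


def iterate_pages(driver, first_page=1):
    """Click the numbered pagination links in order, yielding each page number."""
    page_number = first_page
    while True:
        links = driver.find_elements(LINK_TEXT, str(page_number))
        if not links:
            return  # no link for this page number: past the last page
        links[0].click()
        yield page_number
        page_number += 1


class FakeLink:
    """Hypothetical stand-in for a pagination link element."""

    def __init__(self, driver, page):
        self.driver, self.page = driver, page

    def click(self):
        self.driver.current_page = self.page


class FakeDriver:
    """Hypothetical stand-in for a WebDriver with a fixed number of pages."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.current_page = None

    def find_elements(self, by, value):
        if by == LINK_TEXT and value.isdigit() and 1 <= int(value) <= self.page_count:
            return [FakeLink(self, int(value))]
        return []


driver = FakeDriver(page_count=5)
for page in iterate_pages(driver):
    # Per-page work (collecting or opening that page's results) would go here
    print("now on page", driver.current_page)
```

With a real driver, the loop body is where the results of the current page would be collected; if a result is opened by clicking it, the browser has to return to the results page before the next pagination link can be found.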
