Navigation using Selenium in Python


Question

I'm scraping this website using Python and Selenium, but my code currently scrapes only the first 10 pages for the month of July. It takes the page number from the link immediately before the Next button (its preceding sibling), converts it to an int, and clicks Next number_of_pages - 1 times. After it reaches page 10, it stops.

URL: https://planning.adur-worthing.gov.uk/online-applications/search.do?action=monthlyList

Can anyone help me get it to scrape all the pages?

def pagination(driver):
    data = []
    last_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]/preceding-sibling::a[1]')
    if last_element is None:
        number_of_pages = 1
    else:
        number_of_pages = int(last_element.text)
    # data = [ getData( driver ) ]
    data.extend(getData(driver))
    for i in range(number_of_pages - 1):
        driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
        data.extend(getData(driver))
        time.sleep(1)
    return data

Recommended answer

Look, I understand that you took the idea of calculating the total number of pages from my answer to a previous question of yours. That worked in the previous case because the last page number was directly available on the page, but that is not the case here.

Solution:

Although the number of pages is not directly available, the total number of entries is shown on the results page.

As the screenshot of the July results shows, this number is 174. Assuming the pagination length (the number of entries on a single page) is left at its default of 10, there should be 18 pages: 17 pages of 10 entries each, plus one more page for the remaining 4 entries.

So the logic for calculating the number of pages is simple. Once you have the total number of entries in a total_entries variable, the number of pages is:

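number_of_pages = (total_entries + 9) // 10

This is integer ceiling division: 174 entries give (174 + 9) // 10 = 18 pages. The simpler form (total_entries/10) + 1 only works under Python 2, where / between ints rounds down (174/10 is 17, plus 1 is 18). In Python 3, / returns a float, and in either version the simpler form reports one page too many when the total is an exact multiple of 10 (170 entries would give 18 instead of 17).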
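A minimal sketch of the page-count arithmetic on its own, written with ceiling division so that it behaves the same on Python 2 and 3 and does not overcount when the total is an exact multiple of the page size. The function name pages_needed and the page_size parameter are illustrative and not part of the original answer; treating an empty result as one page is also an assumption.

```python
def pages_needed(total_entries, page_size=10):
    """Return how many pages are needed to list total_entries items."""
    if total_entries <= 0:
        # Assumption: an empty result set still renders as a single page.
        return 1
    # Ceiling division without floats: round up to the next full page.
    return (total_entries + page_size - 1) // page_size


print(pages_needed(174))  # 18: 17 full pages plus one page with 4 entries
print(pages_needed(170))  # 17: exact multiple, no extra page
```

If the site lets users change the number of results per page, pass that value as page_size instead of relying on the default of 10.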

Now, to extract the total number of entries, use the locator below to find the <span> element that holds it.

driver.find_element(By.XPATH, "//span[@class='showing']")  # By is imported from selenium.webdriver.common.by

This element contains text such as "Showing 1-10 of 174", and you need only the 174 part of the string. To get it, first extract the substring after "of", then convert it to an int.

Code to extract the total number of entries from the text as an int:

import re
from selenium.webdriver.common.by import By

showing_text = driver.find_element(By.XPATH, "//span[@class='showing']").text  # "Showing 1-10 of 174"
number_of_entries_text = showing_text.split("of", 1)[1]  # " 174" as text
number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])  # 174 as int
number_of_pages = (number_of_entries + 9) // 10  # 18 (ceiling division, 10 entries per page)
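The parsing step can also be tried without a browser. The sketch below assumes the label always has the form "Showing 1-10 of 174" quoted in this answer. The names SHOWING_RE and parse_showing are hypothetical, and the real page may word or punctuate the label differently. It returns the range as well as the total, so the page size can be derived from the first page instead of hard-coding 10.

```python
import re

# Assumed label format, taken from the example text "Showing 1-10 of 174".
SHOWING_RE = re.compile(r"Showing\s+(\d+)\s*-\s*(\d+)\s+of\s+(\d+)")


def parse_showing(text):
    """Return (first, last, total) parsed from a 'Showing X-Y of Z' label."""
    match = SHOWING_RE.search(text)
    if match is None:
        raise ValueError("unexpected label format: %r" % text)
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


first, last, total = parse_showing("Showing 1-10 of 174")
page_size = last - first + 1  # only meaningful on the first page
print(total, page_size)  # 174 10
```

Compute page_size from the first page only: on the last page the range is shorter (for example 171-174), so the same subtraction would understate it.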

Final code:

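import re
import time

from selenium.webdriver.common.by import By

NEXT_XPATH = '//a[contains(concat(" ", normalize-space(@class), " "), " next ")]'

def pagination(driver):
    data = []
    # find_elements returns an empty list instead of raising when nothing matches
    showing = driver.find_elements(By.XPATH, "//span[@class='showing']")
    if not showing:
        number_of_pages = 1
    else:
        number_of_entries_text = showing[0].text.split("of", 1)[1]  # "Showing 1-10 of 174" -> " 174"
        number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])
        number_of_pages = (number_of_entries + 9) // 10  # ceiling division, 10 entries per page

    data.extend(getData(driver))  # getData is the scraping function from the question
    for _ in range(number_of_pages - 1):
        driver.find_element(By.XPATH, NEXT_XPATH).click()
        time.sleep(1)  # give the next page time to load before scraping it
        data.extend(getData(driver))
    return data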
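To check the click-and-collect loop without opening a browser, the same logic can be written against two plain callables. This is only a sketch: collect_pages, fake_get_data, and fake_click_next are hypothetical names, and the list of fake pages stands in for the real site.

```python
def collect_pages(number_of_pages, get_data, click_next):
    """Scrape the current page, then advance and scrape number_of_pages - 1 more times."""
    data = list(get_data())
    for _ in range(number_of_pages - 1):
        click_next()
        data.extend(get_data())
    return data


# Stand-in for a browser: three "pages" of rows and a cursor (illustrative only).
pages = [["a1", "a2"], ["b1", "b2"], ["c1"]]
state = {"page": 0}


def fake_get_data():
    return pages[state["page"]]


def fake_click_next():
    state["page"] += 1


rows = collect_pages(len(pages), fake_get_data, fake_click_next)
print(rows)  # ['a1', 'a2', 'b1', 'b2', 'c1']
```

With Selenium, get_data would wrap the question's getData(driver) and click_next would click the Next link and wait for the new page to load.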

Note:

I think my solution is better because you don't have to repeatedly check whether an element is available or catch any exceptions. You get the number of pages directly and click the Next button that many times.
