Navigating pagination with Selenium

Question

I am getting stuck on a weird case of pagination. I am scraping search results from https://cotthosting.com/NYRocklandExternal/LandRecords/protected/SrchQuickName.aspx

I have search results that fall into 4 categories.

1) There are no search results

2) There is one results page

3) There is more than one results page but fewer than 12 results pages

4) There are more than 12 results pages.

For case 1, that is easy, I am just passing.

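from selenium.webdriver.common.by import By

# find_elements (plural) returns a list, so len() works; empty means no results
results = driver.find_elements(By.CLASS_NAME, 'GridView')
if len(results) == 0:
    pass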

For cases 2 and 3, I check whether the containing element holds at least one link and, if so, click it.

else:
    results_table = bsObj.find('table', {'class':'GridView'})
    sub_tables = results_table.find_all('table')
    next_page_links = sub_tables[1].find_all('a')
    if len(next_page_links) == 0:
        scrapeResults()
    else:
        scrapeResults()
        ####GO TO NEXT PAGE UNTIL THERE IS NO NEXT PAGE

Question for cases 2 and 3: what could I possibly check for here as my control?

The links are hrefs to pages 2, 3, etc. But the tricky part is this: if I am on a given page, say page 1, how do I make sure I am going to page 2, and when I am on page 2, how do I make sure I am going to page 3? The html for page 1 of the results list is as follows:

<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
   <tr>
      <td>Page: <span>1</span></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$2&#39;)">2</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$3&#39;)">3</a></td>
   </tr>
</table>

I can zero in on this table specifically using sub_tables[1] (see the bs4 code for cases 2 and 3 above).

The problem is that there is no next button that I could utilize. Nothing changes across the results pages in the html. There is nothing to isolate the current page besides the number in the span right before the links. And I would like it to stop when it reaches the last page.

For case 4, the html looks like this:

<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
   <tr>
      <td>Page: <span>1</span></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$2&#39;)">2</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$3&#39;)">3</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$4&#39;)">4</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$5&#39;)">5</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$6&#39;)">6</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$7&#39;)">7</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$8&#39;)">8</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$9&#39;)">9</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$10&#39;)">10</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$11&#39;)">...</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$Last&#39;)">Last</a></td>
   </tr>
</table>

The last two links are ... to show that there are more results pages and Last to signify the last page. However, the Last link exists on every page, and it is only on the last page itself that it is not an active link.

Question for case 4: how could I check whether the Last link is clickable and use this as my stopping point?

Bigger question for case 4: how do I maneuver the ... link to go through the other results pages? The results page list has 12 values at most, i.e. the nearest ten pages to the current page, the ... link to more pages and the Last link. So I don't know what to do if my results have, say, 88 pages.

I am linking a dump of a full sample page: https://ghostbin.com/paste/nrb27

Solution

Count the number of results shown on one page, then divide the total number of results by that count to estimate the total number of pages.

If you inspect the page you will see:

Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016
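
The division described above can be sketched as follows. This is a minimal sketch, assuming the status line always has the shape shown above; the function name `estimate_total_pages` and the tolerance for thousands separators are my own assumptions, not something the page guarantees. It should be called on the first results page, because the last page usually shows fewer records than a full page.

```python
import math
import re


def estimate_total_pages(status_text):
    """Return (total_records, per_page, total_pages) parsed from the status line.

    Assumes text shaped like 'Displaying records 1 - 500 of 32563 ...'.
    """
    match = re.search(
        r"records\s+(\d[\d,]*)\s*-\s*(\d[\d,]*)\s+of\s+(\d[\d,]*)", status_text
    )
    if match is None:
        raise ValueError("status line not recognised: %r" % status_text)
    # Strip thousands separators in case the site ever prints them (assumption).
    first, last, total = (int(g.replace(",", "")) for g in match.groups())
    per_page = last - first + 1
    return total, per_page, math.ceil(total / per_page)


line = "Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016"
print(estimate_total_pages(line))  # -> (32563, 500, 66)
```

With 32563 records at 500 per page, the estimate is 66 pages, which is well beyond the 12 entries the pager can show at once.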

Once you know the total number of pages, start navigating, checking that each page has finished loading if needed. Because you also know the current page, you can build a dynamic selector for any page number in the pager, which falls into one of 2 cases:

  • if pagination number is not a link then you are on that page
  • if pagination number is a link you can use it to click
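
The two cases above can be told apart by reading the pager markup from the question. The sketch below uses only the standard library and regular expressions to stay self-contained; the BeautifulSoup object from the question would work just as well. The name `parse_pager` is hypothetical, and the code assumes the current page is always the only `<span>` holding a number, as in the sample html.

```python
import html
import re

LINK_RE = re.compile(r'<a href="([^"]*)">([^<]*)</a>')
ARG_RE = re.compile(r"'Page\$(\w+)'\)")


def parse_pager(pager_html):
    """Return (current_page, links) for one pager table.

    links maps each postback argument ('2', '11', 'Last', ...) to its link text.
    """
    span = re.search(r"<span>(\d+)</span>", pager_html)
    current = int(span.group(1)) if span else None  # not a link: we are on it
    links = {}
    for href, text in LINK_RE.findall(pager_html):
        # The raw href carries &#39; entities; unescape them before matching.
        arg = ARG_RE.search(html.unescape(href))
        if arg:
            links[arg.group(1)] = text  # a link: it can be clicked
    return current, links
```

For the case 4 sample, `links` would contain the key `'11'` with the text `...`, so the page after 10 stays reachable through its postback argument even though its label is not a number. When a Last link is present at all, `'Last' not in links` is one possible stopping test, since the question states that Last is only inactive on the final page.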

You shouldn't need 4 categories, since you can count the total number of results and how many are displayed on a page, and from those you know the number of pages. From there:

  1. Create a method that navigates between pages, using a for loop or another control structure
  2. On each page you navigate to, do what you need to do (for example, scrape the results)
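
The two steps above could look like the sketch below. It is written against any object with Selenium's `find_elements(by, value)` method and takes two hypothetical callables from the caller: `scrape_page(driver)` and `wait_for_reload(driver, old_link)`. Matching links by the `Page$N` postback argument in the href, rather than by link text, is my assumption about how to reach pages hidden behind the `...` link; it has not been verified against the live site.

```python
CSS = "css selector"  # the same string selenium exposes as By.CSS_SELECTOR


def link_selector(page):
    """CSS selector for the pager link whose postback argument is Page$<page>."""
    # The trailing quote stops Page$2 from also matching Page$20.
    return "a[href*=\"Page$%d'\"]" % page


def walk_pages(driver, scrape_page, total_pages, wait_for_reload):
    """Scrape page 1, then click through the pager up to total_pages.

    Returns the number of the last page that was scraped.
    """
    scrape_page(driver)
    visited = 1
    for page in range(2, total_pages + 1):
        links = driver.find_elements(CSS, link_selector(page))
        if not links:
            break  # no link to this page, so the page estimate was too high
        links[0].click()
        wait_for_reload(driver, links[0])  # block until the postback finishes
        scrape_page(driver)
        visited = page
    return visited
```

With a real driver, `wait_for_reload` could be something like `lambda d, old: WebDriverWait(d, 10).until(EC.staleness_of(old))`, using `WebDriverWait` from `selenium.webdriver.support.ui` and `expected_conditions` imported as `EC`. This relies on the clicked link being replaced when the postback completes.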

Alternatively, go to the last page and work backwards until page 1 is no longer a link.
