Finding number of pages using Python BeautifulSoup


Problem Description

I want to extract the total page number (11 in this case) from a Steam page. I believe the following code should work (i.e. return 11), but it returns an empty list, as if the paged_items_paging_pagelink class is not being found.

import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.content, 'html.parser')

# find_all() returns an empty list, so indexing it with [-1] raises an IndexError
total_pages = soup.find_all("span", {"class": "paged_items_paging_pagelink"})[-1].text

Answer

If you check the page source, the content you want is not there, which means it is generated dynamically through JavaScript.

The page numbers are located inside the <span id="NewReleases_links"> tag, but in the page source that HTML shows only this:

<span id="NewReleases_links"></span>
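A quick way to confirm this from Python itself (a small check, assuming the page is still served the same way) is to search the raw response for the pagination class:

import requests

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
# The pagination links are injected by JavaScript, so the class never appears in the raw HTML
print('paged_items_paging_pagelink' in r.text)  # expected: False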

The easiest way to handle this is to use Selenium.

But if you look at the page source, the text "Showing 1-20 of 213 results" is available, so you can scrape that and calculate the number of pages.

The relevant HTML:

<div class="paged_items_paging_summary ellipsis">
    Showing 
    <span id="NewReleases_start">1</span>
    -
    <span id="NewReleases_end">20</span> 
    of 
    <span id="NewReleases_total">213</span> 
    results         
</div>

Code:

import math
import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.text, 'lxml')

def get_pages_no(soup):
    # "Showing 1-20 of 213 results": total item count and the last item shown on page one
    total_items = int(soup.find('span', id='NewReleases_total').text)
    items_per_page = int(soup.find('span', id='NewReleases_end').text)
    # Round up so a partially filled last page is still counted
    return math.ceil(total_items / items_per_page)

print(get_pages_no(soup))
# prints 11

(Note: I still recommend using Selenium, since most of the content on this site is generated dynamically and it would be a pain to scrape all of the data this way.)
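For reference, here is a minimal sketch of the Selenium route, assuming Selenium 4+ with a Chrome driver on your PATH; it waits for the JavaScript-rendered pagination links and then reuses the same [-1] indexing from the question:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('http://store.steampowered.com/tags/en-us/RPG/')
    # Wait until the JavaScript-rendered pagination links are present in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'paged_items_paging_pagelink'))
    )
    page_links = driver.find_elements(By.CLASS_NAME, 'paged_items_paging_pagelink')
    print(page_links[-1].text)  # the last pagination link, e.g. 11
finally:
    driver.quit()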
