Finding number of pages using Python BeautifulSoup


Problem Description

I want to extract the total page number (11 in this case) from a Steam page. I believe the following code should work (i.e. return 11), but it returns an empty list, as if the paged_items_paging_pagelink class is not being found.

import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.content, 'html.parser')

# find_all() returns an empty list, so indexing it with [-1] raises an IndexError
total_pages = soup.find_all("span", {"class": "paged_items_paging_pagelink"})[-1].text

Answer

If you check the page source, the content you want is not there, which means it is generated dynamically through JavaScript.

The page numbers are located inside the <span id="NewReleases_links"> tag, but in the page source that HTML shows only this:

<span id="NewReleases_links"></span>
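A quick way to confirm this from Python itself (a small check, assuming the page is still served the same way) is to search the raw response for the pagination class:

import requests

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
# The pagination links are injected by JavaScript, so the class never appears in the raw HTML
print('paged_items_paging_pagelink' in r.text)  # expected: False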

The easiest way to handle this is to use Selenium.

But if you look at the page source, the text "Showing 1-20 of 213 results" is available, so you can scrape that and calculate the number of pages.

The relevant HTML:

<div class="paged_items_paging_summary ellipsis">
    Showing 
    <span id="NewReleases_start">1</span>
    -
    <span id="NewReleases_end">20</span> 
    of 
    <span id="NewReleases_total">213</span> 
    results         
</div>

Code:

import math
import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.text, 'lxml')

def get_pages_no(soup):
    # "Showing 1-20 of 213 results": total item count and the last item shown on page one
    total_items = int(soup.find('span', id='NewReleases_total').text)
    items_per_page = int(soup.find('span', id='NewReleases_end').text)
    # Round up so a partially filled last page is still counted
    return math.ceil(total_items / items_per_page)

print(get_pages_no(soup))
# prints 11

(Note: I still recommend using Selenium, since most of the content on this site is generated dynamically and it would be a pain to scrape all of the data this way.)
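For reference, here is a minimal sketch of the Selenium route, assuming Selenium 4+ with a Chrome driver on your PATH; it waits for the JavaScript-rendered pagination links and then reuses the same [-1] indexing from the question:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('http://store.steampowered.com/tags/en-us/RPG/')
    # Wait until the JavaScript-rendered pagination links are present in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'paged_items_paging_pagelink'))
    )
    page_links = driver.find_elements(By.CLASS_NAME, 'paged_items_paging_pagelink')
    print(page_links[-1].text)  # the last pagination link, e.g. 11
finally:
    driver.quit()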
