Finding number of pages using Python BeautifulSoup
Question
I want to extract the total number of pages (11 in this case) from a Steam page. I believe the following code should work (return 11), but it returns an empty list, as if it were not finding the paged_items_paging_pagelink class.
import requests
import re
from bs4 import BeautifulSoup
r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
c = r.content
soup = BeautifulSoup(c, 'html.parser')
total_pages = soup.find_all("span",{"class":"paged_items_paging_pagelink"})[-1].text
Recommended answer
If you check the page source, the content you want is not there, which means it is generated dynamically through JavaScript.
The page numbers are located inside the <span id="NewReleases_links"> tag, but in the page source the HTML shows only this:
<span id="NewReleases_links"></span>
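The empty result can be reproduced offline. The sketch below parses a hand-written fragment that mimics the static source (the `static_html` string is an assumption built from the snippet above, not a live download) and shows that the pagination links are simply absent before any JavaScript runs.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives: the container
# exists, but JavaScript has not filled it in yet.
static_html = '<div><span id="NewReleases_links"></span></div>'

soup = BeautifulSoup(static_html, "html.parser")

# The container span itself is present in the static markup.
container = soup.find("span", id="NewReleases_links")

# The page-link spans the question looks for are not.
page_links = soup.find_all("span", {"class": "paged_items_paging_pagelink"})

print(container is not None)
print(repr(container.get_text()))
print(page_links)
```

Indexing that empty result with `[-1]` raises an `IndexError`, so checking the list before indexing gives a clearer failure.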
The easiest way to handle this is to use Selenium.
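A minimal Selenium sketch of that approach, assuming Selenium 4 and a local Chrome installation. The helper names `last_page_number` and `fetch_page_count` and the 10-second timeout are illustrative, and the class name `paged_items_paging_pagelink` is taken from the question rather than verified against the current page.

```python
def last_page_number(link_texts):
    """Return the largest page number among the link labels, ignoring non-numeric ones."""
    stripped = (text.strip() for text in link_texts)
    numbers = [int(text) for text in stripped if text.isdecimal()]
    if not numbers:
        raise ValueError("no numeric page links found")
    return max(numbers)


def fetch_page_count(url, timeout=10):
    """Load the page in a real browser so JavaScript runs, then read the page links."""
    # Imported here so the pure helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: the page links carry the class named in the question.
    selector = "span.paged_items_paging_pagelink"
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the dynamically generated links are in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        links = driver.find_elements(By.CSS_SELECTOR, selector)
        return last_page_number([link.text for link in links])
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_page_count('http://store.steampowered.com/tags/en-us/RPG/')` would open a browser, wait for the links to render, and return the highest page number it finds.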
However, the page source does contain the text "Showing 1-20 of 213 results", so you can scrape that and calculate the number of pages.
Required HTML:
<div class="paged_items_paging_summary ellipsis">
Showing
<span id="NewReleases_start">1</span>
-
<span id="NewReleases_end">20</span>
of
<span id="NewReleases_total">213</span>
results
</div>
Code:
import math
import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.text, 'lxml')

def get_pages_no(soup):
    # Total number of results, e.g. 213
    total_items = int(soup.find('span', id='NewReleases_total').text)
    # On the first page, the end index (e.g. 20) equals the page size
    items_per_page = int(soup.find('span', id='NewReleases_end').text)
    # Round up so a partially filled last page is counted
    return math.ceil(total_items / items_per_page)

print(get_pages_no(soup))
# prints 11
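The page count is a ceiling division, since a partly filled last page still counts as a page. A small sketch of that arithmetic, independent of the scraping (the function name `page_count` is illustrative):

```python
import math


def page_count(total_items, items_per_page):
    """Number of pages needed to show total_items, counting a partial last page."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return math.ceil(total_items / items_per_page)


print(page_count(213, 20))  # 213 / 20 = 10.65, so 11 pages
print(page_count(201, 20))  # 201 / 20 = 10.05, still 11 pages
print(round(201 / 20))      # rounding to nearest would give 10 instead
```

Rounding to the nearest integer happens to agree with the ceiling for 213 results, but it under-counts whenever the last page is less than half full.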
(Note: I still recommend using Selenium, as most of the content on this site is dynamically generated. Scraping all the data this way would be a pain.)