Python Beautiful Soup only scraping the lower part of the page

Question

I'm trying to pull product info from a rather large apparel website page, but my soup seems to contain only the lower half of the HTML document, cut off at an arbitrary point, so the data I'm interested in isn't actually in it. I tried the same code on another website and it worked fine, so I assume the problem is specific to this site.

Here is my code:

from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.pullandbear.com/rs/man/sale-c1030036006.html")
soup = BeautifulSoup(r.content, "html.parser")
print(soup.prettify())

Recommended answer

As one of the comments points out, the HTML you are trying to get is added by JavaScript running in the browser, so it is not part of the response that requests receives.
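To illustrate the point, here is a minimal sketch showing that Beautiful Soup only parses the markup it is handed and never runs scripts. The HTML string, the `products` id and the `product` class are made up for the demonstration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what a server might send: an empty container
# that a script would fill in once it runs in a browser.
RAW_HTML = """
<html><body>
  <div id="products"></div>
  <script>
    var item = document.createElement("div");
    item.className = "product";
    item.textContent = "Jacket";
    document.getElementById("products").appendChild(item);
  </script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# The parser never executes the script, so no product element exists in the tree.
products = soup.find_all("div", class_="product")
container = soup.find("div", id="products")

print("product elements found:", len(products))
print("container text:", repr(container.get_text(strip=True)))
```

The container is present but empty, which is the same symptom as in the question: the elements you see in the browser are simply not in the downloaded document.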

I recommend the Requests-HTML package, created by the author of the very popular requests library.

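from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.pullandbear.com/rs/man/sale-c1030036006.html')
# Run the page's JavaScript in headless Chromium (downloaded on the first call)
r.html.render()

# r.html.html now holds the rendered markup
print(r.html.html)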
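If you want to keep working with Beautiful Soup, the rendered markup can be passed to it as a plain string. The sketch below is one way to arrange that; the helper names `soup_from_html` and `fetch_rendered_html` are hypothetical, and the fetch step assumes network access and a working Requests-HTML installation.

```python
from bs4 import BeautifulSoup


def soup_from_html(html_text):
    """Parse an HTML string (for example the rendered r.html.html) into a soup."""
    return BeautifulSoup(html_text, "html.parser")


def fetch_rendered_html(url):
    """Fetch url and return its HTML after JavaScript has run (needs network access)."""
    # Imported here so the parsing helper above works without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get(url)
    r.html.render()  # executes the page's JavaScript in headless Chromium
    return r.html.html
```

With network access, `soup_from_html(fetch_rendered_html(url))` should give a soup built from the rendered page rather than from the raw response, so the usual `find_all` calls can be used on it.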
