(Python) Trying to parse with beautifulsoup on a webpage that updates after its initial load


Question

For example, if you go here: https://www.basspro.com/shop/en/herters-hunting-rifle-ammo/

you'll notice that on the first load it shows everything as in stock. Then the page updates again and shows which items are out of stock.

Is there any way to use beautifulsoup to account for this? I'm starting to think I'll need a different strategy to pull the updated HTML.

As it stands, my code returns nothing, because there is no "out of stock" text in the HTML that beautifulsoup pulls.

content_wrapper = soup.find('div', class_='col2 gridCell StoreAvail editable anchored', id='StoreAvail_7')
cheese = content_wrapper.find('div', class_='sublist instore_inventory_section nodisplay',
                              id='WC_InStore_Inventory_Section_3074457345618960372')

print(cheese)

Thanks for reading.

Recommended answer

The content you're after on this site is not rendered on the server. It is rendered on the client side by JavaScript, possibly with a library or framework such as React.js or Angular, so it is not present in the HTML that a plain HTTP request returns.

To scrape a page like this, you need a headless browser. Puppeteer is a popular tool for driving headless Chrome/Chromium, and there is a Python port as well (pyppeteer).

Puppeteer spins up a real Chromium instance, so all the JavaScript-driven content on the page gets executed and rendered. Naturally, this takes longer than a plain HTTP request.
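
As a rough sketch of that approach, the snippet below uses pyppeteer to load the page in headless Chromium, then hands the rendered HTML to beautifulsoup. The function names (fetch_rendered_html, find_out_of_stock, main) are made up for illustration, and both the 'networkidle2' wait condition and the plain "out of stock" text search are assumptions about how this page behaves, not something verified against the site.

```python
import asyncio

from bs4 import BeautifulSoup


async def fetch_rendered_html(url: str) -> str:
    """Load url in headless Chromium and return the HTML after scripts have run."""
    # Imported here so the parsing helper below also works without pyppeteer.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Assumption: once network activity has mostly stopped, the
        # client-side stock update has finished.
        await page.goto(url, waitUntil='networkidle2')
        return await page.content()
    finally:
        await browser.close()


def find_out_of_stock(html: str) -> list:
    """Return every text node that mentions 'out of stock' (case-insensitive)."""
    soup = BeautifulSoup(html, 'html.parser')
    matches = soup.find_all(string=lambda s: s and 'out of stock' in s.lower())
    return [m.strip() for m in matches]


def main() -> None:
    # URL taken from the question; the rest is illustrative.
    url = 'https://www.basspro.com/shop/en/herters-hunting-rifle-ammo/'
    html = asyncio.run(fetch_rendered_html(url))
    print(find_out_of_stock(html))
```

Calling main() runs the whole flow. If the page keeps polling in the background, waiting for a specific element with page.waitForSelector may be a more reliable signal than 'networkidle2'. Once the text search finds something, the find calls from the question can be run against the same rendered HTML.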
