Why does BeautifulSoup return an empty list on search results websites?

Views: 302 | Published: 2020/9/20 7:26:32 | Tags: python web-scraping beautifulsoup python-requests

Problem description

I'm trying to get the price of a specific article online, and I cannot seem to get the element under a tag, although I could do it on another (different) page of the same website. On this particular page, I only get an empty list. Printing soup.text also works. I don't want to use Selenium if possible, as I'm looking to understand how BS4 works in this kind of case.

import requests
from bs4 import BeautifulSoup
url = 'https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
cards = soup.select(".product-row-card")
print(cards)
# Output: []

What I would like to get is the name and price of the cards on the website. I have had this problem before, but every solution here only suggests using Selenium (which I could make work) without explaining why. I also find it less practical.

Also, is there a chance, as I have read, that the website is using JavaScript to fetch these results? If that were the case, why can I fetch the data at https://reverb.com/price-guide/effects-and-pedals but not here? Would Selenium be the only solution in that case?

Recommended answer

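You are correct that the site you're targeting relies on JavaScript to render the data you're trying to obtain. The issue is that requests does not evaluate JavaScript: it returns only the HTML the server sends, so elements that scripts add afterwards never reach BeautifulSoup.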
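To make that concrete, here is a minimal sketch that needs no network access. It uses Python's built-in html.parser module (the same backend BeautifulSoup uses when given "html.parser") to count elements with a given class in two made-up HTML strings. Both strings, the ClassCounter class, and the count_class helper are assumptions for illustration; they are not taken from reverb.com.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Parse an HTML string and return how many tags carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: the card is already in the HTML the server sends.
SERVER_RENDERED = '<ul><li class="product-row-card">Pedal A</li></ul>'

# Hypothetical markup: the server sends an empty container, and a script
# would create the card later, but only inside a browser that runs it.
SCRIPT_RENDERED = (
    '<div id="app"></div>'
    "<script>"
    "var li = document.createElement('li');"
    "li.className = 'product-row-card';"
    "document.getElementById('app').appendChild(li);"
    "</script>"
)

print(count_class(SERVER_RENDERED, "product-row-card"))  # 1
print(count_class(SCRIPT_RENDERED, "product-row-card"))  # 0
```

The second string mentions the class only inside a script, which a parser treats as plain text rather than as elements. This is probably also why one page of a site can work with requests while another returns an empty list: the first is likely rendered on the server and the second in the browser.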

You're also correct that Selenium WebDriver is often used in these situations, as it drives a real, full-blown browser instance. But it's not the only option: requests-html has JavaScript support and is perhaps less cumbersome for simple scraping.

As an example to get you started, the following gets the title and price of the first five items on the site you're accessing:

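from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
r = session.get("https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018")
# Run the page's JavaScript in a headless Chromium (downloaded on first use),
# then wait 5 seconds so the listings have time to load.
r.html.render(sleep=5)

# After render(), raw_html holds the rendered markup rather than the original response.
soup = BeautifulSoup(r.html.raw_html, "html.parser")
# limit=5 stops after the first five matching cards.
for item in soup.select(".product-row-card", limit=5):
    title = item.select_one(".product-row-card__title__text").text.strip()
    price = item.select_one(".product-row-card__price__base").text.strip()
    print(f"{title}: {price}")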

Result:

Electro-Harmonix EHX Oceans 11 Eleven Reverb Hall Spring Guitar Effects Pedal: $119.98
Electro-Harmonix Oceans 11 Reverb - Used: $119.99
Electro-Harmonix Oceans 11 Multifunction Digital Reverb Effects Pedal: $122
Pre-Owned Electro-Harmonix Oceans 11 Reverb Multi Effects Pedal Used: $142.27
Electro-Harmonix Oceans 11 Reverb Matte Black: $110
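One caveat about the loop above: select_one returns None when a selector matches nothing, so calling .text on the result raises AttributeError if a card lacks a title or price element. A small guard, sketched here as an assumption rather than as part of the original answer, avoids that. The text_or_default helper and the FakeElement stand-in are hypothetical names used only so the example runs without network access.

```python
def text_or_default(element, default="N/A"):
    """Return the stripped text of a parsed element, or `default` when the
    selector matched nothing and the element is None."""
    if element is None:
        return default
    return element.text.strip()


class FakeElement:
    """Hypothetical stand-in for a bs4 Tag; it only provides a .text attribute."""

    def __init__(self, text):
        self.text = text


print(text_or_default(FakeElement("  $119.98 \n")))  # $119.98
print(text_or_default(None))  # N/A
```

In the scraping loop this would be used as title = text_or_default(item.select_one(".product-row-card__title__text")), and likewise for the price.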
