Why does BeautifulSoup return an empty list on search results websites?
Question
I'm trying to get the price of a specific article online, but I can't seem to get the elements under a tag, even though the same approach works on another page of the same website. On this particular page I only get an empty list, although printing soup.text does return content. I'd rather not use Selenium if possible, because I want to understand how BS4 behaves in cases like this.
import requests
from bs4 import BeautifulSoup
url = 'https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
cards = soup.select(".product-row-card")
print(cards)
>>>[]
What I would like to get is the name and price of each card on the page. I have run into this problem before, but every solution here only suggests using Selenium (which I could get working) without explaining why. I find that approach less practical.
Also, I read that the website may be using JavaScript to fetch these results. If that is the case, why can I fetch the data on https://reverb.com/price-guide/effects-and-pedals but not here? Would Selenium be the only solution in that case?
Recommended answer
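You are correct that the site you're targeting relies on JavaScript to render the data you're trying to obtain. The issue is that requests does not evaluate JavaScript: it only downloads the HTML the server sends, so BeautifulSoup never sees elements that scripts add afterwards. That is likely also why the price-guide page behaves differently: if its data is already present in the server's HTML, requests can see it, whereas this product page apparently fills in its cards with JavaScript.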
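To make the difference concrete, here is a minimal offline sketch. The two HTML strings are made-up stand-ins, assuming one page ships its cards in the server response while the other injects them with a script; the helper name selector_in_static_html is hypothetical. BeautifulSoup treats the contents of a script tag as plain text, so an element the script would create never becomes a tag it can match.

```python
from bs4 import BeautifulSoup

# Hypothetical JS-rendered page: the server sends an empty container plus a
# script that would create the card in a browser. BeautifulSoup never runs it.
JS_RENDERED = """
<html><body>
  <div id="results"></div>
  <script>
    const card = document.createElement("div");
    card.className = "product-row-card";
    document.getElementById("results").appendChild(card);
  </script>
</body></html>
"""

# Hypothetical server-rendered page: the card is already in the HTML.
SERVER_RENDERED = """
<html><body>
  <div id="results">
    <div class="product-row-card">Example pedal</div>
  </div>
</body></html>
"""


def selector_in_static_html(html: str, selector: str) -> bool:
    """Return True if `selector` matches an element in the HTML exactly as
    sent, i.e. without executing any JavaScript."""
    soup = BeautifulSoup(html, "html.parser")
    return bool(soup.select(selector))


print(selector_in_static_html(JS_RENDERED, ".product-row-card"))      # False
print(selector_in_static_html(SERVER_RENDERED, ".product-row-card"))  # True
```

Against a live page you would pass requests.get(url).text instead; what it reports for either Reverb URL depends on how the site serves those pages at the time you run it.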
You're also correct that Selenium WebDriver is often used in these situations, as it drives a real, full-blown browser instance. But it's not the only option: requests-html has JavaScript support and is perhaps less cumbersome for simple scraping.
As an example to get you started, the following gets the title and price of the first five items on the site you're accessing:
from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
r = session.get("https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018")
# Run the page's JavaScript in headless Chromium, then pause 5 s so it can finish
r.html.render(sleep=5)
# Hand the rendered HTML to BeautifulSoup
soup = BeautifulSoup(r.html.raw_html, "html.parser")
for item in soup.select(".product-row-card", limit=5):
    title = item.select_one(".product-row-card__title__text").text.strip()
    price = item.select_one(".product-row-card__price__base").text.strip()
    print(f"{title}: {price}")
Result:
Electro-Harmonix EHX Oceans 11 Eleven Reverb Hall Spring Guitar Effects Pedal: $119.98
Electro-Harmonix Oceans 11 Reverb - Used: $119.99
Electro-Harmonix Oceans 11 Multifunction Digital Reverb Effects Pedal: $122
Pre-Owned Electro-Harmonix Oceans 11 Reverb Multi Effects Pedal Used: $142.27
Electro-Harmonix Oceans 11 Reverb Matte Black: $110
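The loop in the answer calls .text directly on the result of select_one(), which raises AttributeError if any card lacks a title or price element. Below is a minimal, more defensive sketch, assuming the class names from the answer (product-row-card, product-row-card__title__text, product-row-card__price__base) still match the site's markup. The helper extract_cards and the sample HTML are hypothetical, and unlike the original, limit here counts complete cards rather than cards inspected.

```python
from __future__ import annotations

from bs4 import BeautifulSoup


def extract_cards(html: str | bytes, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, price) pairs for up to `limit` product cards,
    skipping any card that lacks a title or price element."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select(".product-row-card"):
        if len(results) >= limit:
            break
        title = card.select_one(".product-row-card__title__text")
        price = card.select_one(".product-row-card__price__base")
        if title is None or price is None:
            continue  # incomplete card: skip it instead of raising AttributeError
        results.append((title.get_text(strip=True), price.get_text(strip=True)))
    return results


# Made-up sample markup (not real Reverb data); the second card has no price.
SAMPLE = """
<div class="product-row-card">
  <span class="product-row-card__title__text"> Pedal A </span>
  <span class="product-row-card__price__base">$100</span>
</div>
<div class="product-row-card">
  <span class="product-row-card__title__text">Pedal B</span>
</div>
<div class="product-row-card">
  <span class="product-row-card__title__text">Pedal C</span>
  <span class="product-row-card__price__base">$120.50</span>
</div>
"""

for title, price in extract_cards(SAMPLE):
    print(f"{title}: {price}")
```

With the answer's session, you would call extract_cards(r.html.raw_html) after r.html.render(...); BeautifulSoup accepts the bytes directly.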