Scrape text from a span tag containing nested span tags in BeautifulSoup
Problem description
I searched a lot on Google but could not find a line of code that solves this problem.
How can I extract the price (35,916.00 in the snippet below) from the following HTML, using Python's BeautifulSoup library?
<span style="text-decoration: inherit; white-space: nowrap;">
<span class="currencyINR">
</span>
<span class="currencyINRFallback" style="display:none">
Rs.
</span>
35,916.00
</span>
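To show why the nested spans matter, here is a minimal offline sketch that parses only the snippet above and makes no request to Amazon. It assumes bs4 is installed. get_text(strip=True) concatenates every text fragment inside the outer span, including the hidden "Rs." fallback. Collecting only the outer span's direct string children yields the bare number instead.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question, parsed offline.
html = """
<span style="text-decoration: inherit; white-space: nowrap;">
<span class="currencyINR">
</span>
<span class="currencyINRFallback" style="display:none">
Rs.
</span>
35,916.00
</span>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span", {"style": "text-decoration: inherit; white-space: nowrap;"})

# All text inside the outer span, nested spans included, with whitespace trimmed.
full_text = outer.get_text(strip=True)

# Only the strings that are direct children of the outer span (nested spans skipped).
own_text = "".join(outer.find_all(string=True, recursive=False)).strip()

print(full_text)  # Rs.35,916.00
print(own_text)   # 35,916.00
```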
The HTML above is part of the following page: https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19
I tried the following code:
import requests
from bs4 import BeautifulSoup
URL = "https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19"
HEADER = {'User-Agent' : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.6"}
page = requests.get(URL, headers=HEADER)
soup = BeautifulSoup(page.content, "html.parser")
price = soup.find("span", {"style" : "text-decoration: inherit; white-space: nowrap;"}).getText()
print(price)
It gives me:
AttributeError: 'NoneType' object has no attribute 'getText'
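soup.find() returns None when no tag matches, and calling getText() on None raises exactly this AttributeError. The defensive sketch below checks for None first. extract_price is a hypothetical helper name. The CSS substring selector (select_one with [style*=...]) is a swapped-in technique, not the answer's method. It tolerates extra or reordered declarations in the style attribute, whereas the dict filter in the code above requires the whole attribute string to match exactly.

```python
from bs4 import BeautifulSoup

def extract_price(html):
    """Hypothetical helper: return the price text, or None if the span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Substring match on the style attribute instead of an exact string comparison.
    span = soup.select_one('span[style*="white-space: nowrap"]')
    if span is None:
        # Nothing matched; calling getText() here would raise AttributeError.
        return None
    # Keep only the outer span's own text, skipping the nested currency spans.
    return "".join(span.find_all(string=True, recursive=False)).strip()

snippet = """<span style="text-decoration: inherit; white-space: nowrap;">
<span class="currencyINR"></span>
<span class="currencyINRFallback" style="display:none">Rs.</span>
35,916.00
</span>"""

print(extract_price(snippet))            # 35,916.00
print(extract_price("<p>no price</p>"))  # None
```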
Recommended answer
For the URL given in your question, here's how you would get the price:
import requests
from bs4 import BeautifulSoup
URL = "https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19/"
HEADER = {
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.6",
}
page = requests.get(URL, headers=HEADER)
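# html5lib parses markup the way a web browser does
soup = BeautifulSoup(page.content, "html5lib")
# find_all returns a list of every matching span (empty if none match), never None
price_spans = soup.find_all("span", {"style": "text-decoration: inherit; white-space: nowrap;"})
# strip=True trims each text fragment, so the nested "Rs." and the amount join directly
print([p.getText(strip=True) for p in price_spans])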
Output: ['Rs.35,916.00','Rs.35,916.00','Rs.45,000.00']
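If you need the amounts as numbers rather than "Rs.…" strings, you can post-process the list above. This is a sketch. parse_inr is a hypothetical helper, and it assumes every value has the form "Rs." followed by comma-grouped digits. Decimal is used instead of float so that values like 35,916.00 are not subject to binary rounding.

```python
from decimal import Decimal

def parse_inr(text):
    """Hypothetical helper: turn a string like 'Rs.35,916.00' into a Decimal."""
    # Drop the 'Rs.' prefix and the thousands separators before converting.
    digits = text.replace("Rs.", "").replace(",", "").strip()
    return Decimal(digits)

scraped = ['Rs.35,916.00', 'Rs.35,916.00', 'Rs.45,000.00']  # output shown in the answer
prices = [parse_inr(p) for p in scraped]
print(prices)       # [Decimal('35916.00'), Decimal('35916.00'), Decimal('45000.00')]
print(min(prices))  # 35916.00
```

Removing every comma also handles Indian-style grouping such as "1,00,000.00".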
Note: I've changed the HTML parser to html5lib, so you might have to run pip install html5lib first.
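If a script should still run when html5lib is not installed, it can fall back to Python's built-in parser. This is a sketch, and make_soup is a hypothetical name. The two parsers can build different trees from malformed HTML, so results on the live Amazon page may differ between them.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup):
    """Hypothetical helper: prefer html5lib, fall back to the built-in parser."""
    try:
        # BeautifulSoup raises FeatureNotFound if html5lib is not installed.
        return BeautifulSoup(markup, "html5lib")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")

soup = make_soup('<span style="white-space: nowrap;">Rs.<span></span>45,000.00</span>')
print(soup.span.get_text(strip=True))  # Rs.45,000.00
```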