Scrape text from a span tag containing a nested span tag in BeautifulSoup

Problem description

I searched Google extensively but could not find a working line of code for this problem.

How can I extract 55,000.00 from the HTML below using Python's BeautifulSoup library?

<span style="text-decoration: inherit; white-space: nowrap;">
<span class="currencyINR">
&nbsp;&nbsp;
</span>
<span class="currencyINRFallback" style="display:none">
Rs. 
</span>
35,916.00
</span>

The HTML above is part of the following page: https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19

I tried the following code:

import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19"

HEADER = {'User-Agent' : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.6"}

page = requests.get(URL, headers=HEADER)
soup = BeautifulSoup(page.content, "html.parser")
price = soup.find("span", {"style" : "text-decoration: inherit; white-space: nowrap;"}).getText()
print(price)

It gives me:

AttributeError: 'NoneType' object has no attribute 'getText'

Recommended answer

For the URL given in your question, here is how you would get the price:

import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.in/gp/offer-listing/B01671J2I6/ref=dp_olp_afts?ie=UTF8&condition=all&qid=1602348797&sr=1-19/"

HEADER = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.6",
}

page = requests.get(URL, headers=HEADER)
soup = BeautifulSoup(page.content, "html5lib")
price_spans = soup.find_all("span", {"style": "text-decoration: inherit; white-space: nowrap;"})
print([p.getText(strip=True) for p in price_spans])

Output: ['Rs.35,916.00', 'Rs.35,916.00', 'Rs.45,000.00']
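
The "Rs." prefix in that output comes from the nested currencyINRFallback span, because getText() collects text from every descendant. If only the number is wanted, one option is to keep just the text nodes that are direct children of the outer span. The following is a minimal sketch run against the HTML snippet from the question rather than the live page; SAMPLE_HTML and direct_text are names made up for illustration, and it uses the built-in html.parser so nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup, NavigableString

# Copy of the snippet from the question (an assumption: the live page may differ).
SAMPLE_HTML = """
<span style="text-decoration: inherit; white-space: nowrap;">
<span class="currencyINR">
&nbsp;&nbsp;
</span>
<span class="currencyINRFallback" style="display:none">
Rs.
</span>
35,916.00
</span>
"""


def direct_text(tag):
    """Join only the text nodes that are direct children of `tag`.

    Text inside nested tags (here the two currency spans) is skipped.
    """
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
outer = soup.find("span", {"style": "text-decoration: inherit; white-space: nowrap;"})
price = direct_text(outer)
print(price)
```

The same helper could be applied to each element of price_spans in the answer's code if the currency prefix is not needed.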

Note: I've changed the HTML parser to html5lib, so you may need to install it first with pip install html5lib
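
The AttributeError in the question means that soup.find() returned None, that is, no span with that exact style attribute was present in the parsed document. Because a live page can return different markup from one request to the next, it may help to check for None before calling getText(). A small sketch under that assumption, using inline HTML strings instead of a network request; first_price_text, PRICE_STYLE, with_price and without_price are hypothetical names chosen for this example.

```python
from bs4 import BeautifulSoup

PRICE_STYLE = "text-decoration: inherit; white-space: nowrap;"


def first_price_text(html):
    """Return the text of the first span with PRICE_STYLE, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"style": PRICE_STYLE})
    if span is None:  # find() returns None when nothing matches
        return None
    return span.getText(strip=True)


# Two illustrative documents: one with the price span, one without it.
with_price = '<span style="text-decoration: inherit; white-space: nowrap;">35,916.00</span>'
without_price = "<html><body><p>No offers here</p></body></html>"

found = first_price_text(with_price)
missing = first_price_text(without_price)
print(found)
print(missing)
```

Returning None lets the caller decide whether to retry, log the response, or skip the item, instead of failing with an AttributeError.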
