beautifulsoup parsing - dealing with superscript?

Question

This is the HTML fragment I am trying to extract information from:

<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>

On the web page it renders like this (the 5 is a superscript):

Market Cap (intraday)5:33.57B

What I have (not working):

    HTML_MarketCap = soup.find('sup', text='5').find_next_sibling('span').text

How can I extract the 33.57B string?

Recommended answer

The span is not a sibling of the sup. It is a child of the sibling of the sup's grandparent: a first cousin, once removed (thanks, 1.618).

from bs4 import BeautifulSoup as bs

# Name the parser explicitly so the fragment is parsed the same way everywhere.
soup = bs("""<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)
<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1">
<span id="yfs_j10_aal">33.57B</span></td></tr>""", "html.parser")

# sup -> font -> td (label cell) -> next td (value cell) -> span
soup.find("sup", text="5").parent.parent.find_next_sibling("td").find("span").text
# '33.57B'
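To see why the attempt from the question fails, it can help to inspect the relationships step by step. The following is a minimal sketch, assuming the same fragment and Python's built-in `html.parser`. It uses `find_parent("td")` as an alternative to chaining `.parent.parent`, and `string=`, which is the newer name for the `text=` argument (Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

html = (
    '<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
    '<td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td>'
)
soup = BeautifulSoup(html, "html.parser")

sup = soup.find("sup", string="5")

# <sup> is the only child of <font>, so it has no siblings at all.
# find_next_sibling() therefore returns None, and calling .text on it fails.
no_sibling = sup.find_next_sibling("span")

# Climb to the enclosing label cell, then step across to the value cell.
label_cell = sup.find_parent("td")
value_cell = label_cell.find_next_sibling("td")
market_cap = value_cell.find("span").get_text(strip=True)

print(no_sibling, market_cap)
```

Using `find_parent("td")` avoids depending on exactly how many wrapper tags sit between the `sup` and its cell.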

Since you seem to have problems with it, here's my full test script (using python-requests) that reliably works for me:

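import requests
from bs4 import BeautifulSoup as bs

# Key statistics page for AAL, as given in the original answer
url = "https://finance.yahoo.com/q/ks?s=AAL+Key+Statistics"

r = requests.get(url)

# Name the parser explicitly instead of relying on the default
soup = bs(r.text, "html.parser")

# sup -> font -> td (label cell) -> next td (value cell) -> span
HTML_MarketCap = soup.find("sup", text="5").parent.parent.find_next_sibling("td").find("span").text

print(HTML_MarketCap)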
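The one-liner above raises an `AttributeError` as soon as any step in the chain finds nothing. A more defensive variant could match on the label text instead of the superscript and return `None` when the row is missing. This is only a sketch: `find_stat` is a hypothetical helper, the class name `yfnc_tablehead1` is taken from the fragment in the question, and the sample markup wraps that fragment in a table so that no network access is needed.

```python
from bs4 import BeautifulSoup


def find_stat(soup, label):
    """Return the text of the value cell next to the given label, or None."""
    # Label cells are assumed to carry the class seen in the question's fragment.
    for cell in soup.find_all("td", class_="yfnc_tablehead1"):
        if cell.get_text().startswith(label):
            value_cell = cell.find_next_sibling("td")
            if value_cell is not None:
                return value_cell.get_text(strip=True)
    return None


sample = (
    "<table><tr>"
    '<td class="yfnc_tablehead1">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
    '<td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td>'
    "</tr></table>"
)
soup = BeautifulSoup(sample, "html.parser")

found = find_stat(soup, "Market Cap")
missing = find_stat(soup, "No Such Label")
print(found, missing)
```

With a live page, the same helper could be called on `bs(r.text, "html.parser")`, checking the result for `None` before using it.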
