beautifulsoup parsing - dealing with superscript?
Question
This is the HTML segment I am trying to extract information from:
<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>
On the web page it renders as:
Market Cap (intraday)5:33.57B
What I have (not working):
HTML_MarketCap = soup.find('sup', text='5').find_next_sibling('span').text
How could I extract the 33.57B string?
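A short sketch of why the attempt above cannot work, using the fragment from the question and assuming bs4 with the built-in "html.parser". The `<sup>` is the only child of `<font>`, so `find_next_sibling` finds nothing and returns `None`. Calling `.text` on that `None` then raises an `AttributeError`. The question does not quote the actual error, so this is the most likely failure rather than a confirmed one.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question (the stray </tr> is ignored by html.parser)
html = """<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>"""
soup = BeautifulSoup(html, "html.parser")

sup = soup.find("sup", string="5")
# <sup> is the only child of <font>, so it has no siblings of any kind
print(sup.find_next_sibling())        # None
print(sup.find_next_sibling("span"))  # None

# Chaining .text onto that None is what breaks the original one-liner
try:
    sup.find_next_sibling("span").text
except AttributeError as exc:
    print("AttributeError:", exc)
```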
Answer
The span is not a sibling of the `sup`; it is a child of the sibling of the `sup`'s grandparent, i.e. a first cousin, once removed (thanks, 1.618).
from bs4 import BeautifulSoup as bs
soup = bs("""<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)
<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1">
<span id="yfs_j10_aal">33.57B</span></td></tr>""", "html.parser")
# sup -> font -> td (label cell); the next <td> holds the <span> with the value
soup.find("sup", string="5").parent.parent.find_next_sibling("td").find("span").text
# '33.57B'
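A variant of the same navigation, sketched under the assumption that the fragment sits inside a normal `<table><tr>` (added here only to make it well-formed): `find_parent("td")` climbs to the enclosing cell however deeply the `<sup>` is nested, so the lookup does not depend on counting exactly two `.parent` hops.

```python
from bs4 import BeautifulSoup

# The question's fragment, wrapped in <table><tr> (an assumption for well-formedness)
html = """<table><tr><td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr></table>"""
soup = BeautifulSoup(html, "html.parser")

sup = soup.find("sup", string="5")
# Walk up to the nearest enclosing <td> instead of chaining .parent.parent
label_cell = sup.find_parent("td")
# The value lives in the next <td> of the same row
value_cell = label_cell.find_next_sibling("td")
market_cap = value_cell.get_text(strip=True)
print(market_cap)  # 33.57B
```

If Yahoo ever wrapped the footnote in one more element, this version would still find the right cell, while the `.parent.parent` chain would not.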
Since you seem to be having trouble with it, here is my full test script (using python-requests), which works reliably for me:
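import requests
from bs4 import BeautifulSoup as bs

# URL from the original answer; Yahoo may have changed or retired this page since
url = "https://finance.yahoo.com/q/ks?s=AAL+Key+Statistics"
r = requests.get(url)
r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
soup = bs(r.text, "html.parser")
HTML_MarketCap = soup.find("sup", string="5").parent.parent.find_next_sibling("td").find("span").text
print(HTML_MarketCap)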
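Both snippets above key on the footnote marker `5`, which only identifies this row by coincidence. A hedged alternative is to match the label text itself. `get_stat` below is a hypothetical helper, not part of BeautifulSoup; it assumes each statistic is a `yfnc_tablehead1` label cell immediately followed by its value cell, as in the fragment from the question.

```python
from bs4 import BeautifulSoup


def get_stat(soup, label):
    """Return the value text for the row whose label cell starts with `label`.

    Hypothetical helper: assumes label/value pairs are adjacent <td> cells
    and that label cells carry the class "yfnc_tablehead1".
    """
    for cell in soup.find_all("td", class_="yfnc_tablehead1"):
        # get_text() includes the superscript, e.g. "Market Cap (intraday)5:"
        if cell.get_text().strip().startswith(label):
            value_cell = cell.find_next_sibling("td")
            if value_cell is not None:
                return value_cell.get_text(strip=True)
    return None  # no row with that label


html = """<table><tr><td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr></table>"""
soup = BeautifulSoup(html, "html.parser")
print(get_stat(soup, "Market Cap"))  # 33.57B
```

Matching on a label prefix keeps working if the footnote numbering changes, at the cost of breaking if the label wording changes.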