Why is this tag empty when parsed with Beautiful Soup?
Problem description
I am parsing this page with Beautiful Soup:
https://au.finance.yahoo.com/q/is?s=AAPL
I am trying to get the total revenue for 27/09/2014 (42,123,000), which is one of the first values near the top of the statement.
I inspected the element in Chrome's developer tools and found that the value is in a table with the class name yfnc_tabledata1.
My Python code is as follows:
import requests
import bs4
# fetch the web page
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
# parse it with Beautiful Soup
soup = bs4.BeautifulSoup(page.content)
# select the matching table(s); select() returns a list
tag = soup.select("table.yfnc_tabledata1")
So far so good: this grabs the table that has the data I need, but this is where I am stuck.
The chain that leads to the data I want is as follows:
tag > tbody > tr > td > table > tbody > (then the second tr)
But when I try to use this chain, I get an empty element.
Can anyone help me?
Also, for bonus points, can anyone tell me how I can learn to extract data like this in a more general sense? I constantly need to extract data buried deep within an HTML document and can never seem to work out the correct code to get to the data I want.
Thanks a lot, any help is appreciated.
Recommended answer
Let's be specific and practical.
The idea is to find the Total Revenue label and get the next cell's text using .next_sibling:
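import re
# locate the statement table, then the "Total Revenue" label text inside it
table = soup.find("table", class_="yfnc_tabledata1")
total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
# go two levels up from the label text, then read the next sibling's text
print(total_revenue_label.parent.parent.next_sibling.get_text(strip=True))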
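To try the same label-then-sibling idea without depending on the live page, here is a minimal sketch that runs against a small inline table. The HTML string is a hypothetical, simplified stand-in for the real statement markup, which may well differ. It also swaps in find_parent("td") and find_next_sibling("td") in place of .parent.parent.next_sibling, on the assumption that skipping whitespace nodes and not counting nesting levels by hand is worth the extra call.

```python
import re

import bs4

# Hypothetical, simplified stand-in for the statement table (assumption:
# the real page nests an inner table inside a td, as the question describes).
HTML = (
    '<table class="yfnc_tabledata1"><tr><td><table>'
    '<tr><td>Period Ending</td><td>27/09/2014</td></tr>'
    '<tr><td><strong>Total Revenue</strong></td>'
    '<td><strong>42,123,000</strong></td></tr>'
    '</table></td></tr></table>'
)

soup = bs4.BeautifulSoup(HTML, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")

# string= is the newer name for the text= argument used above.
label = table.find(string=re.compile(r"Total Revenue"))

# Climb to the cell that holds the label, then step to the next cell.
cell = label.find_parent("td")
value = cell.find_next_sibling("td").get_text(strip=True)
print(value)
```

Searching for the visible label and walking to its neighbour tends to survive layout changes better than a long chain of tag names, because it only relies on the label and its value sitting in adjacent cells.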
Demo:
>>> import re
>>> import requests
>>> import bs4
>>>
>>> page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
>>> soup = bs4.BeautifulSoup(page.content)
>>>
>>> table = soup.find("table", class_="yfnc_tabledata1")
>>> total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
>>> total_revenue_label.parent.parent.next_sibling.get_text(strip=True)
42,123,000
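As for why the tbody chain from the question comes back empty, one likely explanation (an assumption, since the raw page source is not shown here) is that the tbody elements are not in the HTML the server sends. Browsers insert tbody into the DOM when a table has rows directly under it, so developer tools show it, while html.parser and lxml keep the markup as written. The sketch below uses a hypothetical one-row table to show the effect.

```python
import bs4

# Hypothetical markup with no <tbody>, as a server might send it.
HTML = (
    '<table class="yfnc_tabledata1">'
    '<tr><td>Total Revenue</td><td>42,123,000</td></tr>'
    '</table>'
)

soup = bs4.BeautifulSoup(HTML, "html.parser")

# A selector copied from the browser's element tree expects tbody.
with_tbody = soup.select("table.yfnc_tabledata1 > tbody > tr")

# Dropping tbody (and the strict child combinator) matches the raw markup.
without_tbody = soup.select("table.yfnc_tabledata1 tr")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 1
```

More generally, it helps to inspect the raw response (for example by printing soup.prettify()) instead of trusting the browser's element tree when writing selectors.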