MorningStar KeyStat to pandas Dataframe
Problem description
I am trying to read the key stats from MorningStar. The data is HTML wrapped in JSON. So far I can make a request that fetches the JSON and then parse the embedded HTML with BeautifulSoup:
import requests
from bs4 import BeautifulSoup

url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
ksContent = BeautifulSoup(lm_json["ksContent"], "html.parser")
What is a bit weird to me is that 'ksContent' is HTML that holds the actual data as a table. I am not a fan of HTML and am wondering how I can turn all of it into a nice pandas dataframe. As the table is long, here is part of it:
<table cellpadding="0" cellspacing="0" class="r_table1 text2">
<colgroup>
<col width="23%"/>
<col span="11" width="7%"/>
</colgroup>
<thead>
<tr>
<th align="left" scope="row"></th>
<th align="right" id="Y0" scope="col">2008-12</th>
<th align="right" id="Y1" scope="col">2009-12</th>
<th align="right" id="Y2" scope="col">2010-12</th>
<th align="right" id="Y3" scope="col">2011-12</th>
<th align="right" id="Y4" scope="col">2012-12</th>
<th align="right" id="Y5" scope="col">2013-12</th>
<th align="right" id="Y6" scope="col">2014-12</th>
<th align="right" id="Y7" scope="col">2015-12</th>
<th align="right" id="Y8" scope="col">2016-12</th>
<th align="right" id="Y9" scope="col">2017-12</th>
<th align="right" id="Y10" scope="col">TTM</th>
</tr>
</thead>
<tbody>
<tr class="hr">
<td colspan="12"></td>
</tr>
<tr>
<th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
<td align="right" headers="Y0 i0">—</td>
<td align="right" headers="Y1 i0">40</td>
<td align="right" headers="Y2 i0">212</td>
<td align="right" headers="Y3 i0">349</td>
<td align="right" headers="Y4 i0">442</td>
<td align="right" headers="Y5 i0">759</td>
<td align="right" headers="Y6 i0">1,379</td>
<td align="right" headers="Y7 i0">1,074</td>
<td align="right" headers="Y8 i0">1,125</td>
<td align="right" headers="Y9 i0">1,662</td>
<td align="right" headers="Y10 i0">1,760</td>
</tr> ...
The header tr defines the ids Y0, Y1 ... Y10 for the actual dates, and the cells of each following tr refer to them through their headers attribute.
Your help is appreciated!
You can use pandas' read_html() to convert it into a list of dataframes:
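from io import StringIO
import requests
import pandas as pd
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
# read_html returns one DataFrame per <table>; wrapping the string in StringIO
# avoids the literal-HTML deprecation warning in pandas 2.1 and later
df_list = pd.read_html(StringIO(lm_json["ksContent"]))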
You can iterate through the list to get the dataframes one by one. You can also call dropna(how="all") on each of them to get rid of the rows that contain only NaN.
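As a rough offline illustration of those two steps, here is a minimal sketch that runs read_html() on a shortened copy of the table from the question (three year columns only) instead of calling the MorningStar endpoint. The helper names `tidy_table` and `to_number` and the `SAMPLE_HTML` constant are made up for this example, and it assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed. The real response may contain several tables and non-numeric rows, so treat the numeric conversion as one possible clean-up, not the only one.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the table from the question (hypothetical test data).
# "&#8212;" is the dash placeholder MorningStar uses for missing values.
SAMPLE_HTML = """
<table class="r_table1 text2">
<thead>
<tr>
<th scope="row"></th>
<th id="Y0" scope="col">2008-12</th>
<th id="Y1" scope="col">2009-12</th>
<th id="Y6" scope="col">2014-12</th>
</tr>
</thead>
<tbody>
<tr class="hr"><td colspan="4"></td></tr>
<tr>
<th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
<td headers="Y0 i0">&#8212;</td>
<td headers="Y1 i0">40</td>
<td headers="Y6 i0">1,379</td>
</tr>
</tbody>
</table>
"""


def to_number(col: pd.Series) -> pd.Series:
    """Strip thousands separators and turn non-numeric cells into NaN."""
    text = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(text, errors="coerce")


def tidy_table(html: str) -> pd.DataFrame:
    """Parse the first <table> in html into a numeric, label-indexed frame."""
    df = pd.read_html(StringIO(html))[0]
    df = df.dropna(how="all")          # drop the empty separator rows
    df = df.set_index(df.columns[0])   # first column holds the row labels
    df.index.name = None
    return df.apply(to_number)


if __name__ == "__main__":
    print(tidy_table(SAMPLE_HTML))
```

With the live data, the same helper could be applied to each table in turn, for example by passing str(table) for every table found by BeautifulSoup, or by adapting it to loop over the list that read_html() returns.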