MorningStar KeyStat to pandas DataFrame

Problem description

I am trying to read the Key Stats from MorningStar, and I know the data is HTML wrapped in JSON. So far I can make a request that fetches the JSON and then parse the HTML with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
ksContent = BeautifulSoup(lm_json["ksContent"], "html.parser")

What seems a bit weird to me is that the HTML in 'ksContent' contains the actual data as a table. I am not a fan of HTML and am wondering how I can turn all of it into a nice pandas DataFrame. As the table is long, here is part of it:

     <table cellpadding="0" cellspacing="0" class="r_table1 text2">
     <colgroup>
        <col width="23%"/>
        <col span="11" width="7%"/>
     </colgroup>
     <thead>
        <tr>
           <th align="left" scope="row"></th>
           <th align="right" id="Y0" scope="col">2008-12</th>
           <th align="right" id="Y1" scope="col">2009-12</th>
           <th align="right" id="Y2" scope="col">2010-12</th>
           <th align="right" id="Y3" scope="col">2011-12</th>
           <th align="right" id="Y4" scope="col">2012-12</th>
           <th align="right" id="Y5" scope="col">2013-12</th>
           <th align="right" id="Y6" scope="col">2014-12</th>
           <th align="right" id="Y7" scope="col">2015-12</th>
           <th align="right" id="Y8" scope="col">2016-12</th>
           <th align="right" id="Y9" scope="col">2017-12</th>
           <th align="right" id="Y10" scope="col">TTM</th>
        </tr>
     </thead>
     <tbody>
        <tr class="hr">
           <td colspan="12"></td>
        </tr>
        <tr>
           <th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
           <td align="right" headers="Y0 i0">—</td>
           <td align="right" headers="Y1 i0">40</td>
           <td align="right" headers="Y2 i0">212</td>
           <td align="right" headers="Y3 i0">349</td>
           <td align="right" headers="Y4 i0">442</td>
           <td align="right" headers="Y5 i0">759</td>
           <td align="right" headers="Y6 i0">1,379</td>
           <td align="right" headers="Y7 i0">1,074</td>
           <td align="right" headers="Y8 i0">1,125</td>
           <td align="right" headers="Y9 i0">1,662</td>
           <td align="right" headers="Y10 i0">1,760</td>
        </tr> ...

The header tr defines the ids Y0, Y1, ..., Y10 for the actual dates, and the cells in the following tr elements refer to them through their headers attribute.

Your help is appreciated!

Solution

You can use pandas.read_html() to convert it into a list of DataFrames:

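from io import StringIO

import pandas as pd
import requests

url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
# "ksContent" is an HTML string; StringIO avoids the literal-string deprecation warning in recent pandas.
df_list = pd.read_html(StringIO(lm_json["ksContent"]))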

You can iterate through the list and get the DataFrames one by one. You can also use dropna(how="all") to get rid of the rows that contain only NaN.
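As a rough sketch of that cleanup step, the snippet below tidies a single frame. The `raw` frame is a hand-built stand-in for one element of df_list (an assumption for illustration, using a few values from the table excerpt in the question), and `tidy` is a hypothetical helper name, not part of pandas. The real frames may have a different column layout, so treat this as a starting point.

```python
import pandas as pd

# Hand-built stand-in for one element of df_list (assumed shape, not real output).
# The first row mimics the empty separator row (<tr class="hr">) in the table.
raw = pd.DataFrame(
    {
        "Unnamed: 0": [None, "Revenue CAD Mil"],
        "2008-12": [None, "\u2014"],  # the page uses a dash for missing values
        "2009-12": [None, "40"],
        "2014-12": [None, "1,379"],
        "TTM": [None, "1,760"],
    }
)


def tidy(df):
    """Drop NaN-only rows, index by the label column, and parse the numbers."""
    df = df.dropna(how="all")
    df = df.set_index(df.columns[0])
    df.index.name = None
    # Remove thousands separators; anything non-numeric is coerced to NaN.
    return df.apply(
        lambda col: pd.to_numeric(
            col.astype(str).str.replace(",", "", regex=False), errors="coerce"
        )
    )


clean = tidy(raw)
print(clean)
```

With the real data, the same helper could be applied to every frame, for example `frames = [tidy(df) for df in df_list]`.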

