json到DataFrame的转换问题 [英] json to DataFrame conversion issues

查看:93
本文介绍了json到DataFrame的转换问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

{'endDate': '2017-12-31',
 'results': [{'data': [{'period': '2017-01-01', 'ratio': 26.91301},
    {'period': '2017-02-01', 'ratio': 19.77063},
    {'period': '2017-03-01', 'ratio': 20.40775},
    {'period': '2017-04-01', 'ratio': 16.02843},
    {'period': '2017-05-01', 'ratio': 10.38159},
    {'period': '2017-06-01', 'ratio': 8.2087},
    {'period': '2017-07-01', 'ratio': 8.67815},
    {'period': '2017-08-01', 'ratio': 19.58956},
    {'period': '2017-09-01', 'ratio': 36.94587},
    {'period': '2017-10-01', 'ratio': 36.28194},
    {'period': '2017-11-01', 'ratio': 16.64543},
    {'period': '2017-12-01', 'ratio': 1.67661}],
   'keywords': ['Data_spec'],
   'title': 'Data_spec'},
  {'data': [{'period': '2017-01-01', 'ratio': 17.65139},
    {'period': '2017-02-01', 'ratio': 14.52618},
    {'period': '2017-03-01', 'ratio': 15.00234},
    {'period': '2017-04-01', 'ratio': 12.1521},
    {'period': '2017-05-01', 'ratio': 10.63644},
    {'period': '2017-06-01', 'ratio': 8.59767},
    {'period': '2017-07-01', 'ratio': 8.95312},
    {'period': '2017-08-01', 'ratio': 13.05747},
    {'period': '2017-09-01', 'ratio': 48.00482},
    {'period': '2017-10-01', 'ratio': 23.7811},
    {'period': '2017-11-01', 'ratio': 16.90027},
    {'period': '2017-12-01', 'ratio': 0.89866}],
   'keywords': ['Data_rate'],
   'title': 'Date_rate'},
  {'data': [], 'keywords': ['Data_over'], 'title': 'Data_over'},
  {'data': [{'period': '2017-01-01', 'ratio': 79.17644},
    {'period': '2017-02-01', 'ratio': 84.01851},
    {'period': '2017-03-01', 'ratio': 100.0},
    {'period': '2017-04-01', 'ratio': 91.19442},
    {'period': '2017-05-01', 'ratio': 93.21976},
    {'period': '2017-06-01', 'ratio': 93.42096},
    {'period': '2017-07-01', 'ratio': 89.14895},
    {'period': '2017-08-01', 'ratio': 91.85165},
    {'period': '2017-09-01', 'ratio': 91.24136},
    {'period': '2017-10-01', 'ratio': 90.35611},
    {'period': '2017-11-01', 'ratio': 81.88585},
    {'period': '2017-12-01', 'ratio': 7.49111}],
   'keywords': ['Data_under'],
   'title': 'Data_under'},
  {'data': [{'period': '2017-01-01', 'ratio': 0.70417},
    {'period': '2017-02-01', 'ratio': 1.11997},
    {'period': '2017-03-01', 'ratio': 1.81074},
    {'period': '2017-04-01', 'ratio': 1.38823},
    {'period': '2017-05-01', 'ratio': 0.97914},
    {'period': '2017-06-01', 'ratio': 1.14009},
    {'period': '2017-07-01', 'ratio': 0.78465},
    {'period': '2017-08-01', 'ratio': 1.07973},
    {'period': '2017-09-01', 'ratio': 0.94561},
    {'period': '2017-10-01', 'ratio': 0.85172},
    {'period': '2017-11-01', 'ratio': 1.27422},
    {'period': '2017-12-01', 'ratio': 0.08718}],
   'keywords': ['Data_tune'],
   'title': 'Data_tune'}],
 'startDate': '2017-01-01',
 'timeUnit': 'month'}

上面是'my_dict = json.loads(js)',其中js是字符串.我正在尝试将这些数据放入Pandas DataFrame.我在下面使用了代码.

Above is 'my_dict = json.loads(js)' where js is string. I am trying to put these data to Pandas DataFrame. I used code below.

lst = [pd.DataFrame.from_dict(r['data']).set_index('period').rename(columns={'ratio' : r['title']})
           for r in d['results']]
df = pd.concat(lst, 1)

在此js集具有空值之前,我的代码运行良好. 一个问题是您已经注意到'keyworks':['Data_over']具有空的'data'.所以我不能将index设置为'period'.我仍然希望在熊猫DF中使用"Data_over",但为空.是否可以将DataFrame的列名设置为"Data_over",但将其设置为空?因此我的最终代码可以将JSON转换为带有或不带有数据"的df.

My code worked perfectly until this js set have empty value. One problem is that you have noticed the 'keyworks' : ['Data_over'] has empty 'data'. So I cannot set index as 'period'. I still want the 'Data_over' in my pandas DF but empty. Is it possible to set the DataFrame with 'Data_over' as column name but empty on the value? so my final code can convert json to df with or with out 'Data'.

下面是我想要的输出. (实际值有所不同,但您有这个概念)

Below is the output that I want. (actual values are different, but you got the concept)

    Data_spec    Data_rate   Data_over   Data_under   Data_tune
2017-01-01  0.55116     NaN     NaN         7.12056     2.25329
2017-02-01  0.32016     0.08915     NaN         6.43161     1.19959
2017-03-01  0.32421     0.10131     NaN         6.48024     1.30091
2017-04-01  0.33232     0.01215     NaN         6.05471     1.26038
2017-05-01  0.39311     0.12968     NaN         6.19655     1.21985
2017-06-01  0.47011     0.03647     NaN         5.71023     1.03748
2017-07-01  4.32016     NaN     NaN         11.85005    0.84295
2017-08-01  8.81053     0.04052     NaN         51.44072    0.89564
2017-09-01  14.46808    0.02836     NaN         100.00000   0.85511
2017-10-01  4.27152     0.10942     NaN         34.65451    0.87132
2017-11-01  0.29989     0.05673     NaN         13.02127    0.77811
2017-12-01  0.00810     0.06079     NaN         0.80243     NaN

推荐答案

可以通过len(r['data']) > 0检查length,并在list comprehension中过滤出空的DataFrame:

There is possible check length by len(r['data']) > 0 and filter out empty DataFrames in list comprehension:

lst = [pd.DataFrame(r['data']).set_index('period').rename(columns={'ratio' : r['title']})
           for r in d['results'] if len(r['data']) > 0]
df = pd.concat(lst, 1)
print (df)

            Data_spec  Date_rate  Data_under  Data_tune
period                                                 
2017-01-01   26.91301   17.65139    79.17644    0.70417
2017-02-01   19.77063   14.52618    84.01851    1.11997
2017-03-01   20.40775   15.00234   100.00000    1.81074
2017-04-01   16.02843   12.15210    91.19442    1.38823
2017-05-01   10.38159   10.63644    93.21976    0.97914
2017-06-01    8.20870    8.59767    93.42096    1.14009
2017-07-01    8.67815    8.95312    89.14895    0.78465
2017-08-01   19.58956   13.05747    91.85165    1.07973
2017-09-01   36.94587   48.00482    91.24136    0.94561
2017-10-01   36.28194   23.78110    90.35611    0.85172
2017-11-01   16.64543   16.90027    81.88585    1.27422
2017-12-01    1.67661    0.89866     7.49111    0.08718

如果r['data']为空,则可以创建自定义DataFrame,因为使用了对齐索引d['startDate']:

There si possible create custom DataFrame if r['data'] are empty, for align index is used d['startDate']:

lst = [pd.DataFrame(r['data']).set_index('period').rename(columns={'ratio' : r['title']}) 
        if len(r['data']) > 0 
        else pd.DataFrame([np.nan], columns=[r['title']], index=[d['startDate']])
        for r in d['results'] ]
df = pd.concat(lst, 1)
print (df)
            Data_spec  Date_rate  Data_over  Data_under  Data_tune
2017-01-01   26.91301   17.65139        NaN    79.17644    0.70417
2017-02-01   19.77063   14.52618        NaN    84.01851    1.11997
2017-03-01   20.40775   15.00234        NaN   100.00000    1.81074
2017-04-01   16.02843   12.15210        NaN    91.19442    1.38823
2017-05-01   10.38159   10.63644        NaN    93.21976    0.97914
2017-06-01    8.20870    8.59767        NaN    93.42096    1.14009
2017-07-01    8.67815    8.95312        NaN    89.14895    0.78465
2017-08-01   19.58956   13.05747        NaN    91.85165    1.07973
2017-09-01   36.94587   48.00482        NaN    91.24136    0.94561
2017-10-01   36.28194   23.78110        NaN    90.35611    0.85172
2017-11-01   16.64543   16.90027        NaN    81.88585    1.27422
2017-12-01    1.67661    0.89866        NaN     7.49111    0.08718

这篇关于json到DataFrame的转换问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆