How to normalize JSON correctly with Python Pandas

Question

I am a beginner in Python. I want to load a JSON file of forex historical price data with Pandas and compute statistics on it. I have gone through many topics on Pandas and on parsing JSON files. I want to load a JSON file with extra values and nested structures into a Pandas DataFrame, and this is where I am stuck.

I have a JSON file, 'EUR_JPY_H8.json'.

First I import the required libraries:

import pandas as pd
import json
from pandas.io.json import json_normalize

Then I load the JSON file:

with open('EUR_JPY_H8.json') as data_file:
    data = json.load(data_file)

I get the following list:

[{u'complete': True,
u'mid': {u'c': u'119.743',
  u'h': u'119.891',
  u'l': u'119.249',
  u'o': u'119.341'},
u'time': u'1488319200.000000000',
u'volume': 14651},
{u'complete': True,
u'mid': {u'c': u'119.893',
  u'h': u'119.954',
  u'l': u'119.552',
  u'o': u'119.738'},
u'time': u'1488348000.000000000',
u'volume': 10738},
{u'complete': True,
u'mid': {u'c': u'119.946',
  u'h': u'120.221',
  u'l': u'119.840',
  u'o': u'119.888'},
u'time': u'1488376800.000000000',
u'volume': 10041}]

Then I pass the list to json_normalize, trying to get the prices stored in the nested dictionary under 'mid':

result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])

But the json_normalize output is not what I expected.

The 'time' value is broken down into individual digits, one per row. I have checked the documentation: the second parameter of json_normalize has to be a string or a list. How can I pass the timestamp there without it being broken down?

My expected output is:

column = 
  index  |  time  |  volume  |  complete  |  mid.h  |  mid.l  |  mid.c  |  mid.o 

Recommended answer

You can just pass data without any extra parameters.

# On pandas 1.0 or later, prefer the top-level pd.json_normalize(data)
df = pd.io.json.json_normalize(data)
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041
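
As background on why no extra arguments are needed here: the second parameter, record_path, is meant to point at a list of records nested inside each object, and meta names outer fields to copy onto every row. The sketch below assumes a hypothetical wrapper object with "instrument" and "candles" keys (not part of the question's file) and uses the top-level pd.json_normalize available in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical payload: the candles are wrapped in an outer object.
# The "instrument" and "candles" keys are assumptions for illustration only.
payload = {
    "instrument": "EUR_JPY",
    "candles": [
        {"complete": True, "volume": 14651, "time": "1488319200.000000000",
         "mid": {"o": "119.341", "h": "119.891", "l": "119.249", "c": "119.743"}},
        {"complete": True, "volume": 10738, "time": "1488348000.000000000",
         "mid": {"o": "119.738", "h": "119.954", "l": "119.552", "c": "119.893"}},
    ],
}

# record_path points at the nested list of records;
# meta copies the outer "instrument" field onto each resulting row.
wrapped = pd.json_normalize(payload, record_path="candles", meta=["instrument"])

# The question's data is already a flat list of records,
# so passing it directly is enough and nested dicts become "mid.*" columns.
flat = pd.json_normalize(payload["candles"])
```

In the question, 'time' is a plain string rather than a list of records, so it is not a suitable record_path.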

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
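
Since the goal is to run statistics, note that the prices and timestamps in this data are strings. A minimal sketch of converting them follows, assuming pandas 1.0 or later and assuming the 'time' values are Unix epoch seconds (the sample values suggest this, but the question does not state it).

```python
import pandas as pd

# Two sample records copied from the question's data.
data = [
    {"complete": True, "volume": 14651, "time": "1488319200.000000000",
     "mid": {"o": "119.341", "h": "119.891", "l": "119.249", "c": "119.743"}},
    {"complete": True, "volume": 10738, "time": "1488348000.000000000",
     "mid": {"o": "119.738", "h": "119.954", "l": "119.552", "c": "119.893"}},
]

df = pd.json_normalize(data)

# Prices arrive as strings; cast them to floats before computing statistics.
price_cols = ["mid.o", "mid.h", "mid.l", "mid.c"]
df = df.astype({col: float for col in price_cols})

# Assumption: "time" holds Unix epoch seconds stored as a string.
df["time"] = pd.to_datetime(df["time"].astype(float), unit="s")

# Summary statistics (count, mean, std, min, quartiles, max) per price column.
stats = df[price_cols].describe()
```

Without the cast, numeric operations such as mean() would not work on the string-typed price columns.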
