How to normalize JSON correctly with Python Pandas
Question
I am a beginner in Python. I want to load a JSON file of forex historical price data with Pandas and compute statistics on it. I have gone through many topics on Pandas and on parsing JSON files. I want to load a JSON file with extra values and nested data into a Pandas DataFrame, and this is where I am stuck.
I have a JSON file, 'EUR_JPY_H8.json'.
First, I import the required libraries:
import pandas as pd
import json
from pandas.io.json import json_normalize
Then I load the JSON file:
with open('EUR_JPY_H8.json') as data_file:
    data = json.load(data_file)
This gives me the list below:
[{u'complete': True,
  u'mid': {u'c': u'119.743',
           u'h': u'119.891',
           u'l': u'119.249',
           u'o': u'119.341'},
  u'time': u'1488319200.000000000',
  u'volume': 14651},
 {u'complete': True,
  u'mid': {u'c': u'119.893',
           u'h': u'119.954',
           u'l': u'119.552',
           u'o': u'119.738'},
  u'time': u'1488348000.000000000',
  u'volume': 10738},
 {u'complete': True,
  u'mid': {u'c': u'119.946',
           u'h': u'120.221',
           u'l': u'119.840',
           u'o': u'119.888'},
  u'time': u'1488376800.000000000',
  u'volume': 10041}]
Then I pass the list to json_normalize, trying to get the prices nested under 'mid':
result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])
But the json_normalize output was not what I expected: the 'time' data was split into individual digits, one per row. I have checked the related documentation, and I have to pass a string or a list as the second parameter of json_normalize. How can I pass the timestamp there without it being broken down?
My expected output has these columns:
index | time | volume | complete | mid.h | mid.l | mid.c | mid.o
Recommended answer
You can just pass data without any extra parameters.
df = pd.io.json.json_normalize(data)
df
complete mid.c mid.h mid.l mid.o time volume
0 True 119.743 119.891 119.249 119.341 1488319200.000000000 14651
1 True 119.893 119.954 119.552 119.738 1488348000.000000000 10738
2 True 119.946 120.221 119.840 119.888 1488376800.000000000 10041
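A likely explanation for the result in the question, offered here as a sketch rather than a definitive diagnosis: the second parameter (record_path) is meant to point at a list of sub-records inside each item, but 'time' holds a string, and iterating a string yields one character at a time. Newer pandas releases may raise an error in that situation instead of producing one row per character. The example below uses two records copied from the question and calls the top-level pd.json_normalize, which is where the function has been exposed since pandas 1.0.

```python
import pandas as pd

# Two records shaped like the ones in the question (values copied from it).
data = [
    {"complete": True,
     "mid": {"c": "119.743", "h": "119.891", "l": "119.249", "o": "119.341"},
     "time": "1488319200.000000000",
     "volume": 14651},
    {"complete": True,
     "mid": {"c": "119.893", "h": "119.954", "l": "119.552", "o": "119.738"},
     "time": "1488348000.000000000",
     "volume": 10738},
]

# Iterating the 'time' string yields single characters, which is what a
# record path pointing at 'time' would walk over.
chars = list(data[0]["time"])

# With no record path, each top-level dict becomes one row and the nested
# 'mid' dict is flattened into dotted column names such as 'mid.h'.
df = pd.json_normalize(data)
print(df.columns.tolist())
```

Because every item here is already one flat record with a single nested dict, no record path or meta list is needed.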
If you want to change the column order, use df.reindex:
df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df
time volume complete mid.h mid.l mid.c mid.o
0 1488319200.000000000 14651 True 119.891 119.249 119.743 119.341
1 1488348000.000000000 10738 True 119.954 119.552 119.893 119.738
2 1488376800.000000000 10041 True 120.221 119.840 119.946 119.888
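Since the goal in the question is to run statistics, note that the prices and timestamps in this data are strings, so they would need converting first. The following is a minimal sketch of one way to do that, assuming the 'time' values are Unix timestamps in seconds (they look like it, but the question does not say so). The 'range' column is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Sample records copied from the question.
data = [
    {"complete": True,
     "mid": {"c": "119.743", "h": "119.891", "l": "119.249", "o": "119.341"},
     "time": "1488319200.000000000",
     "volume": 14651},
    {"complete": True,
     "mid": {"c": "119.946", "h": "120.221", "l": "119.840", "o": "119.888"},
     "time": "1488376800.000000000",
     "volume": 10041},
]

df = pd.json_normalize(data)
df = df.reindex(columns=["time", "volume", "complete",
                         "mid.h", "mid.l", "mid.c", "mid.o"])

# The prices arrive as strings; convert them to floats before any arithmetic.
price_cols = ["mid.h", "mid.l", "mid.c", "mid.o"]
df[price_cols] = df[price_cols].astype(float)

# Assumption: 'time' is seconds since the Unix epoch.
df["time"] = pd.to_datetime(df["time"].astype(float), unit="s")

# Hypothetical derived column: high minus low for each bar.
df["range"] = df["mid.h"] - df["mid.l"]

print(df[price_cols + ["range"]].describe())
```

Without the astype(float) step, the price columns keep the object dtype and describe() would report only counts and unique values for them instead of means and quantiles.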