Flattening nested JSON in a pandas data frame
Question
I am trying to load a JSON file into a pandas data frame, but some of the fields contain nested JSON. Below is a sample of the JSON:
{'events': [
    {'id': 142896214,
     'playerId': 37831,
     'teamId': 3157,
     'matchId': 2214569,
     'matchPeriod': '1H',
     'eventSec': 0.8935539999999946,
     'eventId': 8,
     'eventName': 'Pass',
     'subEventId': 85,
     'subEventName': 'Simple pass',
     'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
     'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}
]}
I used the following code to load the JSON into a data frame:
import json
import pandas as pd

with open('EVENTS.json') as f:
    jsonstr = json.load(f)
df = pd.json_normalize(jsonstr['events'])  # pd.io.json.json_normalize in pandas < 1.0
In the output of df.head(), however, two columns, positions and tags, still contain nested data.
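To illustrate what "nested columns" means here, a minimal sketch using one event from the sample above (trimmed to a few keys purely for brevity): pd.json_normalize turns nested dicts into dotted column names, but a value that is a list of dicts, such as positions or tags, is kept as a single column holding the Python list.

```python
import pandas as pd

# One event from the sample JSON, trimmed to a few keys for illustration.
events = [{
    'id': 142896214,
    'eventName': 'Pass',
    'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
    'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}],
}]

df = pd.json_normalize(events)

# The list-valued fields are not expanded: each cell still holds a list of dicts.
print(df.columns.tolist())
print(df.loc[0, 'positions'])
```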
I tried the following code to flatten them:
Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] )
It raised the following error:
KeyError: "Try running with errors='ignore' as key 'x' is not always present"
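A likely cause of this KeyError: in json_normalize, meta names keys of the *parent* object (here, each event), while the keys of the records found at record_path ('x' and 'y') become columns automatically. Since an event dict has no 'x' key, the lookup fails. A minimal sketch under that reading, using a trimmed event from the sample (the choice of 'id' and 'eventName' as meta fields is only an example):

```python
import pandas as pd

# One event from the sample JSON, trimmed to a few keys for illustration.
events = [{
    'id': 142896214,
    'eventName': 'Pass',
    'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
    'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}],
}]

# record_path selects the nested list to expand into rows;
# meta lists keys of the parent event to copy onto each of those rows.
positions = pd.json_normalize(events, record_path='positions', meta=['id', 'eventName'])

# Each tag record has its own 'id', which would clash with the event's 'id',
# so prefix the columns that come from the tag records.
tags = pd.json_normalize(events, record_path='tags', meta=['id'], record_prefix='tag_')

print(positions)
print(tags)
```

This gives one row per position (or per tag), which is a long layout; the answer below produces a wide layout with one row per event instead.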
How can I flatten positions and tags (the columns that contain nested data)?
Thanks.
Answer
If you are looking for a more general way to unfold multiple levels of nesting in JSON, you can use recursion and a list comprehension to reshape your data. One alternative is presented below:
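def flatten_json(nested_json, exclude=('',)):
    """Flatten a JSON object with nested keys into a single level.

    Args:
        nested_json: A nested JSON object (dicts and lists).
        exclude: Keys to exclude from the output.

    Returns:
        A flat dict whose keys join the nested path with '_'
        (list items are labelled by their index).
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Recurse into each value, extending the key path with the dict key.
            for key in x:
                if key not in exclude:
                    flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # Recurse into each item, using its position as the key segment.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated name.
            out[name[:-1]] = x

    flatten(nested_json)
    return out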
You can then apply it to your data, regardless of the nesting depth:
New sample data (the second event has three positions):
this_dict = {'events': [
    {'id': 142896214,
     'playerId': 37831,
     'teamId': 3157,
     'matchId': 2214569,
     'matchPeriod': '1H',
     'eventSec': 0.8935539999999946,
     'eventId': 8,
     'eventName': 'Pass',
     'subEventId': 85,
     'subEventName': 'Simple pass',
     'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
     'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]},
    {'id': 142896214,
     'playerId': 37831,
     'teamId': 3157,
     'matchId': 2214569,
     'matchPeriod': '1H',
     'eventSec': 0.8935539999999946,
     'eventId': 8,
     'eventName': 'Pass',
     'subEventId': 85,
     'subEventName': 'Simple pass',
     'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}, {'x': 51, 'y': 49}],
     'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}
]}
Usage:
pd.DataFrame([flatten_json(x) for x in this_dict['events']])
Out[1]:
          id  playerId  teamId  matchId  matchPeriod  eventSec  eventId  \
0  142896214     37831    3157  2214569           1H  0.893554        8
1  142896214     37831    3157  2214569           1H  0.893554        8

   eventName  subEventId  subEventName  positions_0_x  positions_0_y  \
0       Pass          85   Simple pass             51             49
1       Pass          85   Simple pass             51             49

   positions_1_x  positions_1_y  tags_0_id  tags_0_tag_label  positions_2_x  \
0             40             53       1801          accurate            NaN
1             40             53       1801          accurate           51.0

   positions_2_y
0            NaN
1           49.0
Note that this flatten_json code is not mine; I have seen it here and here, without much certainty about the original source.