Flattening nested JSON in a pandas DataFrame


Question

I am trying to load a JSON file into a pandas DataFrame, and I found that it contains some nested JSON. Below is a sample:

{'events': [{'id': 142896214,
   'playerId': 37831,
   'teamId': 3157,
   'matchId': 2214569,
   'matchPeriod': '1H',
   'eventSec': 0.8935539999999946,
   'eventId': 8,
   'eventName': 'Pass',
   'subEventId': 85,
   'subEventName': 'Simple pass',
   'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
   'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}

I used the following code to load the JSON into a DataFrame:

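import json
import pandas as pd

with open('EVENTS.json') as f:
    jsonstr = json.load(f)

# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(jsonstr['events'])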

Looking at the output of df.head(), I found two columns that still hold nested data: positions and tags.

I tried to flatten them with the following code:

Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] )

It raised the following error:

KeyError: "Try running with errors='ignore' as key 'x' is not always present"

Can you advise me how to flatten positions and tags (the columns that hold nested data)?

Thanks.

Recommended answer

If you are looking for a more general way to unfold multiple hierarchies from a JSON object, you can use recursion and a list comprehension to reshape your data. One alternative is presented below:

def flatten_json(nested_json, exclude=()):
    """Flatten a nested JSON object into a single-level dict.

    Args:
        nested_json: A nested JSON object (dicts, lists, and scalars).
        exclude: Keys to exclude from the output.
    Returns:
        A flat dict whose keys join the nested path with underscores.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                if key not in exclude:
                    flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # List items are keyed by their position in the list.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Scalar leaf: drop the trailing underscore from the path.
            out[name[:-1]] = x

    flatten(nested_json)
    return out

You can then apply it to your data, regardless of how deeply it is nested:

New sample data

this_dict = {'events': [
  {'id': 142896214,
   'playerId': 37831,
   'teamId': 3157,
   'matchId': 2214569,
   'matchPeriod': '1H',
   'eventSec': 0.8935539999999946,
   'eventId': 8,
   'eventName': 'Pass',
   'subEventId': 85,
   'subEventName': 'Simple pass',
   'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
   'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]},
 {'id': 142896214,
   'playerId': 37831,
   'teamId': 3157,
   'matchId': 2214569,
   'matchPeriod': '1H',
   'eventSec': 0.8935539999999946,
   'eventId': 8,
   'eventName': 'Pass',
   'subEventId': 85,
   'subEventName': 'Simple pass',
   'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53},{'x': 51, 'y': 49}],
   'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}
]}

Usage

pd.DataFrame([flatten_json(x) for x in this_dict['events']])

Out[1]:
          id  playerId  teamId  matchId matchPeriod  eventSec  eventId  \
0  142896214     37831    3157  2214569          1H  0.893554        8   
1  142896214     37831    3157  2214569          1H  0.893554        8   

  eventName  subEventId subEventName  positions_0_x  positions_0_y  \
0      Pass          85  Simple pass             51             49   
1      Pass          85  Simple pass             51             49   

   positions_1_x  positions_1_y  tags_0_id tags_0_tag_label  positions_2_x  \
0             40             53       1801         accurate            NaN   
1             40             53       1801         accurate           51.0   

   positions_2_y  
0            NaN  
1           49.0  
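
The wide layout above creates one column per list position, so events with more positions add more columns and leave NaN elsewhere. If one row per position is preferable, pd.json_normalize with record_path may be enough. The sketch below also suggests why the call in the question failed: meta names keys at the top level of each event, while 'x' and 'y' live inside positions. The trimmed-down events list and the 'event_' prefix are assumptions made for illustration only.

```python
import pandas as pd

# Trimmed-down stand-in for this_dict['events'] (illustrative values only).
events = [
    {'id': 1, 'eventName': 'Pass',
     'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
     'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]},
    {'id': 2, 'eventName': 'Pass',
     'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}, {'x': 51, 'y': 49}],
     'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]},
]

# One row per position; meta lists top-level event keys to carry along.
positions = pd.json_normalize(events, record_path='positions',
                              meta=['id', 'eventName'])

# Each tag record has its own 'id', so the event-level 'id' needs a prefix
# to avoid a name clash ('event_' is an arbitrary choice).
tags = pd.json_normalize(events, record_path='tags',
                         meta=['id'], meta_prefix='event_')

print(positions)
print(tags)
```

Both frames keep the event id, so they can be joined back to the top-level event table if needed.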

Note that this flatten_json code is not mine; I have seen it here and here without much certainty about the original source.
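
Because flatten_json recurses once per nesting level, extremely deep input could in principle reach Python's recursion limit. A minimal sketch of an iterative variant follows; flatten_json_iter and its sep parameter are hypothetical names introduced here, not part of the code above, and it is meant to produce the same underscore-joined keys for dicts, lists, and scalars.

```python
def flatten_json_iter(nested_json, sep='_'):
    """Flatten nested dicts/lists into one dict using an explicit stack."""
    out = {}
    stack = [('', nested_json)]
    while stack:
        prefix, value = stack.pop()
        if isinstance(value, dict):
            items = list(value.items())
        elif isinstance(value, list):
            items = list(enumerate(value))
        else:
            # Scalar leaf: store it under the accumulated path.
            out[prefix] = value
            continue
        # Push children in reverse so they are popped in their original order.
        for key, child in reversed(items):
            path = f'{prefix}{sep}{key}' if prefix else str(key)
            stack.append((path, child))
    return out


event = {
    'id': 1,
    'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
    'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}],
}
flat = flatten_json_iter(event)
print(flat)
```

The result can be passed to pd.DataFrame in the same list-comprehension pattern shown in the usage section.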
