How to normalize complex nested JSON in Python?

Problem description

I am trying to normalize complex nested JSON in Python, but I am unable to parse all the objects out.

I am referencing the code from this page: https://medium.com/@amirziai/flattening-json-objects-in-python-f5343c794b10

sample_object = {'Name':'John', 'Location':{'City':'Los Angeles','State':'CA'}, 'hobbies':['Music', 'Running']}

def flatten_json(y):
    out = {}

    def flatten(x, name=''):  

        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            for a in x:
                flatten(a, name)
        else:
            out[name[:-1]] = x

    flatten(y)

    return out

flat = flatten_json(sample_object)
print(json_normalize(flat))

Returned result:

Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles   | CA             | Running

Expected result:

Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles   | CA             | Running
John | Los Angeles   | CA             | Music

Recommended answer

The problem you are having originates in the following section:

elif type(x) is list:
    for a in x:
        flatten(a, name)

Because the name is not changed for each element of the list, every element overwrites the value assigned by the previous one, so only the last element shows up in the output.

Applied to this example: when the flattening function reaches the list 'hobbies', it first assigns the name 'hobbies' to the element 'Music' and writes it to the output. The next element in the list is 'Running', which is also assigned the name 'hobbies'. When this element is written to the output, the key 'hobbies' already exists, so the value 'Music' is overwritten with 'Running'.
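The overwrite can be made visible by logging every assignment. The following is a minimal sketch, not part of the original script: `trace_flatten` and the `writes` list are hypothetical names introduced here for illustration only.

```python
def trace_flatten(y):
    """Flatten like the question's function, but also log every assignment."""
    out = {}
    writes = []  # hypothetical helper: every (key, value) assignment, in order

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for item in x:
                # Same name for every element, so later writes replace earlier ones.
                flatten(item, name)
        else:
            key = name[:-1]  # drop the trailing underscore
            writes.append((key, x))
            out[key] = x

    flatten(y)
    return out, writes


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

out, writes = trace_flatten(sample_object)
print(writes[-2:])     # [('hobbies', 'Music'), ('hobbies', 'Running')]
print(out['hobbies'])  # Running
```

Both list elements are written under the same key, and only the second write survives in the result.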

To prevent this, the script from the link you referenced uses the following piece of code to append the array's index to the name, thus creating a unique name for every element of the array.

elif type(x) is list:
    i = 0
    for a in x:
        flatten(a, name + str(i) + '_')
        i += 1
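For reference, here is a sketch of the complete function with the index appended, run against the sample object from the question. It assumes an underscore separator after the index and uses `enumerate` in place of the manual counter; the function name `flatten_json_indexed` is chosen here for illustration.

```python
def flatten_json_indexed(y):
    """Flatten nested dicts and lists; list elements get their index in the key."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                # The index makes the key unique for each list element.
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing underscore

    flatten(y)
    return out


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

flat = flatten_json_indexed(sample_object)
print(flat)
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA',
#  'hobbies_0': 'Music', 'hobbies_1': 'Running'}
```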

This creates extra columns in the data, however, rather than new rows. If rows are what you want, you would have to change the way the function is set up. One way could be to adapt the function to return a list of JSON objects (one for each list element in the original JSON).
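One possible way to get one row per list element is sketched below. This is an illustration under stated assumptions, not the linked article's code: `flatten_to_rows` is a hypothetical name, several lists in the same object are combined as a Cartesian product, and an empty list yields no rows at all for that object.

```python
from itertools import product


def flatten_to_rows(obj, prefix=''):
    """Return a list of flat dicts; each list element produces its own row."""
    if isinstance(obj, dict):
        # Flatten every value on its own, then combine one partial row per key.
        per_key = [flatten_to_rows(value, prefix + key + '_')
                   for key, value in obj.items()]
        rows = []
        for combo in product(*per_key):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(obj, list):
        # Each element contributes its own rows under the same name.
        return [row for item in obj for row in flatten_to_rows(item, prefix)]
    # Scalar value: a single row with a single column.
    return [{prefix[:-1]: obj}]


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

rows = flatten_to_rows(sample_object)
for row in rows:
    print(row)
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA', 'hobbies': 'Music'}
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA', 'hobbies': 'Running'}
```

The resulting list of dicts can then be passed to `pandas.DataFrame(rows)` to obtain a table with one row per hobby.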

An extra note: I would recommend being a bit more careful when copying code into a question. The indentation is a bit off in this case, and since you left out the part where you import json_normalize, it might not be clear to everyone that you are importing it from pandas:

# Import path for pandas < 1.0; in pandas 1.0 and later use: from pandas import json_normalize
from pandas.io.json import json_normalize
