How to extract fields from nested JSON and save them in a data structure


Question

Using Python, I am trying to extract the fields named "first", "last", and "zipcode", along with their values, from JSON whose structure is not always known. An example of the JSON could look something like this:

{
"employees": [
    {
        "first": "Alice",
        "last_name": "Alast",
        "zipcode": "12345",
        "role": "dev",
        "nbr": 1,
        "team": [
            {
                "first_name": "fn",
                "last_name": "ln"
            },
            {
                "first_name": "fn2",
                "last_name": "ln2"
            }
        ]
    },
    {
        "name": "Bob",
        "role": "dev",
        "nbr": 2
    }
],
"firm": {
    "last_name": "Lhans",
    "zipcode": "67890",
    "location": "CA"
}}

In addition to this, I want to save the result in a data structure, such as:

{ 
  {
    first: "firstname",
    last: "lastname",
    zipcode: "zipcode"
  }
}

I have tried flattening the nested JSON, basing my function on this. I can get the fields this way, but I am having difficulty finding a good way to save the data in the format shown above. If one of the fields is missing, I want to fill it in with NaN or an empty string rather than ignore it completely. Here's what I have so far. It creates a list of fields and values, but if a field does not exist, it skips it instead of filling it with a None value.

def flatten_json(nested_json, fields: list):
    out = []

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], a)
        elif type(x) is list:
            for a in x:
                flatten(a)
        elif name in fields:
            out.append(name+": "+x)

    flatten(nested_json)
    return out

This gives me something like:

['first: Alice', 'last: Jones', 'zipcode: 12345', 'first: fn1', 'last: ln1', 'first: fn2', 'last: ln2', 'last: ln3', 'zipcode: 67890']

This isn't ideal. I'd rather have any missing fields filled with NaN or an empty string than have them absent from the list.

Recommended answer

I've modified your function to collect a list of dictionaries. Each dictionary contains only keys that appear in the fields list.


import pandas as pd


def flatten_json(nested_json, fields):
    out = []
    temp = {}

    def flatten(x, name=''):
        nonlocal temp
        if type(x) is dict:
            # Start a fresh record for every JSON object.
            temp = {}
            for a in x:
                flatten(x[a], a)
        elif type(x) is list:
            for a in x:
                flatten(a)
        elif name in fields:
            temp[name] = x
            # The same dict is appended once per matching field, so records
            # repeat in the output; duplicates are dropped after loading.
            out.append(temp)

    flatten(nested_json)
    return out


json1 = {"employees": [{"first": "Alice", "last_name": "Alast", "zipcode": "12345", "role": "dev", "nbr": 1, "team": [{"first_name": "fn", "last_name": "ln"}, {
    "first_name": "fn2", "last_name": "ln2"}]}, {"name": "Bob", "role": "dev", "nbr": 2}], "firm": {"last_name": "Lhans", "zipcode": "67890", "location": "CA"}}

fields = ['first_name', 'last_name', 'zipcode']
result = (flatten_json(json1, fields))
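
If pandas is not available, a similar result can be sketched with the standard library alone. The version below is an assumption-laden variant, not the answer's method: the name `extract_records`, its `fill` parameter, and the `sample` data are made up for illustration. It builds one record per JSON object that holds at least one wanted field and fills the missing fields directly, so no de-duplication step is needed.

```python
def extract_records(data, fields, fill=None):
    """Return one dict per JSON object that holds at least one wanted field.

    Hypothetical helper for illustration; `fill` is the placeholder used
    for fields that an object does not have.
    """
    records = []

    def walk(node):
        if isinstance(node, dict):
            # Only scalar values count as fields; containers are visited below.
            found = {k: v for k, v in node.items()
                     if k in fields and not isinstance(v, (dict, list))}
            if found:
                # Emit every requested field so that no key is ever missing.
                records.append({f: found.get(f, fill) for f in fields})
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return records


# Made-up sample data, shaped like the JSON in the question.
sample = {
    "employees": [
        {"first_name": "fn", "zipcode": "12345",
         "team": [{"first_name": "fn2", "last_name": "ln2"}]},
        {"name": "Bob"},
    ],
    "firm": {"last_name": "Lhans", "zipcode": "67890"},
}

records = extract_records(sample, ["first_name", "last_name", "zipcode"], fill="")
for record in records:
    print(record)
```

Because each object's own fields are gathered before its children are visited, a parent's fields cannot end up in a child's record, whatever the key order. Objects with none of the wanted fields, such as Bob's, produce no record.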

The output of the above function can then be loaded into a pandas DataFrame:

df = pd.DataFrame(result)
df.drop_duplicates(inplace=True)
print(df)

This gives output like this:

  last_name zipcode first_name
0     Alast   12345        NaN
2        ln     NaN         fn
4       ln2     NaN        fn2
6     Lhans   67890        NaN

To get the data back in a JSON-like format, convert the DataFrame to a list of dicts with the to_dict() function:

print(df.to_dict(orient='records'))

Output:

[{'first_name': nan, 'last_name': 'Alast', 'zipcode': '12345'},
 {'first_name': 'fn', 'last_name': 'ln', 'zipcode': nan},
 {'first_name': 'fn2', 'last_name': 'ln2', 'zipcode': nan},
 {'first_name': nan, 'last_name': 'Lhans', 'zipcode': '67890'}]
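
One caveat if these records are meant to become an actual JSON string: by default `json.dumps` writes a float NaN as the bare token `NaN`, which the JSON specification does not allow. A minimal sketch of one way around this is to swap NaN for `None` first, so it serializes as `null`. The helper name `nan_to_none` and the hand-written sample records are assumptions for illustration only.

```python
import json
import math


def nan_to_none(records):
    """Return a copy of the records with float NaN values replaced by None."""
    return [
        {key: (None if isinstance(value, float) and math.isnan(value) else value)
         for key, value in record.items()}
        for record in records
    ]


# Hand-written stand-in for the list returned by df.to_dict(orient='records').
records = [
    {"first_name": float("nan"), "last_name": "Alast", "zipcode": "12345"},
    {"first_name": "fn", "last_name": "ln", "zipcode": float("nan")},
]

cleaned = nan_to_none(records)
json_text = json.dumps(cleaned)  # missing values are written as null
print(json_text)
```

If an empty string is preferred over `null`, the same comprehension can return `""` instead of `None`.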
