How to flatten a nested JSON recursively, with flatten_json


Question

  • The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json
  • This question is specific to the following component of the package:
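def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a nested dict (or a list of nested dicts) into a single-level dict.

    Each key of the result is the path to a leaf value, joined with sep.
    Keys listed in exclude are skipped at every nesting level.
    """
    out = dict()

    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            # Recurse into each value, extending the key path with the dict key.
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            # Recurse into each element, extending the key path with its index.
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            # Leaf value: store it under the path without the trailing separator.
            # Slicing by len(sep) keeps this correct for any separator length.
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out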
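
To make the key naming concrete, here is a minimal sketch of calling the function with the default arguments and with a custom exclude and sep. The record dict and its field names are made up for illustration, and the function body is a compact stand-in for the one above so the block runs on its own.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact stand-in for the function from the question."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            # Leaf value: drop the trailing separator from the path.
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


# Hypothetical sample record, not taken from the question.
record = {
    'id': 1,
    'owner': {'name': 'myself', 'tags': ['a', 'b']},
    'secret': {'token': 'xyz'},
}

# Default separator: dict keys and list indices are joined with '_'.
flat = flatten_json(record)

# Skip the 'secret' branch and join the path with '.' instead.
flat_dotted = flatten_json(record, exclude=['secret'], sep='.')

print(flat)
print(flat_dotted)
```

Note that exclude is checked against the key name at every nesting level, so a key called secret would be skipped wherever it appears, not only at the top.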

Flattening nested dicts using recursion

  • Thinking Recursively in Python
  • Flattening JSON objects in Python
  • flatten_json has been used to unpack a file that ended up being over 100000 columns
  • Yes, this question doesn't cover that. However, if you install the flatten package, there is an unflatten method, but I haven't tested it.

    Answer

    • How to flatten a JSON or dict is a common question, to which there are many answers.
      • This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.
    • This answer assumes you already have the JSON or dict loaded into some variable (e.g. from a file, an API, etc.).
      • In this case we will use data.
    • How data is passed to flatten_json depends on its shape:
      • The function accepts a dict, as shown by the function type hint.
      • Just a dict, {}: flatten_json(data)
      • A list of dicts, [{}, {}, {}]: [flatten_json(x) for x in data]
      • A dict whose top-level values are dicts, {'1': {}, '2': {}}: [flatten_json(data[key]) for key in data.keys()]
      • A list of dicts under a key, {'key': [{}, {}, {}]}: [flatten_json(x) for x in data['key']]
    • I typically flatten data into a pandas.DataFrame for further analysis.
      • Load pandas with import pandas as pd
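
The shapes listed above can be handled by one small helper. The following is only a sketch: flatten_records and its records_key parameter are hypothetical names, not part of the package, and the check for a dict of dicts is a heuristic that will misfire on a single record whose values all happen to be dicts. The flatten_json body is a compact stand-in so the block runs on its own.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact stand-in for the function from the question."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


def flatten_records(data, records_key=None):
    """Return a list of flat dicts, one per record (hypothetical helper)."""
    if records_key is not None:
        # {'key': [{}, {}, {}]}: the records live under one key.
        return [flatten_json(x) for x in data[records_key]]
    if isinstance(data, list):
        # [{}, {}, {}]: one row per list element.
        return [flatten_json(x) for x in data]
    if data and all(isinstance(v, dict) for v in data.values()):
        # {'1': {}, '2': {}}: one row per top-level value (heuristic).
        return [flatten_json(data[key]) for key in data.keys()]
    # A single dict: one row.
    return [flatten_json(data)]


single = flatten_records({'a': 1, 'b': {'c': 2}})
listed = flatten_records([{'a': {'b': 1}}, {'a': {'b': 2}}])
keyed = flatten_records({'1': {'v': {'x': 1}}, '2': {'v': {'x': 2}}})
wrapped = flatten_records({'key': [{'a': 1}, {'a': 2}]}, records_key='key')
```

Each result is a list of flat dicts, which is the form pd.DataFrame accepts directly.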
                  Data 1:

                  {
                      "id": 1,
                      "class": "c1",
                      "owner": "myself",
                      "metadata": {
                          "m1": {
                              "value": "m1_1",
                              "timestamp": "d1"
                          },
                          "m2": {
                              "value": "m1_2",
                              "timestamp": "d2"
                          },
                          "m3": {
                              "value": "m1_3",
                              "timestamp": "d3"
                          },
                          "m4": {
                              "value": "m1_4",
                              "timestamp": "d4"
                          }
                      },
                      "a1": {
                          "a11": [
                  
                          ]
                      },
                      "m1": {},
                      "comm1": "COMM1",
                      "comm2": "COMM21529089656387",
                      "share": "xxx",
                      "share1": "yyy",
                      "hub1": "h1",
                      "hub2": "h2",
                      "context": [
                  
                      ]
                  }
                  

                  Flatten 1:

                  df = pd.DataFrame([flatten_json(data)])
                  
                   id class   owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp  comm1               comm2 share share1 hub1 hub2
                    1    c1  myself              m1_1                    d1              m1_2                    d2              m1_3                    d3              m1_4                    d4  COMM1  COMM21529089656387   xxx    yyy   h1   h2
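
The output above has no columns for a1, m1, or context. That follows from the recursion: a value is only written when a leaf is reached, so empty dicts and lists leave no trace. A minimal sketch with a trimmed-down copy of Data 1 (the flatten_json body is a compact stand-in so the block runs on its own):

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact stand-in for the function from the question."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


# Trimmed-down version of Data 1: three of the five keys hold empty containers.
data = {'id': 1, 'a1': {'a11': []}, 'm1': {}, 'context': [], 'hub1': 'h1'}

flat = flatten_json(data)

# Top-level keys that produced no column because they contain no leaf value.
dropped = sorted(set(data) - set(flat))

print(flat)
print(dropped)
```

If those empty fields matter downstream, they would have to be added back as columns after flattening.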
                  


                  Data 2:

                  [{
                          'accuracy': 17,
                          'activity': [{
                                  'activity': [{
                                          'confidence': 100,
                                          'type': 'STILL'
                                      }
                                  ],
                                  'timestampMs': '1542652'
                              }
                          ],
                          'altitude': -10,
                          'latitudeE7': 3777321,
                          'longitudeE7': -122423125,
                          'timestampMs': '1542654',
                          'verticalAccuracy': 2
                      }, {
                          'accuracy': 17,
                          'activity': [{
                                  'activity': [{
                                          'confidence': 100,
                                          'type': 'STILL'
                                      }
                                  ],
                                  'timestampMs': '1542652'
                              }
                          ],
                          'altitude': -10,
                          'latitudeE7': 3777321,
                          'longitudeE7': -122423125,
                          'timestampMs': '1542654',
                          'verticalAccuracy': 2
                      }, {
                          'accuracy': 17,
                          'activity': [{
                                  'activity': [{
                                          'confidence': 100,
                                          'type': 'STILL'
                                      }
                                  ],
                                  'timestampMs': '1542652'
                              }
                          ],
                          'altitude': -10,
                          'latitudeE7': 3777321,
                          'longitudeE7': -122423125,
                          'timestampMs': '1542654',
                          'verticalAccuracy': 2
                      }
                  ]
                  

                  Flatten 2:

                  df = pd.DataFrame([flatten_json(x) for x in data])
                  
                   accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
                         17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
                         17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
                         17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
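
In Data 2 every record has the same structure, so every row has the same columns. When records differ, the flat dicts have different keys, and pd.DataFrame fills the gaps with NaN. The sketch below shows the same idea with the standard library only, using None for missing values; the second record is invented for illustration, and the flatten_json body is a compact stand-in so the block runs on its own.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact stand-in for the function from the question."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


data = [
    {
        'accuracy': 17,
        'activity': [{
            'activity': [{'confidence': 100, 'type': 'STILL'}],
            'timestampMs': '1542652',
        }],
    },
    {'accuracy': 20},  # hypothetical record with no activity at all
]

rows = [flatten_json(x) for x in data]

# Union of all keys, in first-seen order, to use as the column header.
columns = list(dict.fromkeys(key for row in rows for key in row))

# One list per row, with None where a record lacks a column.
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

List indices become part of the column name, so a record with two activity entries would add activity_1_... columns alongside the activity_0_... ones.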
                  


                  Data 3:

                  {
                      "1": {
                          "VENUE": "JOEBURG",
                          "COUNTRY": "HAE",
                          "ITW": "XAD",
                          "RACES": {
                              "1": {
                                  "NO": 1,
                                  "TIME": "12:35"
                              },
                              "2": {
                                  "NO": 2,
                                  "TIME": "13:10"
                              },
                              "3": {
                                  "NO": 3,
                                  "TIME": "13:40"
                              },
                              "4": {
                                  "NO": 4,
                                  "TIME": "14:10"
                              },
                              "5": {
                                  "NO": 5,
                                  "TIME": "14:55"
                              },
                              "6": {
                                  "NO": 6,
                                  "TIME": "15:30"
                              },
                              "7": {
                                  "NO": 7,
                                  "TIME": "16:05"
                              },
                              "8": {
                                  "NO": 8,
                                  "TIME": "16:40"
                              }
                          }
                      },
                      "2": {
                          "VENUE": "FOOBURG",
                          "COUNTRY": "ABA",
                          "ITW": "XAD",
                          "RACES": {
                              "1": {
                                  "NO": 1,
                                  "TIME": "12:35"
                              },
                              "2": {
                                  "NO": 2,
                                  "TIME": "13:10"
                              },
                              "3": {
                                  "NO": 3,
                                  "TIME": "13:40"
                              },
                              "4": {
                                  "NO": 4,
                                  "TIME": "14:10"
                              },
                              "5": {
                                  "NO": 5,
                                  "TIME": "14:55"
                              },
                              "6": {
                                  "NO": 6,
                                  "TIME": "15:30"
                              },
                              "7": {
                                  "NO": 7,
                                  "TIME": "16:05"
                              },
                              "8": {
                                  "NO": 8,
                                  "TIME": "16:40"
                              }
                          }
                      }
                  }
                  

                  Flatten 3:

                  df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])
                  
                     VENUE COUNTRY  ITW  RACES_1_NO RACES_1_TIME  RACES_2_NO RACES_2_TIME  RACES_3_NO RACES_3_TIME  RACES_4_NO RACES_4_TIME  RACES_5_NO RACES_5_TIME  RACES_6_NO RACES_6_TIME  RACES_7_NO RACES_7_TIME  RACES_8_NO RACES_8_TIME
                   JOEBURG     HAE  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
                   FOOBURG     ABA  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
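
Flattening data[key] for each key discards the top-level keys ("1" and "2" in Data 3). If they carry meaning, one option is to copy each key into its row. This is a sketch, not part of the original answer: the column name key is an arbitrary choice, the data is a shortened version of Data 3, and the flatten_json body is a compact stand-in so the block runs on its own.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact stand-in for the function from the question."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


# Shortened version of Data 3: one race per venue.
data = {
    '1': {'VENUE': 'JOEBURG', 'RACES': {'1': {'NO': 1, 'TIME': '12:35'}}},
    '2': {'VENUE': 'FOOBURG', 'RACES': {'1': {'NO': 1, 'TIME': '12:35'}}},
}

# Put the top-level key first, then the flattened fields of its value.
rows = [{'key': key, **flatten_json(value)} for key, value in data.items()]

for row in rows:
    print(row)
```

If a record already has a field named key, that field overwrites the copied one, so pick a column name that cannot collide with the data.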
                  


                  Other examples:

                  1. Python Pandas - Flatten Nested JSON
                  2. Handling nested JSON in pandas
                  3. How to flatten a nested JSON from the NASA Weather Insight API in Python
