How to flatten nested JSON recursively, with flatten_json?


Question

  • The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json
  • This question is specific to the following component of the package:
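def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a nested dict (which may contain lists and further dicts)
    into a single-level dict whose keys are the joined key paths.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            # recurse into each value, extending the key prefix; skip excluded keys
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            # list items are keyed by their position in the list
            i = 0
            for a in x:
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            # leaf value: store it under the prefix minus the trailing separator
            out[name[:-1]] = x

    flatten(nested_json)
    return out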
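
To make the key naming concrete, here is a minimal sketch that repeats the function in compact form (so the snippet runs on its own) and applies it to a small dict. The sample record and its field names are made up for illustration and do not come from the question.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact copy of the function above, repeated so this snippet is self-contained."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x  # drop the trailing separator character

    flatten(nested_json)
    return out


# Hypothetical sample record, for illustration only.
record = {
    'id': 7,
    'owner': {'name': 'ann', 'tags': ['a', 'b']},
    'notes': [],   # empty list: nothing to recurse into, so no key is produced
    'extra': {},   # empty dict: likewise produces no key
}

flat = flatten_json(record)
print(flat)
# {'id': 7, 'owner_name': 'ann', 'owner_tags_0': 'a', 'owner_tags_1': 'b'}

# `exclude` skips a key, at any depth, together with everything beneath it.
print(flatten_json(record, exclude=['owner']))
# {'id': 7}

# `sep` changes the joining character. The function strips exactly one
# trailing character, so a single-character separator is assumed here.
print(flatten_json(record, sep='.'))
# {'id': 7, 'owner.name': 'ann', 'owner.tags.0': 'a', 'owner.tags.1': 'b'}
```

Note that empty lists and empty dicts leave no trace in the result, because only leaf values are written to the output.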

Use recursion to flatten nested dicts

  • Thinking Recursively in Python
  • Flattening JSON objects in Python

  • flatten_json has been used to unpack a file that ended up with over 100,000 columns.
  • Unflattening is possible, but this question does not cover it. If you install the flatten package, there is an unflatten method, though I haven't tested it.
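
Reversing the process can also be sketched by hand. The unflatten_json helper below is a hypothetical illustration, not the package's own unflatten method. It assumes that no original key contains the separator, and it rebuilds list positions as dicts keyed by the index strings, since a flat key alone cannot distinguish a list index from a numeric dict key.

```python
def unflatten_json(flat: dict, sep: str = '_') -> dict:
    """Rebuild nested dicts from keys such as 'a_b_c' (hypothetical helper)."""
    nested = {}
    for flat_key, value in flat.items():
        parts = flat_key.split(sep)
        node = nested
        # Walk (or create) one nested dict per path component except the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last component is the leaf key that holds the value.
        node[parts[-1]] = value
    return nested


flat = {'id': 1, 'metadata_m1_value': 'm1_1', 'metadata_m1_timestamp': 'd1'}
print(unflatten_json(flat))
# {'id': 1, 'metadata': {'m1': {'value': 'm1_1', 'timestamp': 'd1'}}}
```

Only keys are split on the separator; values such as 'm1_1' pass through unchanged.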

Recommended answer

How to flatten a JSON or dict is a common question, and there are many answers to it.

  • This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.
  • It assumes you already have the JSON or dict loaded into a variable (e.g. from a file or an API); here that variable is data.
  • flatten_json accepts a dict, as shown by the function's type hint.
  • The call depends on the shape of data:
    • Just a dict, {}: flatten_json(data)
    • A list of dicts, [{}, {}, {}]: [flatten_json(x) for x in data]
    • A dict whose top-level keys each map to a dict, {'1': {}, '2': {}}: [flatten_json(data[key]) for key in data.keys()]
    • A list of dicts stored under a key, {'key': [{}, {}, {}]}: [flatten_json(x) for x in data['key']]
  • I typically flatten data into a pandas.DataFrame.
    • Load pandas with import pandas as pd

Data 1:

                      {
                          "id": 1,
                          "class": "c1",
                          "owner": "myself",
                          "metadata": {
                              "m1": {
                                  "value": "m1_1",
                                  "timestamp": "d1"
                              },
                              "m2": {
                                  "value": "m1_2",
                                  "timestamp": "d2"
                              },
                              "m3": {
                                  "value": "m1_3",
                                  "timestamp": "d3"
                              },
                              "m4": {
                                  "value": "m1_4",
                                  "timestamp": "d4"
                              }
                          },
                          "a1": {
                              "a11": [
                      
                              ]
                          },
                          "m1": {},
                          "comm1": "COMM1",
                          "comm2": "COMM21529089656387",
                          "share": "xxx",
                          "share1": "yyy",
                          "hub1": "h1",
                          "hub2": "h2",
                          "context": [
                      
                          ]
                      }
                      

Flatten 1:

                      df = pd.DataFrame([flatten_json(data)])
                      
                       id class   owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp  comm1               comm2 share share1 hub1 hub2
                        1    c1  myself              m1_1                    d1              m1_2                    d2              m1_3                    d3              m1_4                    d4  COMM1  COMM21529089656387   xxx    yyy   h1   h2
                      

Data 2:

                      [{
                              'accuracy': 17,
                              'activity': [{
                                      'activity': [{
                                              'confidence': 100,
                                              'type': 'STILL'
                                          }
                                      ],
                                      'timestampMs': '1542652'
                                  }
                              ],
                              'altitude': -10,
                              'latitudeE7': 3777321,
                              'longitudeE7': -122423125,
                              'timestampMs': '1542654',
                              'verticalAccuracy': 2
                          }, {
                              'accuracy': 17,
                              'activity': [{
                                      'activity': [{
                                              'confidence': 100,
                                              'type': 'STILL'
                                          }
                                      ],
                                      'timestampMs': '1542652'
                                  }
                              ],
                              'altitude': -10,
                              'latitudeE7': 3777321,
                              'longitudeE7': -122423125,
                              'timestampMs': '1542654',
                              'verticalAccuracy': 2
                          }, {
                              'accuracy': 17,
                              'activity': [{
                                      'activity': [{
                                              'confidence': 100,
                                              'type': 'STILL'
                                          }
                                      ],
                                      'timestampMs': '1542652'
                                  }
                              ],
                              'altitude': -10,
                              'latitudeE7': 3777321,
                              'longitudeE7': -122423125,
                              'timestampMs': '1542654',
                              'verticalAccuracy': 2
                          }
                      ]
                      

Flatten 2:

                      df = pd.DataFrame([flatten_json(x) for x in data])
                      
                       accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
                             17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
                             17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
                             17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
                      

Data 3:

                      {
                          "1": {
                              "VENUE": "JOEBURG",
                              "COUNTRY": "HAE",
                              "ITW": "XAD",
                              "RACES": {
                                  "1": {
                                      "NO": 1,
                                      "TIME": "12:35"
                                  },
                                  "2": {
                                      "NO": 2,
                                      "TIME": "13:10"
                                  },
                                  "3": {
                                      "NO": 3,
                                      "TIME": "13:40"
                                  },
                                  "4": {
                                      "NO": 4,
                                      "TIME": "14:10"
                                  },
                                  "5": {
                                      "NO": 5,
                                      "TIME": "14:55"
                                  },
                                  "6": {
                                      "NO": 6,
                                      "TIME": "15:30"
                                  },
                                  "7": {
                                      "NO": 7,
                                      "TIME": "16:05"
                                  },
                                  "8": {
                                      "NO": 8,
                                      "TIME": "16:40"
                                  }
                              }
                          },
                          "2": {
                              "VENUE": "FOOBURG",
                              "COUNTRY": "ABA",
                              "ITW": "XAD",
                              "RACES": {
                                  "1": {
                                      "NO": 1,
                                      "TIME": "12:35"
                                  },
                                  "2": {
                                      "NO": 2,
                                      "TIME": "13:10"
                                  },
                                  "3": {
                                      "NO": 3,
                                      "TIME": "13:40"
                                  },
                                  "4": {
                                      "NO": 4,
                                      "TIME": "14:10"
                                  },
                                  "5": {
                                      "NO": 5,
                                      "TIME": "14:55"
                                  },
                                  "6": {
                                      "NO": 6,
                                      "TIME": "15:30"
                                  },
                                  "7": {
                                      "NO": 7,
                                      "TIME": "16:05"
                                  },
                                  "8": {
                                      "NO": 8,
                                      "TIME": "16:40"
                                  }
                              }
                          }
                      }
                      

Flatten 3:

                      df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])
                      
                         VENUE COUNTRY  ITW  RACES_1_NO RACES_1_TIME  RACES_2_NO RACES_2_TIME  RACES_3_NO RACES_3_TIME  RACES_4_NO RACES_4_TIME  RACES_5_NO RACES_5_TIME  RACES_6_NO RACES_6_TIME  RACES_7_NO RACES_7_TIME  RACES_8_NO RACES_8_TIME
                       JOEBURG     HAE  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
                       FOOBURG     ABA  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
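
The three calls above differ only in how the rows are pulled out of data before flattening. As a recap, the sketch below applies the same patterns to small inputs and adds the {'key': [{}, {}, {}]} shape, which has no worked example here. A compact copy of flatten_json is included so the snippet runs on its own; every sample input, including the 'results' key, is made up for illustration.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Compact copy of flatten_json, repeated so this snippet is self-contained."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x  # drop the trailing separator character

    flatten(nested_json)
    return out


# Hypothetical inputs, one per shape.
single = {'id': 1, 'meta': {'value': 'v1'}}
list_of_dicts = [{'a': {'b': 1}}, {'a': {'b': 2}}]
keyed = {
    '1': {'VENUE': 'X', 'RACES': {'1': {'NO': 1}}},
    '2': {'VENUE': 'Y', 'RACES': {'1': {'NO': 1}}},
}
wrapped = {'results': [{'a': {'b': 1}}, {'a': {'b': 2}}]}

# Just a dict: one row.
rows_single = [flatten_json(single)]
# List of dicts: one row per element.
rows_list = [flatten_json(x) for x in list_of_dicts]
# Dict of dicts: one row per top-level key (the key itself is not kept).
rows_keyed = [flatten_json(keyed[key]) for key in keyed]
# List stored under a key: index into the key first, then one row per element.
rows_wrapped = [flatten_json(x) for x in wrapped['results']]

for rows in (rows_single, rows_list, rows_keyed, rows_wrapped):
    print(rows)
```

Each result is a list of flat dicts, so any of them could be passed to pd.DataFrame(...) in the same way as in the examples above.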
                      

Other examples:

1. Python Pandas - Flatten Nested JSON
2. handling nested json in pandas
3. How to flatten a nested JSON from the NASA Weather Insight API in Python
