How to flatten multiple deeply nested JSON files into a pandas dataframe?


Problem description

I'm trying to flatten deeply nested JSON files.

I have 22 JSON files which I want to gather in one pandas dataframe. I managed to flatten them with json_normalize to the second level, but I am not able to parse them further. Sometimes the JSON files have more than 5 levels.

I want to extract the _id, the actType and all the text data located at the different levels of "children". An example of the JSON file follows. Really appreciate your help!

{
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}

import pandas as pd

# load file
df = pd.read_json('test.json')

# display(df)
     _id  actType                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   entries
0  test1  FINDING  {'text': 'U Ergebnis:', 'isDocumentationNode': False, 'children': [{'text': 'U3: Standartext', 'isDocumentationNode': True, 'children': []}, {'text': 'Brückner durchgeführt o.p.B.', 'isDocumentationNode': True, 'children': []}, {'text': 'Normale körperliche und altersgerecht Entwicklung', 'isDocumentationNode': True, 'children': [{'text': 'J1/2', 'isDocumentationNode': False, 'children': [{'text': 'Schule:', 'isDocumentationNode': True, 'children': [{'text': 'Ziel Abitur', 'isDocumentationNode': True, 'children': [{'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}, {'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}]}]}]}]}]}

  • This results in a nested dict in the 'entries' column, but I need a flat, wide dataframe, with all keys as columns.

Recommended answer

  • Use the flatten_json function, as described in SO: How to flatten a nested JSON recursively, with flatten_json?
    • This flattens each JSON file into a single wide row.
    • The function recursively flattens nested JSON files.
    • Copy the flatten_json function from the linked SO question.
              # assumes flatten_json (copied from the linked SO question) is already defined
              import json
              import pandas as pd
              
              # list of files
              files = ['test1.json', 'test2.json']
              
              # list to add dataframe from each file
              df_list = list()
              
              # iterate through files
              for file in files:
                  with open(file, 'r', encoding='utf-8') as f:
              
                      # read with json
                      data = json.load(f)
              
                      # flatten_json into a dataframe and add to the dataframe list
                      df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)
                      
              # concat all dataframes together
              df = pd.concat(df_list).reset_index(drop=True)
              
              # display(df)
                   _id  actType entries_0_text entries_0_isDocumentationNode entries_0_children_0_text entries_0_children_0_isDocumentationNode     entries_0_children_1_text entries_0_children_1_isDocumentationNode                          entries_0_children_2_text entries_0_children_2_isDocumentationNode entries_0_children_2_children_0_text entries_0_children_2_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_text entries_0_children_2_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_1_text entries_0_children_2_children_0_children_0_children_0_children_1_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_2_text entries_0_children_2_children_0_children_0_children_0_children_2_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_3_text entries_0_children_2_children_0_children_0_children_0_children_3_isDocumentationNode
              0  test1  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True
              1  test2  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                   NaN                                                                                  NaN                                                                   NaN                                                                                  NaN
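
The script above calls flatten_json without defining it, since the answer asks the reader to copy it from the linked SO question. For readers who do not have that function to hand, here is a minimal sketch of a recursive flattener with the same kind of behaviour. It is not the code from the linked answer: the `sep` parameter, the inner `_walk` helper and the `sample` record are assumptions made for illustration.

```python
def flatten_json(nested, sep="_"):
    """Flatten nested dicts and lists into one dict with joined key paths."""
    flat = {}

    def _walk(value, parts):
        if isinstance(value, dict):
            for key, child in value.items():
                _walk(child, parts + (str(key),))
        elif isinstance(value, list):
            # list positions become part of the key, e.g. entries_0_text
            for index, child in enumerate(value):
                _walk(child, parts + (str(index),))
        else:
            # scalar leaf: store it under the joined path
            flat[sep.join(parts)] = value

    _walk(nested, ())
    return flat


# small hypothetical record shaped like the files above
sample = {
    "_id": "demo",
    "entries": [{"text": "a", "children": [{"text": "b", "children": []}]}],
}
flat_sample = flatten_json(sample)
```

In this sketch an empty list contributes no keys, so a leaf with "children": [] adds no column. That is consistent with the column names in the output above, where the deepest nodes only have _text and _isDocumentationNode columns.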
              

Data

              • test1.json
                {
                    "_id": "test1",
                    "actType": "FINDING",
                    "entries": [{
                            "text": "U Ergebnis:",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "U3: Standartext",
                                    "isDocumentationNode": true,
                                    "children": []
                                }, {
                                    "text": "Brückner durchgeführt o.p.B.",
                                    "isDocumentationNode": true,
                                    "children": []
                                }, {
                                    "text": "Normale körperliche und altersgerecht Entwicklung",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "J1/2",
                                            "isDocumentationNode": false,
                                            "children": [{
                                                    "text": "Schule:",
                                                    "isDocumentationNode": true,
                                                    "children": [{
                                                            "text": "Ziel Abitur",
                                                            "isDocumentationNode": true,
                                                            "children": [{
                                                                    "text": "läuft",
                                                                    "isDocumentationNode": true,
                                                                    "children": []
                                                                }, {
                                                                    "text": "gefährdet",
                                                                    "isDocumentationNode": true,
                                                                    "children": []
                                                                }, {
                                                                    "text": "läuft",
                                                                    "isDocumentationNode": true,
                                                                    "children": []
                                                                }, {
                                                                    "text": "gefährdet",
                                                                    "isDocumentationNode": true,
                                                                    "children": []
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                
                }
                
                

                • test2.json
                  {
                      "_id": "test2",
                      "actType": "FINDING",
                      "entries": [{
                              "text": "U Ergebnis:",
                              "isDocumentationNode": false,
                              "children": [{
                                      "text": "U3: Standartext",
                                      "isDocumentationNode": true,
                                      "children": []
                                  }, {
                                      "text": "Brückner durchgeführt o.p.B.",
                                      "isDocumentationNode": true,
                                      "children": []
                                  }, {
                                      "text": "Normale körperliche und altersgerecht Entwicklung",
                                      "isDocumentationNode": true,
                                      "children": [{
                                              "text": "J1/2",
                                              "isDocumentationNode": false,
                                              "children": [{
                                                      "text": "Schule:",
                                                      "isDocumentationNode": true,
                                                      "children": [{
                                                              "text": "Ziel Abitur",
                                                              "isDocumentationNode": true,
                                                              "children": [{
                                                                      "text": "läuft",
                                                                      "isDocumentationNode": true,
                                                                      "children": []
                                                                  }, {
                                                                      "text": "gefährdet",
                                                                      "isDocumentationNode": true,
                                                                      "children": []
                                                                  }
                                                              ]
                                                          }
                                                      ]
                                                  }
                                              ]
                                          }
                                      ]
                                  }
                              ]
                          }
                      ]
                  
                  }
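
The question only asks for _id, actType and the text values, and the wide layout gains a new set of columns for every extra child. If that becomes unwieldy across 22 files, a long layout with one row per text node may be easier to work with. The following is a sketch of that alternative, not part of the original answer: the helper names `iter_texts` and `collect_rows`, the `depth` column and the `sample` record are all assumptions made for illustration.

```python
import pandas as pd


def iter_texts(nodes, depth=0):
    """Yield (depth, text) for every node, walking children depth-first."""
    for node in nodes:
        yield depth, node.get("text")
        yield from iter_texts(node.get("children", []), depth + 1)


def collect_rows(data):
    """Build one row per text node, repeating _id and actType on each row."""
    return [
        {
            "_id": data.get("_id"),
            "actType": data.get("actType"),
            "depth": depth,
            "text": text,
        }
        for depth, text in iter_texts(data.get("entries", []))
    ]


# small hypothetical record shaped like test1.json
sample = {
    "_id": "demo",
    "actType": "FINDING",
    "entries": [{
        "text": "a",
        "children": [
            {"text": "b", "children": [{"text": "c", "children": []}]},
            {"text": "d", "children": []},
        ],
    }],
}
rows = collect_rows(sample)
long_df = pd.DataFrame(rows)
```

To cover several files, the rows from each loaded file could be appended to one list before building the dataframe, in the same way the answer collects one dataframe per file.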
                  
                  
