How to flatten multiple, deeply nested JSON files into a pandas dataframe?
Problem description
I'm trying to flatten deeply nested JSON files.
I have 22 JSON files that I want to gather into one pandas dataframe. I managed to flatten them to the second level with json_normalize, but I am not able to parse them any further. Sometimes the JSON files have more than 5 levels.
I want to extract the _id, the actType, and all the text data located at the different levels of "children". An example JSON file follows. I'd really appreciate your help!
{
"_id": "test1",
"actType": "FINDING",
"entries": [{
"text": "U Ergebnis:",
"isDocumentationNode": false,
"children": [{
"text": "U3: Standartext",
"isDocumentationNode": true,
"children": []
}, {
"text": "Brückner durchgeführt o.p.B.",
"isDocumentationNode": true,
"children": []
}, {
"text": "Normale körperliche und altersgerecht Entwicklung",
"isDocumentationNode": true,
"children": [{
"text": "J1/2",
"isDocumentationNode": false,
"children": [{
"text": "Schule:",
"isDocumentationNode": true,
"children": [{
"text": "Ziel Abitur",
"isDocumentationNode": true,
"children": [{
"text": "läuft",
"isDocumentationNode": true,
"children": []
}, {
"text": "gefährdet",
"isDocumentationNode": true,
"children": []
}, {
"text": "läuft",
"isDocumentationNode": true,
"children": []
}, {
"text": "gefährdet",
"isDocumentationNode": true,
"children": []
}
]
}
]
}
]
}
]
}
]
}
]
}
import pandas as pd
# load file
df = pd.read_json('test.json')
# display(df)
_id actType entries
0 test1 FINDING {'text': 'U Ergebnis:', 'isDocumentationNode': False, 'children': [{'text': 'U3: Standartext', 'isDocumentationNode': True, 'children': []}, {'text': 'Brückner durchgeführt o.p.B.', 'isDocumentationNode': True, 'children': []}, {'text': 'Normale körperliche und altersgerecht Entwicklung', 'isDocumentationNode': True, 'children': [{'text': 'J1/2', 'isDocumentationNode': False, 'children': [{'text': 'Schule:', 'isDocumentationNode': True, 'children': [{'text': 'Ziel Abitur', 'isDocumentationNode': True, 'children': [{'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}, {'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}]}]}]}]}]}
- This results in a nested dict in the 'entries' column, but I need a flat, wide dataframe, with all keys as columns.
Recommended answer
- Use the flatten_json function, as described in SO: How to flatten a nested JSON recursively, with flatten_json?
  - This will flatten each JSON file wide.
  - This function recursively flattens nested JSON files.
  - Copy the flatten_json function from the linked SO question.
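For readers who cannot reach the linked question, the following is a minimal sketch of what such a function might look like. It is not a verbatim copy of the linked flatten_json; the `sep` parameter and the inner helper name `_flatten` are assumptions made for illustration. Nested keys are joined with `_`, list items are numbered by position, and empty lists contribute no keys.

```python
def flatten_json(nested, sep='_'):
    """Flatten nested dicts and lists into one flat dict with joined keys.

    Sketch only: assumed to behave like the linked flatten_json,
    not copied from it.
    """
    out = {}

    def _flatten(value, prefix=''):
        if isinstance(value, dict):
            # descend into each key, extending the prefix with the key name
            for key, child in value.items():
                _flatten(child, f'{prefix}{key}{sep}')
        elif isinstance(value, list):
            # descend into each item, extending the prefix with its position
            for i, child in enumerate(value):
                _flatten(child, f'{prefix}{i}{sep}')
        else:
            # leaf value: drop the trailing separator from the prefix
            out[prefix[:len(prefix) - len(sep)]] = value

    _flatten(nested)
    return out


# trimmed-down version of the example document
sample = {
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
        "text": "U Ergebnis:",
        "isDocumentationNode": False,
        "children": [{
            "text": "U3: Standartext",
            "isDocumentationNode": True,
            "children": []
        }]
    }]
}

flat = flatten_json(sample)
print(list(flat))
```

Because an empty "children" list yields no keys, leaf nodes only add their own text and isDocumentationNode columns.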
import json
import pandas as pd

# list of files
files = ['test1.json', 'test2.json']

# list to add dataframe from each file
df_list = list()

# iterate through files
for file in files:
    with open(file, 'r', encoding='utf-8') as f:
        # read with json
        data = json.loads(f.read())
        # flatten_json into a dataframe and add to the dataframe list
        df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)

# concat all dataframes together
df = pd.concat(df_list).reset_index(drop=True)

# display(df)
   _id  actType  entries_0_text  entries_0_isDocumentationNode  entries_0_children_0_text  entries_0_children_0_isDocumentationNode  entries_0_children_1_text  entries_0_children_1_isDocumentationNode  entries_0_children_2_text  entries_0_children_2_isDocumentationNode  entries_0_children_2_children_0_text  entries_0_children_2_children_0_isDocumentationNode  entries_0_children_2_children_0_children_0_text  entries_0_children_2_children_0_children_0_isDocumentationNode  entries_0_children_2_children_0_children_0_children_0_text  entries_0_children_2_children_0_children_0_children_0_isDocumentationNode  entries_0_children_2_children_0_children_0_children_0_children_0_text  entries_0_children_2_children_0_children_0_children_0_children_0_isDocumentationNode  entries_0_children_2_children_0_children_0_children_0_children_1_text  entries_0_children_2_children_0_children_0_children_0_children_1_isDocumentationNode  entries_0_children_2_children_0_children_0_children_0_children_2_text  entries_0_children_2_children_0_children_0_children_0_children_2_isDocumentationNode  entries_0_children_2_children_0_children_0_children_0_children_3_text  entries_0_children_2_children_0_children_0_children_0_children_3_isDocumentationNode
0  test1  FINDING  U Ergebnis:  False  U3: Standartext  True  Brückner durchgeführt o.p.B.  True  Normale körperliche und altersgerecht Entwicklung  True  J1/2  False  Schule:  True  Ziel Abitur  True  läuft  True  gefährdet  True  läuft  True  gefährdet  True
1  test2  FINDING  U Ergebnis:  False  U3: Standartext  True  Brückner durchgeführt o.p.B.  True  Normale körperliche und altersgerecht Entwicklung  True  J1/2  False  Schule:  True  Ziel Abitur  True  läuft  True  gefährdet  True  NaN  NaN  NaN  NaN
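Since the question only asks for _id, actType, and the text values, the wide frame can be narrowed afterwards. The sketch below assumes the column naming shown above (text columns end in `_text`) and uses a small hand-built frame in place of the real result; the name `text_df` is made up for illustration.

```python
import pandas as pd

# small stand-in for the wide dataframe produced above
df = pd.DataFrame([
    {'_id': 'test1', 'actType': 'FINDING',
     'entries_0_text': 'U Ergebnis:', 'entries_0_isDocumentationNode': False,
     'entries_0_children_0_text': 'U3: Standartext',
     'entries_0_children_0_isDocumentationNode': True},
    {'_id': 'test2', 'actType': 'FINDING',
     'entries_0_text': 'U Ergebnis:', 'entries_0_isDocumentationNode': False,
     'entries_0_children_0_text': 'U3: Standartext',
     'entries_0_children_0_isDocumentationNode': True},
])

# keep the two identifier columns plus every column whose name ends in "_text"
text_df = df.filter(regex=r'^(_id|actType)$|_text$')
print(text_df.columns.tolist())
```

DataFrame.filter matches the pattern against column labels, so the isDocumentationNode columns are dropped without listing them one by one.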
Data
test1.json
{ "_id": "test1", "actType": "FINDING", "entries": [{ "text": "U Ergebnis:", "isDocumentationNode": false, "children": [{ "text": "U3: Standartext", "isDocumentationNode": true, "children": [] }, { "text": "Brückner durchgeführt o.p.B.", "isDocumentationNode": true, "children": [] }, { "text": "Normale körperliche und altersgerecht Entwicklung", "isDocumentationNode": true, "children": [{ "text": "J1/2", "isDocumentationNode": false, "children": [{ "text": "Schule:", "isDocumentationNode": true, "children": [{ "text": "Ziel Abitur", "isDocumentationNode": true, "children": [{ "text": "läuft", "isDocumentationNode": true, "children": [] }, { "text": "gefährdet", "isDocumentationNode": true, "children": [] }, { "text": "läuft", "isDocumentationNode": true, "children": [] }, { "text": "gefährdet", "isDocumentationNode": true, "children": [] } ] } ] } ] } ] } ] } ] }
test2.json
{ "_id": "test2", "actType": "FINDING", "entries": [{ "text": "U Ergebnis:", "isDocumentationNode": false, "children": [{ "text": "U3: Standartext", "isDocumentationNode": true, "children": [] }, { "text": "Brückner durchgeführt o.p.B.", "isDocumentationNode": true, "children": [] }, { "text": "Normale körperliche und altersgerecht Entwicklung", "isDocumentationNode": true, "children": [{ "text": "J1/2", "isDocumentationNode": false, "children": [{ "text": "Schule:", "isDocumentationNode": true, "children": [{ "text": "Ziel Abitur", "isDocumentationNode": true, "children": [{ "text": "läuft", "isDocumentationNode": true, "children": [] }, { "text": "gefährdet", "isDocumentationNode": true, "children": [] } ] } ] } ] } ] } ] } ] }
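If one row per text node is easier to work with than one very wide row per file, a long layout is a possible alternative to the approach above. This is a sketch under that assumption, not part of the original answer; the helper names `iter_texts` and `to_long_rows` and the `depth` column are hypothetical, and the document is a trimmed-down variant of the data above.

```python
import pandas as pd


def iter_texts(nodes, depth=0):
    """Yield (depth, text) for every node, walking "children" recursively."""
    for node in nodes:
        yield depth, node.get('text')
        yield from iter_texts(node.get('children', []), depth + 1)


def to_long_rows(doc):
    """Build one row per text node, repeating _id and actType on each row."""
    return [
        {'_id': doc.get('_id'), 'actType': doc.get('actType'),
         'depth': depth, 'text': text}
        for depth, text in iter_texts(doc.get('entries', []))
    ]


# trimmed-down document using texts from the data above
doc = {
    "_id": "test2",
    "actType": "FINDING",
    "entries": [{
        "text": "U Ergebnis:",
        "children": [
            {"text": "U3: Standartext", "children": []},
            {"text": "Schule:", "children": [
                {"text": "Ziel Abitur", "children": []}
            ]}
        ]
    }]
}

long_df = pd.DataFrame(to_long_rows(doc))
print(long_df)
```

With several files, the row lists from each document can be concatenated before building the dataframe, which avoids the NaN padding that appears when files have different numbers of children.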