标准化复杂的嵌套JSON文件 [英] Normalize a complex nested JSON file
问题描述
我正在尝试将下面的json文件标准化为4个表-"content","Modules","Images"和另一个表中的其他所有内容"
Im trying to normalize the below json file into 4 tables - "content", "Modules", "Images" and "Everything Else in another table"
{
"id": "0000050a",
"revision": 1580225050941,
"slot": "product-description",
"type": "E",
"create_date": 1580225050941,
"modified_date": 1580225050941,
"creator": "Auto",
"modifier": "Auto",
"audit_info": {
"date": 1580225050941,
"source": "AutoService",
"username": "Auto"
},
"total_ID": 1,
"name": "Auto_A1AM78C64UM0Y8_B07JCJR5HW",
"content": [{
"ID": ["B01"],
"content_revision": 1580225050941,
"template": {
"module": [{
"id": "module-11",
"text": null,
"header": [{
"id": "title",
"value": null,
"decorators": []
}],
"paragraph": [{
"id": "description",
"value": [],
"decorators": []
}],
"image": [{
"id": "image",
"assetId": "/images/2cdabb786d10.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,970,300_PT0_SX970_V1__",
"src": "image.jpg",
"originalSrc": null
}],
"integer": null,
"chart": null,
"list": null,
"video": null,
"gallery": null,
"composite": null,
"collection": null,
"product": null
}, {
"id": "module-6",
"text": null,
"header": [{
"id": "title1",
"value": "Dest ",
"decorators": []
}, {
"id": "title2",
"value": "cc",
"decorators": []
}, {
"id": "title3",
"value": "Col",
"decorators": []
}, {
"id": "title4",
"value": "C",
"decorators": []
}, {
"id": "caption1",
"value": null,
"decorators": []
}, {
"id": "caption2",
"value": null,
"decorators": []
}, {
"id": "caption3",
"value": null,
"decorators": []
}, {
"id": "caption4",
"value": null,
"decorators": []
}],
"paragraph": [{
"id": "description1",
"value": [" Sport"],
"decorators": [
[]
]
}, {
"id": "description2",
"value": ["elements "],
"decorators": [
[]
]
}, {
"id": "description3",
"value": ["Film "],
"decorators": [
[]
]
}, {
"id": "description4",
"value": ["Our signature "],
"decorators": [
[]
]
}],
"image": [{
"id": "image1",
"assetId": "/images/dbbfc9873e31.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image2_.jpg",
"originalSrc": null
}, {
"id": "image2",
"assetId": "/images/f577ae005.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "test.jpg",
"originalSrc": null
}, {
"id": "image3",
"assetId": "/images/-0df21c5216d0.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image.jpg",
"originalSrc": null
}, {
"id": "image4",
"assetId": "/images/78d26b9c-408c-4299-8ea8-e9257f170320.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image.jpg",
"originalSrc": null
}, {
"id": "thumb1",
"assetId": "/images/-bbbfc9873e31.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image.jpg",
"originalSrc": null
}, {
"id": "thumb2",
"assetId": "/images/e56f577ae005.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image_.jpg",
"originalSrc": null
}, {
"id": "thumb3",
"assetId": "/images/0df21c5216d0.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image_.jpg",
"originalSrc": null
}, {
"id": "thumb4",
"assetId": "/images/-e9257f170320.jpg",
"alt": " ",
"viewLarger": null,
"styleCodes": "__CR0,0,300,300_PT0_SX300_V1__",
"src": "image.jpg",
"originalSrc": null
}],
"integer": null,
"chart": null,
"list": null,
"video": null,
"gallery": null,
"composite": null,
"collection": null,
"product": null
}],
"renderType": "VERTICAL"
},
"locale_data": {
"locale": "en_US",
"identified_by": "MACHINE_DETECT"
}
}],
"badges": []
}
我可以将标头作为JSON路径成功地将JSON扁平化为一个大数据框架.但我想将JSON标准化为单独的表.例如,模块"表应具有 ID
, Text
, Header_ID
, Header_Value
等列.Image表应具有 Image_ID
, Assest_ID
, Src
等列.谁能帮我将此JSON标准化为4个表格.
I can successfully flatten the JSON into one big data frame with the header as the JSON path. but I want to normalize JSON into separate tables. Like for example the Module table should have columns like ID
, Text
, Header_ID
, Header_Value
and so on. The Image table should have columns like Image_ID
, Assest_ID
, Src
and so on. Can anyone please help me normalize this JSON into the 4 tables.
推荐答案
您可以使用 https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10 ,如下所示,然后使用 json_normalize
:
You could use the function defined by https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10, as follows, and then use json_normalize
:
import pandas as pd
import json
with open('test.json') as json_file:
data = json.load(json_file)
def flatten_json(y):
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
module = flatten_json(data["content"][0])
module = pd.json_normalize(module)
然后,您要做的是根据您描述的四个类别选择列.输出为:
Then, what you have to do is select the columns according to the four categories you described. The output is:
ID_0 content_revision ... locale_data_locale locale_data_identified_by
0 B01 1580225050941 ... en_US MACHINE_DETECT
然后您选择以下内容,例如为您的模块和图像DataFrames:
Then you select as follows, for instance for your module and image DataFrames:
module = df.loc[:,df.columns.str.contains("module")]
image = df.loc[:,df.columns.str.contains("image")]
例如,您获得模块的结果是:
The result you get for module for instance is :
template_module_0_id ... template_module_1_product
0 module-11 ... None
然后,我给出了转换DataFrame模块的示例,您只有两个模块,因此您可以在重命名列后执行 concat
:
Then, I give the example for the transformation of the module DataFrame, you only have two modules so you can do a concat
after renaming the columns:
module1 = module.loc[:,module.columns.str.contains("module_0")]
module1.columns = module1.columns.str.replace("_0","")
module2 = module.loc[:,module.columns.str.contains("module_1")]
module2.columns = module2.columns.str.replace("_1","")
modules = pd.concat([module1,
module2])
您将得到:
template_module_id ... template_module_image_7_originalSrc
0 module-11 ... NaN
0 module-6 ... None
如果元素更多,另一个选择是直接在所需的嵌套元素上使用 flatten_json
和 json_normalize
函数.
The other option if you had a lot more elements would be to use the flatten_json
and json_normalize
functions directly on the nested element you want.
这篇关于标准化复杂的嵌套JSON文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!