将json嵌套到csv-通用方法 [英] Nested json to csv - generic approach
问题描述
我是Python的新手,我正在努力将嵌套的json
文件转换为cvs
.为此,我首先加载了json
,然后对其进行了转换,并使用json_normalize打印出了不错的输出,然后使用pandas包将标准化的部分输出到了cvs
中.
I am very new to Python and I am struggling with converting nested json
file into cvs
. To do so I started with loading the json
and then transformed it in a way that prints out nice output with json_normalize, then using pandas package I output the normalised parts into cvs
.
我的示例json:
[{
"_id": {
"id": "123"
},
"device": {
"browser": "Safari",
"category": "d",
"os": "Mac"
},
"exID": {
"$oid": "123"
},
"extreme": false,
"geo": {
"city": "London",
"country": "United Kingdom",
"countryCode": "UK",
"ip": "00.000.000.0"
},
"viewed": {
"$date": "2011-02-12"
},
"attributes": [{
"name": "gender",
"numeric": 0,
"value": 0
}, {
"name": "email",
"value": false
}],
"change": [{
"id": {
"$id": "1231"
},
"seen": [{
"$date": "2011-02-12"
}]
}]
}, {
"_id": {
"id": "456"
},
"device": {
"browser": "Chrome 47",
"category": "d",
"os": "Windows"
},
"exID": {
"$oid": "345"
},
"extreme": false,
"geo": {
"city": "Berlin",
"country": "Germany",
"countryCode": "DE",
"ip": "00.000.000.0"
},
"viewed": {
"$date": "2011-05-12"
},
"attributes": [{
"name": "gender",
"numeric": 1,
"value": 1
}, {
"name": "email",
"value": true
}],
"change": [{
"id": {
"$id": "1231"
},
"seen": [{
"$date": "2011-02-12"
}]
}]
}]
使用以下代码(此处不包括嵌套部分):
With following code (here I exclude the nested parts):
import json
from pandas.io.json import json_normalize
def loading_file():
#File path
file_path = #file path here
#Loading json file
json_data = open(file_path)
data = json.load(json_data)
return data
#Storing avaliable keys
def data_keys(data):
keys = {}
for i in data:
for k in i.keys():
keys[k] = 1
keys = keys.keys()
#Excluding nested arrays from keys - hard coded -> IMPROVE
new_keys = [x for x in keys if
x != 'attributes' and
x != 'change']
return new_keys
#Excluding nested arrays from json dictionary
def new_data(data, keys):
new_data = []
for i in range(0, len(data)):
x = {k:v for (k,v) in data[i].items() if k in keys }
new_data.append(x)
return new_data
def csv_out(data):
data.to_csv('out.csv',encoding='utf-8')
def main():
data_file = loading_file()
keys = data_keys(data_file)
table = new_data(data_file, keys)
csv_out(json_normalize(table))
main()
我当前的输出看起来像这样:
My current output looks something like this:
| _id.id | device.browser | device.category | device.os | ... | viewed.$date |
|--------|----------------|-----------------|-----------|------|--------------|
| 123 | Safari | d | Mac | ... | 2011-02-12 |
| 456 | Chrome 47 | d | Windows | ... | 2011-05-12 |
| | | | | | |
我的问题是我想将嵌套数组包括到cvs中,所以我必须将它们展平.我无法弄清楚如何使其通用,因此在创建表时不要使用字典keys
(numeric, id, name
)和values
.我必须使其具有通用性,因为attributes
和change
中的键数.因此,我想要这样的输出:
My problem is that I would like to include the nested arrays into the cvs, so I have to flatten them. I cannot figure out how to make it generic so I do not use dictionary keys
(numeric, id, name
) and values
while creating table. I have to make it generalisable because the number of keys in attributes
and change
. Therefore, I would like to have output like this:
| _id.id | device.browser | ... | attributes_gender_numeric | attributes_gender_value | attributes_email_value | change_id | change_seen |
|--------|----------------|-----|---------------------------|-------------------------|------------------------|-----------|-------------|
| 123 | Safari | ... | 0 | 0 | false | 1231 | 2011-02-12 |
| 456 | Chrome 47 | ... | 1 | 1 | true | 1231 | 2011-02-12 |
| | | | | | | | |
先谢谢您!非常欢迎任何有关改进我的代码并使之更高效的提示.
Thank you in advance! Any tips how to improve my code and make it more efficient are very welcome.
推荐答案
Thanks to the great blog post by Amir Ziai which you can find here I managed to output my data in form of a flat table. With the following function:
#Function that recursively extracts values out of the object into a flattened dictionary
def flatten_json(data):
flat = [] #list of flat dictionaries
def flatten(y):
out = {}
def flatten2(x, name=''):
if type(x) is dict:
for a in x:
if a == "name":
flatten2(x["value"], name + x[a] + '_')
else:
flatten2(x[a], name + a + '_')
elif type(x) is list:
for a in x:
flatten2(a, name + '_')
else:
out[name[:-1]] = x
flatten2(y)
return out
#Loop needed to flatten multiple objects
for i in range(len(data)):
flat.append(flatten(data[i]).copy())
return json_normalize(flat)
我知道由于名称-值if语句,它不能完全泛化的事实.但是,如果删除了创建名称-值字典的豁免,则该代码可以与其他嵌入式数组一起使用.
I am aware of the fact that it is not perfectly generalisable, due to name-value if statement. However, if this exemption for creating the name-value dictionaries is deleted, the code can be used with other embedded arrays.
这篇关于将json嵌套到csv-通用方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!