Conversion from nested JSON to CSV with pandas
Problem description
I am trying to convert a nested JSON file into a CSV file, but I am struggling with the logic needed for the structure of my file: it is a JSON document with 2 top-level entries, and I would like to convert only one of them to CSV, namely a list whose items contain nested values.
I found very helpful information on "flattening" JSON in this blog post. I have basically been adapting it to my problem, but it is still not working for me.
My JSON file looks like this:
{
    "tickets": [
        {
            "Name": "Liam",
            "Location": {
                "City": "Los Angeles",
                "State": "CA"
            },
            "hobbies": [
                "Piano",
                "Sports"
            ],
            "year": 1985,
            "teamId": "ATL",
            "playerId": "barkele01",
            "salary": 870000
        },
        {
            "Name": "John",
            "Location": {
                "City": "Los Angeles",
                "State": "CA"
            },
            "hobbies": [
                "Music",
                "Running"
            ],
            "year": 1985,
            "teamId": "ATL",
            "playerId": "bedrost01",
            "salary": 550000
        }
    ],
    "count": 2
}
My code so far looks like this:
import json
from pandas.io.json import json_normalize
import argparse


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)
    args = parser.parse_args()

    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)

    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file
        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)
What I would like to obtain is one line per ticket in the CSV, with the headings:
Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary
I would really appreciate anything that makes this click. Thank you!
Recommended answer
As you already have a function that flattens a JSON object, you just have to flatten each ticket separately instead of the whole document:
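...
    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    # one flattened dict per ticket, one DataFrame row per dict
    # (this needs `import pandas as pd` at the top of the script)
    final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...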
With your sample data, final_data is as expected:
  Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985
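For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original post: the sample JSON is inlined as a string instead of being read from a file, the CSV is written to an in-memory buffer, and the column order is picked by hand to follow the headings asked for in the question (the key spellings stay as they are in the data, so `hobbies_0` rather than `Hobbies_0`).

```python
import io
import json

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Sample data from the question, inlined here only for the demo.
raw = """
{"tickets": [
    {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
     "playerId": "barkele01", "salary": 870000},
    {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
     "playerId": "bedrost01", "salary": 550000}
 ],
 "count": 2}
"""
json_data = json.loads(raw)

# Flatten each ticket on its own: one dict per ticket, one row per dict.
final_data = pd.DataFrame([flatten_json(t) for t in json_data['tickets']])

# Hand-picked column order (an assumption), following the question's headings.
columns = ['Name', 'Location_City', 'Location_State', 'hobbies_0', 'hobbies_1',
           'year', 'teamId', 'playerId', 'salary']
final_data = final_data[columns]

# Write to an in-memory buffer; pass a file path to to_csv to write to disk.
buffer = io.StringIO()
final_data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If the capitalised headings from the question are needed, renaming the columns before calling `to_csv` (for example with `DataFrame.rename`) should be enough.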
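As background on why the script in the question did not give one row per ticket: flattening the whole document puts every ticket into the same dict, with the list index baked into the key names, so there is only one record to turn into a row. The sketch below illustrates this with a cut-down version of the sample data (the shortened tickets are an assumption made for brevity) and uses only the standard library.

```python
def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Cut-down version of the question's data, for illustration only.
json_data = {
    "tickets": [
        {"Name": "Liam", "hobbies": ["Piano"]},
        {"Name": "John", "hobbies": ["Music"]},
    ],
    "count": 2,
}

# Flattening the whole document: a single dict, so a single row.
whole = flatten_json(json_data)
print(list(whole))

# Flattening each ticket: one dict per ticket, all sharing the same keys.
per_ticket = [flatten_json(t) for t in json_data["tickets"]]
print(per_ticket)
```

The first result has keys such as `tickets_0_Name` and `tickets_1_Name` plus `count`, while the second is a list of two dicts that both use `Name` and `hobbies_0`, which is the shape a one-row-per-ticket table needs.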