Extracting information from multiple JSON files to a single CSV file in Python

Question

I have a JSON file with multiple dictionaries:

{"team1participants": 
[ {
        "stats": {
            "item1": 3153, 
            "totalScore": 0, 
            ...
        }
   },
   {
        "stats": {
            "item1": 2123, 
            "totalScore": 5, 
            ...
        }
   },
   {
        "stats": {
            "item1": 1253, 
            "totalScore": 1, 
            ...
        }
   }
],
"team2participants": 
[ {
        "stats": {
            "item1": 1853, 
            "totalScore": 2, 
            ...
        }
   },
   {
        "stats": {
            "item1": 21523, 
            "totalScore": 5, 
            ...
        }
   },
   {
        "stats": {
            "item1": 12503, 
            "totalScore": 1, 
            ...
        }
   }
]
}

In other words, the JSON has multiple keys. Each key has a list containing statistics of individual participants.

I have many such JSON files, and I want to extract them into a single CSV file. I could do this manually, but that is very tedious. I know of DictWriter, but it seems to work only for single dictionaries. I also know that dictionaries can be merged, but that would be problematic because all the dictionaries have the same keys.

How can I efficiently extract this to a CSV file?

Recommended answer

You can make your data tidy so that each row is a unique observation.

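# d is the dictionary parsed from one JSON file, e.g. d = json.load(f).
teams = []
items = []
scores = []
for team in d:
    for item in d[team]:
        # One entry per participant: team name, item1 and totalScore.
        teams.append(team)
        items.append(item['stats']['item1'])
        scores.append(item['stats']['totalScore'])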


# Using Pandas.
import pandas as pd

df = pd.DataFrame({'team': teams, 'item': items, 'score': scores})
>>> df
    item   score               team
0   1853       2  team2participants
1  21523       5  team2participants
2  12503       1  team2participants
3   3153       0  team1participants
4   2123       5  team1participants
5   1253       1  team1participants
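
The loop above works on a single parsed dictionary `d`, while the question is about many files. A minimal sketch of one way to extend it, assuming every file has the structure shown in the question; the function names `rows_from_dict` and `rows_from_files` and the extra `source_file` column are illustrative and not part of the original answer:

```python
import json
from pathlib import Path


def rows_from_dict(d, source):
    """Flatten one parsed JSON document into one row per participant."""
    rows = []
    for team, participants in d.items():
        for participant in participants:
            stats = participant["stats"]
            rows.append({
                "source_file": source,  # assumption: record which file a row came from
                "team": team,
                "item1": stats["item1"],
                "totalScore": stats["totalScore"],
            })
    return rows


def rows_from_files(paths):
    """Read each JSON file and concatenate the flattened rows."""
    rows = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            d = json.load(f)
        rows.extend(rows_from_dict(d, Path(path).name))
    return rows


# Hypothetical usage: every *.json file in a directory named "data".
# rows = rows_from_files(sorted(Path("data").glob("*.json")))
```

Because each row is a plain dictionary, the resulting list can be passed directly to `pd.DataFrame(rows)`.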

You could also use a list comprehension instead of a loop.

results = [[team, item['stats']['item1'], item['stats']['totalScore']] 
           for team in d for item in d[team]]
df = pd.DataFrame(results, columns=['team', 'item', 'score'])

You can then do a pivot table, for example:

>>> df.pivot_table(values='score', index='team', columns='item', aggfunc='sum').fillna(0)
item               1253   1853   2123   3153   12503  21523
team                                                       
team1participants      1      0      5      0      0      0
team2participants      0      2      0      0      1      5

Also, now that it is a dataframe, it is easy to save it as a CSV.

df.to_csv('my_file_name.csv')
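
If pandas is not available, the standard library can do the same job. The question mentions DictWriter; its `writerows` method accepts an iterable of dictionaries, so many rows that share the same keys can go into one file. A minimal sketch, assuming each row is a dictionary with exactly the keys listed in `fieldnames`; the function name `write_rows_csv` and the column names are illustrative:

```python
import csv


def write_rows_csv(rows, out_path, fieldnames=("team", "item1", "totalScore")):
    """Write a list of dict rows to a CSV file with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)  # one CSV line per dictionary


# Sample rows taken from the data in the question.
sample_rows = [
    {"team": "team1participants", "item1": 3153, "totalScore": 0},
    {"team": "team2participants", "item1": 1853, "totalScore": 2},
]

# Hypothetical usage:
# write_rows_csv(sample_rows, "my_file_name.csv")
```

By default DictWriter raises a ValueError when a row contains a key that is not in `fieldnames`, so the column list has to cover every key in the rows.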
