Conversion from nested JSON to CSV with pandas

Problem description

I am trying to convert a nested JSON file into a CSV file, but I am struggling with the logic needed for the structure of my file: it is a JSON document with 2 objects, and I would like to convert only one of them into CSV, namely a list with nested entries.

I found some very helpful information on "flattening" JSON in this blog post. I have basically been adapting it to my problem, but it is still not working for me.

My JSON file looks like this:

{
  "tickets":[
    {
      "Name": "Liam",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Piano",
        "Sports"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "barkele01",
      "salary" : 870000
    },
    {
      "Name": "John",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Music",
        "Running"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "bedrost01",
      "salary" : 550000
    }
  ],
  "count": 2
}

My code so far looks like this:

import json
from pandas.io.json import json_normalize
import argparse


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)

    args = parser.parse_args()

    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)

    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file

        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)

What I would like to obtain is one row per ticket in the CSV, with the headings:

Name, Location_City, Location_State, Hobbies_0, Hobbies_1, Year, TeamId, PlayerId, Salary.

I would really appreciate anything that helps make this click. Thank you!

Recommended answer

As you already have a function to flatten a JSON object, you just have to flatten each ticket:

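...
# requires "import pandas as pd" at the top of the script
with open(args.json_file, "r") as inputFile:  # open json file
    json_data = json.loads(inputFile.read())  # load json content
# flatten each ticket separately so that every ticket becomes one row
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...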

With your sample data, final_data is as expected:

  Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985
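
For reference, here is a minimal end-to-end sketch of the same idea, from the nested data to CSV text. It is an illustration under a few assumptions rather than the original script: the sample data is inlined instead of read from a file, the CSV is written to an in-memory buffer (the names buffer and csv_text are made up for this example), and flatten_json is a lightly restyled version of the function from the question. Column order may differ from the table above depending on the pandas version.

```python
import io

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + "_")
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + "_")
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Sample data from the question, inlined here for illustration.
json_data = {
    "tickets": [
        {
            "Name": "Liam",
            "Location": {"City": "Los Angeles", "State": "CA"},
            "hobbies": ["Piano", "Sports"],
            "year": 1985,
            "teamId": "ATL",
            "playerId": "barkele01",
            "salary": 870000,
        },
        {
            "Name": "John",
            "Location": {"City": "Los Angeles", "State": "CA"},
            "hobbies": ["Music", "Running"],
            "year": 1985,
            "teamId": "ATL",
            "playerId": "bedrost01",
            "salary": 550000,
        },
    ],
    "count": 2,
}

# Flatten each ticket on its own so that every ticket becomes one row.
final_data = pd.DataFrame([flatten_json(t) for t in json_data["tickets"]])

# Write the CSV to an in-memory buffer; a file path would work the same way.
buffer = io.StringIO()
final_data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Flattening the whole document in one call, as in the question, produces a single row with keys such as tickets_0_Name and tickets_1_Name, which is why the tickets are flattened one by one here.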
