csv到json与需要分组的列数据 [英] csv to json with column data that needs to be grouped

查看:114
本文介绍了csv到json与需要分组的列数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个与此格式相似的CSV文件

I have a CSV file in a format similar to this

order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1,        John,          4,         1,               24,        4,               16,        1
2,        Paul,          8,         3,               41,        1,               33,        1
3,        Andrew,        1,         1,               34,        4,               8,          2

我想导出到json,目前正在这样做.

I want to export to json, currently I am doing this.

df = pd.read_csv('simple.csv')
print ( df.to_json(orient = 'records') )

输出为

[
    {
        "Item_2_id": 24,
        "Item_2_quantity": 4,
        "Item_3_id": 16,
        "Item_3_quantity": 1,
        "customer_name": "John",
        "item_1_id": 4,
        "item_1_quantity": 1,
        "order_id": 1
    },
......

但是,我希望输出是

[
    {
        "customer_name": "John",
        "order_id": 1,
        "items": [
            { "id": 4, "quantity": 1 },
            { "id": 24, "quantity": 4 },
            { "id": 16, "quantity": 1 },
         ]
    },
......

有什么好的方法建议吗?

Any suggestions on a good way to do this?

在此特定项目中,每个订单的重复次数不得超过5次

In this particular project, there will not be more than 5 times per order

推荐答案

尝试以下操作:

import pandas as pd
import json

output_lst = []

##specify the first row as header
df = pd.read_csv('simple.csv', header=0)
##iterate through all the rows
for index, row in df.iterrows():
    dict = {}
    items_lst = []
    ## column_list is a list of column headers
    column_list = df.columns.values
    for i, col_name in enumerate(column_list):
        ## for the first 2 columns simply copy the value into the dictionary
        if i<2:
            element = row[col_name]
            if isinstance(element, str):
            ## strip if it is a string type value
                element = element.strip()
            dict[col_name] = element

        elif "_id" in col_name:
            ## i+1 is used assuming that the item_quantity comes right after  the corresponding item_id for each item
            item_dict  = {"id":row[col_name], "quantity":row[column_list[i+1]]}
            items_lst.append(item_dict)

    dict["items"] = items_lst
    output_lst.append(dict)

print json.dumps(output_lst)

如果使用问题中描述的sample.csv运行上述文件,则会得到以下输出:

If you run the above file with the sample.csv described in the question then you get the following output:

[
    {
        "order_id": 1,
        "items": [
            {
                "id": 4,
                "quantity": 1
            },
            {
                "id": 24,
                "quantity": 4
            },
            {
                "id": 16,
                "quantity": 1
            }
        ],
        " customer_name": "John"
    },
    {
        "order_id": 2,
        "items": [
            {
                "id": 8,
                "quantity": 3
            },
            {
                "id": 41,
                "quantity": 1
            },
            {
                "id": 33,
                "quantity": 1
            }
        ],
        " customer_name": "Paul"
    },
    {
        "order_id": 3,
        "items": [
            {
                "id": 1,
                "quantity": 1
            },
            {
                "id": 34,
                "quantity": 4
            },
            {
                "id": 8,
                "quantity": 2
            }
        ],
        " customer_name": "Andrew"
    }
]

这篇关于csv到json与需要分组的列数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆