Group certain columns and sum up another column from a CSV


Problem description

I have data in a CSV file that needs to be parsed. It looks like:

Date,Tag,Amount
13/06/2018,ABC,6750000
13/06/2018,ABC,159800
24/05/2018,ABC,-1848920
16/05/2018,AB,-1829700
16/05/2018,AB,3600000
28/06/2018,A,15938000
16/05/2018,AB,3748998
28/06/2018,A,1035000
28/06/2018,A,1035000
14/06/2018,ABC,2122717

As you can see, each date has a tag and an amount next to it. What I am trying to achieve is to use the date and tag together as the key, group by date and tag, and sum up the amount.

Expected result

Date,Tag,Amount
13/06/2018,ABC,5220680
16/05/2018,AB,5519298
28/06/2018,A,18008000
14/06/2018,ABC,2122717

The code I am using now is below, and it is not working.

from collections import defaultdict
import csv

d = defaultdict(int)

with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = int(tokens[0])
            tag = int(tokens[1])
            amount = int(tokens[2])
        except ValueError:
            continue
        d[date] += amount

print d

Could someone show me how to achieve this without using pandas, please?

Recommended answer

You should definitely use pandas. Unless you are required to code this yourself, you can just install the pandas module, import it (import pandas as pd), and solve this problem with 2 simple and intuitive lines of code:

>>> df = pd.read_csv('file.csv')
>>> df.groupby(['Date', 'Tag']).Amount.sum()

Date        Tag
13/06/2018  ABC     6909800
14/06/2018  ABC     2122717
16/05/2018  AB      5519298
24/05/2018  ABC    -1848920
28/06/2018  A      18008000


If you really need to code this yourself, you can use a nested defaultdict so you have two layers of grouping. Also, why do you try to cast the date and the tag to int? Neither is a number, so int() raises ValueError on every row and every row gets skipped. Just remove those casts.

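from collections import defaultdict

# Outer key: date; inner key: tag; value: running total of the amounts.
d = defaultdict(lambda: defaultdict(int))

with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = tokens[0]
            tag = tokens[1]
            amount = int(tokens[2])
        except (ValueError, IndexError):
            # Skip the header row and any blank or malformed line.
            continue
        d[date][tag] += amount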

The output is:

13/06/2018 ABC 6909800
24/05/2018 ABC -1848920
16/05/2018 AB 5519298
28/06/2018 A 18008000
14/06/2018 ABC 2122717

To print the result above, just iterate through the items:

for k,v in d.items():
    for k2, v2 in v.items():
        print(k,k2,v2)
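
Since the question asks for the date and tag together to be the key, a single defaultdict keyed by a (date, tag) tuple is a possible alternative to the nested one. The following is only a sketch: the function name sum_by_date_and_tag and the inlined SAMPLE string are illustrative assumptions, with the sample rows copied from the question so that it runs without file.csv.

```python
from collections import defaultdict
import io

# Sample rows copied from the question, inlined so the sketch needs no file.
SAMPLE = """Date,Tag,Amount
13/06/2018,ABC,6750000
13/06/2018,ABC,159800
24/05/2018,ABC,-1848920
16/05/2018,AB,-1829700
16/05/2018,AB,3600000
28/06/2018,A,15938000
16/05/2018,AB,3748998
28/06/2018,A,1035000
28/06/2018,A,1035000
14/06/2018,ABC,2122717
"""


def sum_by_date_and_tag(lines):
    """Sum Amount per (Date, Tag) pair (hypothetical helper name)."""
    totals = defaultdict(int)
    for line in lines:
        tokens = [t.strip() for t in line.split(",")]
        if len(tokens) != 3:
            continue  # blank or malformed line
        date, tag, raw_amount = tokens
        try:
            amount = int(raw_amount)
        except ValueError:
            continue  # header row, or an amount that is not an integer
        totals[(date, tag)] += amount
    return totals


totals = sum_by_date_and_tag(io.StringIO(SAMPLE))
for (date, tag), amount in totals.items():
    print(date, tag, amount)
```

With a real file, the same function can be called as sum_by_date_and_tag(f) inside a with open("file.csv") as f: block.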

To make your code even better, read the first line (the header) separately, and then iterate from the second line until the end. That way, the try/except can be removed and you get simpler, cleaner code. But you can pick it up from here, right? ;)
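
One way that suggestion might look is sketched below with the standard-library csv module: next(reader) consumes the header once, and the loop then sees only data rows, so no try/except is needed. This assumes every data row has exactly three fields and an integer amount; the function name sum_amounts and the shortened inline sample are illustrative, not part of the original answer.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question, inlined so the sketch needs no file.
SAMPLE = """Date,Tag,Amount
13/06/2018,ABC,6750000
13/06/2018,ABC,159800
24/05/2018,ABC,-1848920
16/05/2018,AB,-1829700
16/05/2018,AB,3600000
"""


def sum_amounts(f):
    """Return (header, totals) where totals[date][tag] is the summed amount."""
    reader = csv.reader(f)
    header = next(reader)  # read the first line only: the column names
    totals = defaultdict(lambda: defaultdict(int))
    # Assumption: every remaining row is exactly "date,tag,integer".
    for date, tag, amount in reader:
        totals[date][tag] += int(amount)
    return header, totals


header, totals = sum_amounts(io.StringIO(SAMPLE))
for date, tags in totals.items():
    for tag, amount in tags.items():
        print(date, tag, amount)
```

For a file on disk, the csv documentation recommends opening it with newline="", for example with open("file.csv", newline="") as f: before passing f to the function.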

To write the result to a CSV file, just do:

s = '\n'.join(['{0},{1},{2}'.format(k, k2, v2) for k, v in d.items() for k2, v2 in v.items()])
with open('output.csv', 'w') as f:
    f.write(s)
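
The snippet above builds the file contents by hand with string formatting. A hedged alternative is csv.writer, which takes care of the delimiter and quoting and makes it easy to emit the header row from the expected result. In this sketch the helper name write_totals is an assumption, d is filled with two totals from the output above purely for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the nested dict built earlier; values taken from the output above.
d = defaultdict(lambda: defaultdict(int))
d["13/06/2018"]["ABC"] = 6909800
d["24/05/2018"]["ABC"] = -1848920


def write_totals(totals, out):
    """Write Date,Tag,Amount rows to a file-like object (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Tag", "Amount"])  # header row
    for date, tags in totals.items():
        for tag, amount in tags.items():
            writer.writerow([date, tag, amount])


buffer = io.StringIO()
write_totals(d, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same helper can be called inside with open("output.csv", "w", newline="") as f: as write_totals(d, f).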
