Group certain columns and sum up another column from a CSV
Question
I have data in a CSV file that needs to be parsed. It looks like:
Date,Tag,Amount
13/06/2018,ABC,6750000
13/06/2018,ABC,159800
24/05/2018,ABC,-1848920
16/05/2018,AB,-1829700
16/05/2018,AB,3600000
28/06/2018,A,15938000
16/05/2018,AB,3748998
28/06/2018,A,1035000
28/06/2018,A,1035000
14/06/2018,ABC,2122717
Each date has a tag and an amount next to it. What I am trying to achieve is to make the date and tag the key, group by date and tag, and sum up the amount.
Expected result
Date,Tag,Amount
13/06/2018,ABC,6909800
24/05/2018,ABC,-1848920
16/05/2018,AB,5519298
28/06/2018,A,18008000
14/06/2018,ABC,2122717
The code I am using now, shown below, is not working:
from collections import defaultdict
import csv
d = defaultdict(int)
with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = int(tokens[0])
            tag = int(tokens[1])
            amount = int(tokens[2])
        except ValueError:
            continue
        d[date] += amount
print d
Could someone please show me how to achieve this without using pandas?
Answer
You should definitely use pandas. Unless you have to code this yourself, you can just install the pandas module, import it (import pandas as pd) and solve this problem with two simple, intuitive lines of code:
>>> df = pd.read_csv('file.csv')
>>> df.groupby(['Date', 'Tag']).Amount.sum()
Date Tag
13/06/2018 ABC 6909800
14/06/2018 ABC 2122717
16/05/2018 AB 5519298
24/05/2018 ABC -1848920
28/06/2018 A 18008000
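If you really need to code this yourself, you can use a nested defaultdict so that you get two levels of grouping (first by date, then by tag). Also, why do you cast the date and the tag to int? int() raises ValueError on values such as 13/06/2018 or ABC, so your except clause skips every row and d stays empty. Just remove those casts.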
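from collections import defaultdict

# Two-level mapping: date -> tag -> running total of Amount
d = defaultdict(lambda: defaultdict(int))
with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = tokens[0]  # keep the date as a string
            tag = tokens[1]   # keep the tag as a string
            amount = int(tokens[2])
        except (ValueError, IndexError):
            # The header row ("Amount" is not an int) and short/blank lines are skipped
            continue
        d[date][tag] += amount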
The output is:
13/06/2018 ABC 6909800
24/05/2018 ABC -1848920
16/05/2018 AB 5519298
28/06/2018 A 18008000
14/06/2018 ABC 2122717
To output the result above, just iterate through the items:
for k, v in d.items():
    for k2, v2 in v.items():
        print(k, k2, v2)
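Since the question asks to make the date and tag the key, a single flat defaultdict keyed by a (date, tag) tuple is another option. This is a sketch of that alternative, not the answer's own code; the function name sum_by_date_and_tag is hypothetical, and it assumes plain comma-separated fields with no quoted commas.

```python
import io
from collections import defaultdict


def sum_by_date_and_tag(lines):
    """Sum Amount per (Date, Tag) pair in one flat dict keyed by a tuple.

    `lines` is any iterable of text lines, such as an open file.
    Assumes plain comma-separated fields with no quoted commas.
    """
    totals = defaultdict(int)
    for line in lines:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date, tag, amount = tokens[0], tokens[1], int(tokens[2])
        except (ValueError, IndexError):
            continue  # skip the header row and any short/blank lines
        totals[(date, tag)] += amount
    return totals


# Demo on a few rows from the question; with the real file,
# pass open("file.csv") instead of the StringIO buffer.
sample = io.StringIO(
    "Date,Tag,Amount\n"
    "13/06/2018,ABC,6750000\n"
    "13/06/2018,ABC,159800\n"
    "24/05/2018,ABC,-1848920\n"
)
for (date, tag), amount in sum_by_date_and_tag(sample).items():
    print(date, tag, amount)
```

Unpacking the key as (date, tag) in the loop produces the same three-column output as the nested version, with one loop instead of two.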
To make your code even better, read only the first line and then iterate from the second line until the end. That way, the try/except can be removed and you get simpler, cleaner code. But you can pick it up from here, right? ;)
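A minimal sketch of that suggestion, assuming the file has exactly one header row and no blank or malformed lines. It uses the standard csv module (which the question already imports) and consumes the header with next(); the function name sum_amounts is hypothetical.

```python
import csv
import io
from collections import defaultdict


def sum_amounts(f):
    """Return {date: {tag: total}} from a Date,Tag,Amount CSV file object.

    The header is consumed once with next(), so every remaining row is data
    and no try/except is needed (assumes no blank or malformed rows).
    """
    reader = csv.reader(f)
    next(reader)  # skip the "Date,Tag,Amount" header row
    d = defaultdict(lambda: defaultdict(int))
    for date, tag, amount in reader:
        d[date][tag] += int(amount)
    return d


# Demo on part of the question's data; for the real file use
# open("file.csv", newline="") instead of io.StringIO(...).
sample = io.StringIO(
    "Date,Tag,Amount\n"
    "13/06/2018,ABC,6750000\n"
    "13/06/2018,ABC,159800\n"
    "28/06/2018,A,15938000\n"
    "28/06/2018,A,1035000\n"
    "28/06/2018,A,1035000\n"
)
for date, tags in sum_amounts(sample).items():
    for tag, total in tags.items():
        print(date, tag, total)
```

With this input the loop prints 13/06/2018 ABC 6909800 and 28/06/2018 A 18008000. A blank line in the file would make the row unpack fail, which is the trade-off for dropping the try/except.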
To write the result to a CSV file, just:
s = '\n'.join(['{0},{1},{2}'.format(k, k2, v2) for k, v in d.items() for k2, v2 in v.items()])
with open('output.csv', 'w') as f:
    f.write('Date,Tag,Amount\n' + s + '\n')
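The standard csv module can also do the writing, which takes care of separators and quoting for you. A sketch, assuming d has the {date: {tag: total}} shape built by the nested defaultdict; write_totals and output.csv are hypothetical names.

```python
import csv
import io


def write_totals(d, f):
    """Write a {date: {tag: total}} mapping as Date,Tag,Amount CSV rows."""
    writer = csv.writer(f)
    writer.writerow(["Date", "Tag", "Amount"])  # same header as the input file
    for date, tags in d.items():
        for tag, total in tags.items():
            writer.writerow([date, tag, total])


# Demo with a hand-built dict and an in-memory buffer; for a real file use
# open("output.csv", "w", newline="") in place of io.StringIO().
totals = {"13/06/2018": {"ABC": 6909800}, "28/06/2018": {"A": 18008000}}
buf = io.StringIO()
write_totals(totals, buf)
print(buf.getvalue())
```

Note that csv.writer ends rows with "\r\n" by default; opening the output file with newline="" stops Python from translating those line endings a second time.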