解析CSV文件并汇总值 [英] Parse CSV file and aggregate the values
本文介绍了解析CSV文件并汇总值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想解析一个CSV文件并汇总这些值.城市行中有重复的值(样本):
I'd like to parse a CSV file and aggregate the values. The city row has repeating values (sample):
CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
解析后的结果应该类似于:
After parsing the result should be something like:
CITY, AMOUNT
London,75
Tokyo,45
New York,25
我编写了以下代码来提取唯一的城市名称:
I've written the following code to extract the unique city names:
def main():
contrib_data = list(csv.DictReader(open('contributions.csv','rU')))
combined = []
for row in contrib_data:
if row['OFFICE'] not in combined:
combined.append(row['OFFICE'])
然后如何汇总值?
推荐答案
在Python 3.2.2中进行了测试:
Tested in Python 3.2.2:
import csv
from collections import defaultdict
reader = csv.DictReader(open('test.csv', newline=''))
cities = defaultdict(int)
for row in reader:
cities[row["CITY"]] += int(row["AMOUNT"])
writer = csv.writer(open('out.csv', 'w', newline = ''))
writer.writerow(["CITY", "AMOUNT"])
writer.writerows([city, cities[city]] for city in cities)
结果:
CITY,AMOUNT
New York,25
London,75
Tokyo,45
关于您的附加要求:
import csv
from collections import defaultdict
def default_factory():
return [0, None, None, 0]
reader = csv.DictReader(open('test.csv', newline=''))
cities = defaultdict(default_factory)
for row in reader:
amount = int(row["AMOUNT"])
cities[row["CITY"]][0] += amount
max = cities[row["CITY"]][1]
cities[row["CITY"]][1] = amount if max is None else amount if amount > max else max
min = cities[row["CITY"]][2]
cities[row["CITY"]][2] = amount if min is None else amount if amount < min else min
cities[row["CITY"]][3] += 1
for city in cities:
cities[city][3] = cities[city][0]/cities[city][3] # calculate mean
writer = csv.writer(open('out.csv', 'w', newline = ''))
writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
writer.writerows([city] + cities[city] for city in cities)
这给你
CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0
请注意,在Python 2下,您需要在顶部附加一行from __future__ import division
才能获得正确的结果.
Note that under Python 2, you'll need the additional line from __future__ import division
at the top to get correct results.
这篇关于解析CSV文件并汇总值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文