Parse CSV file and aggregate values, multiple columns

Views: 173 | Posted: 2017/2/24 23:23:00 | Tags: python csv dictionary aggregate

Question

I would like to adapt the post "Parse CSV file and aggregate the values" to sum multiple columns instead of just one.

For this data:

CITY,AMOUNT,AMOUNT2,AMOUNTn
London,20,21,22
Tokyo,45,46,47
London,55,56,57
New York,25,26,27

How can I get:

CITY,AMOUNT,AMOUNT2,AMOUNTn
London,75,77,79
Tokyo,45,46,47
New York,25,26,27

I will eventually have several thousand columns, and unfortunately I cannot use the pandas package for this task. Here is the code I have; it just aggregates all three AMOUNT columns into one, which is not what I am after:

from __future__ import division
import csv
from collections import defaultdict

def default_factory():
    return [0, None, None, 0]

reader = csv.DictReader(open('test_in.txt'))
cities = defaultdict(default_factory)
for row in reader:
    headers = [r for r in row.keys()]
    headers.remove('CITY')
    for i in headers:
        amount = int(row[i])
        cities[row["CITY"]][0] += amount
        max = cities[row["CITY"]][1]
        cities[row["CITY"]][1] = amount if max is None else amount if amount > max else max
        min = cities[row["CITY"]][2]
        cities[row["CITY"]][2] = amount if min is None else amount if amount < min else min
        cities[row["CITY"]][3] += 1


for city in cities:
    cities[city][3] = cities[city][0]/cities[city][3] # calculate mean

with open('test_out.txt', 'wb') as myfile:
    writer = csv.writer(myfile, delimiter="\t")
    writer.writerow(["CITY", "AMOUNT", "AMOUNT2", "AMOUNTn", "max", "min", "mean"])
    writer.writerows([city] + cities[city] for city in cities)

Thanks very much for your help.

Recommended answer

Here is a solution using itertools.groupby.
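
Two building blocks carry the solution that follows: itertools.groupby only merges adjacent rows with equal keys, which is why the input is sorted first, and zip(*rows) transposes a list of rows into columns so that each column can be summed. The snippet below is a minimal Python 3 sketch of both ideas; the sample rows and variable names are made up for illustration and are not part of the original answer.

```python
import itertools

# Sample rows invented for illustration: (city, amount, amount2)
rows = [("London", 1, 2), ("Tokyo", 3, 4), ("London", 5, 6)]
first = lambda row: row[0]

# groupby only merges *adjacent* equal keys, so unsorted input yields London twice
unsorted_keys = [k for k, _ in itertools.groupby(rows, key=first)]

# After sorting by the key, each city forms exactly one group
sorted_keys = [k for k, _ in itertools.groupby(sorted(rows, key=first), key=first)]

# zip(*rows) turns rows into columns, so sum() can run per column
london = [r[1:] for r in rows if r[0] == "London"]
column_sums = [sum(col) for col in zip(*london)]

print(unsorted_keys)  # ['London', 'Tokyo', 'London']
print(sorted_keys)    # ['London', 'Tokyo']
print(column_sums)    # [6, 8]
```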

import csv
import io
import itertools

data = """CITY,AMOUNT,AMOUNT2,AMOUNTn
London,20,21,22
Tokyo,45,46,47
London,55,56,57
New York,25,26,27"""

# io.StringIO creates a file-like object for demo purposes (Python 3)
f = io.StringIO(data)
fieldnames = f.readline().strip().split(',')
key = lambda x: x[0]  # the first column is the grouping key
# rows must be sorted by city before passing to itertools.groupby
rows_sorted = sorted(csv.reader(f), key=key)
outfile = io.StringIO()
writer = csv.DictWriter(outfile, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
for city, rows in itertools.groupby(rows_sorted, key=key):
    # remove city column for aggregation, convert to ints
    rows = [[int(x) for x in row[1:]] for row in rows]
    # zip(*rows) transposes the rows, so each tuple holds one column
    agg = [sum(column) for column in zip(*rows)]
    writer.writerow(dict(zip(fieldnames, [city] + agg)))

print(outfile.getvalue())

# CITY,AMOUNT,AMOUNT2,AMOUNTn
# London,75,77,79
# New York,25,26,27
# Tokyo,45,46,47
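
An alternative sketch, not from the original answer: since the question mentions several thousand columns, the file can also be aggregated in a single pass without sorting, by keeping one list of running sums per city. The function name aggregate_by_first_column is hypothetical. The sketch assumes Python 3.7+ (dicts preserve insertion order), that every column after the first holds an integer, and that the number of distinct cities fits in memory.

```python
import csv
import io


def aggregate_by_first_column(infile, outfile):
    """Sum every numeric column per key in the first column, one row at a time."""
    reader = csv.reader(infile)
    header = next(reader)
    totals = {}  # city -> list of running column sums, in first-seen order
    for row in reader:
        if not row:
            continue  # skip blank lines
        city, values = row[0], [int(x) for x in row[1:]]
        if city in totals:
            # add this row to the running sums, column by column
            totals[city] = [a + b for a, b in zip(totals[city], values)]
        else:
            totals[city] = values
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(header)
    for city, sums in totals.items():
        writer.writerow([city] + sums)


data = """CITY,AMOUNT,AMOUNT2,AMOUNTn
London,20,21,22
Tokyo,45,46,47
London,55,56,57
New York,25,26,27"""

out = io.StringIO()
aggregate_by_first_column(io.StringIO(data), out)
result = out.getvalue()
print(result)
```

Unlike the sorted groupby version, this keeps cities in the order they first appear (London, Tokyo, New York), which matches the output requested in the question. With real files, the two StringIO objects would be replaced by file handles opened with newline='' as the csv module recommends.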
