Using Python to group CSV data

Question

I have a csv file with thousands of entries that need to be broken up into groups. In the example below, I need the rows grouped by River Name so that I can later reformat the information group by group.

River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45

The only way I can think of to group the information is to:


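  1. Use Python to read the csv and create a list containing only the unique river names.

  2. Create a new, separate csv for each unique river name, e.g. Peterson.csv,
    Catnip.csv.

  3. Use Python to read the original csv and, based on the river name of the row being read, write that row to the corresponding .csv file. For example, the row Catnip, 1, 2145.30 would be written to catnip.csv.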

I don't think this is an efficient way to go about it, as it gives me about 1500 csv files that would need to be opened and written to, but I am at the limit of my Python knowledge. If anyone could suggest a better approach, it would be greatly appreciated.

Recommended answer

You can also simply use the csv module and collect the results in a dictionary. I enumerate the reader to skip the first row (I'm sure there must be an easier way...). I then read each row and assign its values to river, branch and length. If the river is not yet in the dictionary, it is initialized with an empty list. The (branch, length) tuple is then appended to that river's list.

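import csv

rivers = {}
# newline='' is what the csv module expects in Python 3; the old 'rU' mode was removed in 3.11.
with open('rivers.csv', newline='') as f:
    # skipinitialspace drops the space that follows each comma in the sample data.
    reader = csv.reader(f, delimiter=',', skipinitialspace=True)
    for n, row in enumerate(reader):
        if not n:
            # Skip header row (n = 0).
            continue
        river, branch, length = row
        if river not in rivers:
            rivers[river] = []
        rivers[river].append((branch, length))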

>>> rivers
{'Catnip': [('1', '2145.30'), ('3', '15.4'), ('1', '88.56')],
 'Fergerson': [('1', '5.2')],
 'Peterson': [('2', '24.5'), ('2', '6.45')]}
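
As for the easier way to skip the header: calling next() on the reader once before the loop should do it, and collections.defaultdict can take over the "create an empty list if missing" step. The following is only a sketch of that variant, not a drop-in replacement. The names group_rivers, SAMPLE and totals are made up for illustration, the sample rows are copied from the question so the snippet runs on its own, and converting branch to int and length to float is an assumption about how the values will be used later.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; a real script would pass an open file,
# e.g. open('rivers.csv', newline=''), instead of this in-memory string.
SAMPLE = """River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45
"""


def group_rivers(lines):
    """Group (branch, length) pairs by river name.

    `lines` is any iterable of CSV lines, such as an open file.
    """
    # skipinitialspace drops the space that follows each comma in the data.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # Skip the header row; does nothing on empty input.
    groups = defaultdict(list)  # A missing river starts out as an empty list.
    for river, branch, length in reader:
        # Assumption: branch is an integer and length is a decimal number.
        groups[river].append((int(branch), float(length)))
    return dict(groups)


rivers = group_rivers(io.StringIO(SAMPLE))

# One possible use of the groups: the total length per river.
totals = {
    river: sum(length for _, length in rows)
    for river, rows in rivers.items()
}
```

Because everything stays in one dictionary, the later reformatting can loop over rivers.items() without opening a separate file for each river.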
