Parsing a CSV file and aggregating values in Python


Question

I'm looking to parse a CSV file and aggregate two columns.

Data in the CSV file:

'IP Address', Severity
10.0.0.1, High
10.0.0.1, High
10.0.0.1, Low
10.0.0.1, Medium
10.0.0.2, Medium
10.0.0.2, High
10.0.0.2, Low
10.0.0.3, Medium
10.0.0.3, High
10.0.0.3, Medium

I'm looking to obtain output along the lines of:

'IP Address', Severity
10.0.0.1, High:2, Medium:1, Low:1
10.0.0.2, High:1, Medium:1, Low:1
10.0.0.3, High:1, Medium:2, Low:0

Or (ideally):

'IP Address', High, Medium, Low
10.0.0.1, 2, 1, 1
10.0.0.2, 1, 1, 1
10.0.0.3, 1, 2, 0

The closest I have come is here: Parse CSV file and aggregate the values

I can't seem to aggregate on the string (Severity) variable.

How can I output this data?

Thanks for your help.

Recommended answer

Here is my solution, ag.py:

import collections
import csv
import sys

# Maps each IP address to a Counter of severity levels
output = collections.defaultdict(collections.Counter)

with open(sys.argv[1]) as infile:
    reader = csv.reader(infile)
    next(reader)  # Skip header line (works in Python 2.6+ and 3)
    for ip, level in reader:
        level = level.strip()  # Remove surrounding spaces
        output[ip][level] += 1

print("'IP Address',High,Medium,Low")
for ip, count in output.items():
    # A Counter returns 0 for a level that never occurred for this IP
    print('{0},{1[High]},{1[Medium]},{1[Low]}'.format(ip, count))

To run the solution, issue the following command:

python ag.py data.csv
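
The question also asks for a first output format (High:2, Medium:1, Low:1). Only the printing step would need to change for that. The following is a minimal sketch, assuming the same Counter-per-IP structure; format_row is a hypothetical helper name that is not part of the original answer.

```python
import collections


def format_row(ip, counter, levels=("High", "Medium", "Low")):
    """Build one 'ip, High:n, Medium:n, Low:n' line (hypothetical helper)."""
    # A Counter returns 0 for a level that was never counted
    parts = ["{0}:{1}".format(level, counter[level]) for level in levels]
    return "{0}, {1}".format(ip, ", ".join(parts))


# Severities for 10.0.0.3, taken from the sample data in the question
counter = collections.Counter(["Medium", "High", "Medium"])
print(format_row("10.0.0.3", counter))  # 10.0.0.3, High:1, Medium:2, Low:0
```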

Discussion

  • output is a dictionary whose keys are the IP addresses and whose values are collections.Counter objects.
  • Each Counter object counts 'High', 'Medium', and 'Low' for a particular IP.
  • My solution prints to stdout; you can modify it to print to a file.
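
As a rough illustration of the last point, the sketch below separates counting from writing, so the table can go to any file-like object. It assumes Python 3. The function names aggregate and write_table, the LEVELS tuple, and the sorted row order are assumptions for this example rather than part of the original answer.

```python
import collections
import csv
import io

LEVELS = ("High", "Medium", "Low")  # column order taken from the question


def aggregate(infile):
    """Count severities per IP from an open CSV file object."""
    counts = collections.defaultdict(collections.Counter)
    # skipinitialspace drops the space that follows each comma
    reader = csv.reader(infile, skipinitialspace=True)
    next(reader)  # skip header line
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        ip, level = row
        counts[ip][level.strip()] += 1
    return counts


def write_table(counts, outfile):
    """Write one row per IP with its High, Medium, and Low counts."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["IP Address"] + list(LEVELS))
    for ip in sorted(counts):  # plain string sort, not numeric IP order
        writer.writerow([ip] + [counts[ip][level] for level in LEVELS])


# In-memory stand-ins for the input and output files
sample = io.StringIO(
    "'IP Address', Severity\n"
    "10.0.0.1, High\n"
    "10.0.0.1, Low\n"
    "10.0.0.2, Medium\n"
)
result = io.StringIO()
write_table(aggregate(sample), result)
print(result.getvalue())
```

To write to a real file instead, pass a file object opened with open(path, "w", newline="") as the second argument to write_table.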