Mapping occurrence count based on criterion from CSV in Python

Question

I have a CSV with numerous columns, but I'm only concerned with two of them: 'Text field (Environment/s Affected)' and 'Text field (Rating)'.

The environment column has entries like dev, test, prod. The rating column has entries like P1, P2, P3, P4, P5.

I need to count how many occurrences each of the environments has had. What would be the best way to do this in Python?

The end goal would be something like this:

P1/P2 in Test: 15
Total in Test: 30
P1/P2 in Staging: 24
Total in Staging: 30

P1/P2 would be the combined count of those two ratings; Total would be the combined count of the others, i.e. P3, P4, P5.

Recommended answer

You have tagged your question with pandas, so I assume your data is already in the form of a DataFrame. If so, the following command should do:

df.groupby(['env', (df['rating'].isin(['P1', 'P2']))]).size().rename(index={True: 'P1/P2', False: 'Total'}, level=1)

(This assumes that your DataFrame is named df and that your "Environment/s Affected" and "Rating" columns are named env and rating respectively.)

This groups the rows first by the unique values of the env column, and then by whether the value in the rating column is 'P1' or 'P2' (True) or anything else (False). It then counts the number of rows within each subgroup and relabels True as 'P1/P2' and False as 'Total'.
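To make the grouping concrete, here is a minimal sketch of the same command on a tiny DataFrame. The six rows are made-up sample data, and the column names env and rating follow the assumption stated above; the final loop is just one possible way to print the counts in the style the question asks for.

```python
import pandas as pd

# Made-up sample data for illustration only; column names env and rating
# follow the assumption stated in the answer.
df = pd.DataFrame({
    "env":    ["test", "test", "test", "prod", "prod", "dev"],
    "rating": ["P1",   "P3",   "P2",   "P4",   "P1",   "P5"],
})

# Boolean Series: True where the rating is P1 or P2, False otherwise.
is_p1_p2 = df["rating"].isin(["P1", "P2"])

# Group by environment and by the boolean flag, count the rows in each
# subgroup, then replace the True/False labels with readable names.
counts = (
    df.groupby(["env", is_p1_p2])
      .size()
      .rename(index={True: "P1/P2", False: "Total"}, level=1)
)

# One line per (environment, label) pair, in the style asked for in the question.
for (env, label), n in counts.items():
    print(f"{label} in {env}: {n}")
```

Note that, as in the question, 'Total' here counts only the ratings other than P1 and P2. An environment with no rows in one of the two subgroups simply has no entry for it in the result.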

If your data is not yet in the form of a DataFrame, you will need to load it as one from the CSV, which can be done with the following command:

import pandas as pd
df = pd.read_csv(file_path)

You may need to tweak the arguments a little, depending on the format of your file; the pandas documentation for read_csv lists them.
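As a rough sketch of that loading step, the example below reads only the two columns named in the question and renames them to env and rating so that the groupby command applies unchanged. The in-memory CSV text, its ID column, and its three rows are invented stand-ins for the real file; in practice you would pass your own file path to read_csv.

```python
import io

import pandas as pd

# Stand-in for the real file: a tiny made-up CSV that uses the two column
# headers from the question plus a hypothetical ID column.
csv_text = io.StringIO(
    "ID,Text field (Environment/s Affected),Text field (Rating)\n"
    "1,test,P1\n"
    "2,prod,P3\n"
    "3,test,P4\n"
)

ENV_COL = "Text field (Environment/s Affected)"
RATING_COL = "Text field (Rating)"

# usecols keeps only the two columns of interest; rename gives them the
# short names assumed by the groupby command.
df = pd.read_csv(csv_text, usecols=[ENV_COL, RATING_COL]).rename(
    columns={ENV_COL: "env", RATING_COL: "rating"}
)

print(df)
```

Restricting the read with usecols is optional, but it keeps the DataFrame small when the CSV has many unrelated columns.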
