Pandas Sum of Duplicate Attributes


Problem description

I'm using pandas to manipulate a CSV file with several rows and columns that looks like the following:

Fullname     Amount     Date           Zip    State .....
John Joe        1        1/10/1900     55555    Confusion
Betty White     5         .             .       Alaska 
Bruce Wayne     10        .             .       Frustration
John Joe        20        .             .       .
Betty White     25        .             .       .

I'd like to create a new column called Total containing the sum of Amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.

Let's call my CSV import csvfile. Here is what I have so far.

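import pandas as pd
# Read the CSV; the first line holds the column names
df = pd.read_csv('csvfile.csv', header=0)
# df.sort was removed from pandas; sort_values returns a new DataFrame
df = df.sort_values('Fullname')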

I think I have to use iterrows to do what I want. The problem with dropping duplicates is that I would lose the amounts, since the amount may differ between rows for the same person.

Recommended answer

I think you want this:

df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')

groupby groups the rows by the Fullname and Zip columns, as you described. Calling transform on the Amount column with the string 'sum' computes the total for each group and returns a Series whose index is aligned with the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.

new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
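
To see the two steps working together, here is a minimal, self-contained sketch. The DataFrame is built in memory rather than read from csvfile.csv, and the Zip values are made up for illustration (the question's table elides them), so treat the data as an assumption rather than the asker's actual file.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table.
# The Zip values are invented; the original table leaves them out.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": [55555, 11111, 22222, 55555, 11111],
})

# transform('sum') returns one value per original row (the sum for that
# row's group), so the result can be assigned straight back as a column.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep the first row for each (Fullname, Zip) pair; Total already holds
# the group sum, so no information about the totals is lost.
new_df = df.drop_duplicates(subset=["Fullname", "Zip"])
totals = dict(zip(new_df["Fullname"], new_df["Total"]))

print(new_df[["Fullname", "Zip", "Total"]])
```

Note that after drop_duplicates the Amount column still shows only the first row's amount for each person, so it is Total, not Amount, that carries the per-person sum.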
