Pandas Sum of Duplicate Attributes
Question
I'm using Pandas to manipulate a CSV file with several rows and columns that looks like the following:
Fullname     Amount  Date       Zip    State        .....
John Joe     1       1/10/1900  55555  Confusion
Betty White  5       .          .      Alaska
Bruce Wayne  10      .          .      Frustration
John Joe     20      .          .      .
Betty White  25      .          .      .
I'd like to create a new column named Total that contains the total Amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.
Let's call my CSV file csvfile. Here is what I have so far:
import pandas
df = pandas.read_csv('csvfile.csv', header=0)
# DataFrame.sort was removed; sort_values returns a new sorted DataFrame
df = df.sort_values(['Fullname'])
I think I have to use iterrows to do what I want. The problem with dropping duplicates is that I would lose the amounts, and the amounts may differ between rows.
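A quick sketch of the concern, using a few rows modeled on the table above. The Zip values are invented for illustration, since the sample shows only one. By default `drop_duplicates` keeps the first row of each group, so the later amounts are discarded rather than added.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table; Zip values are invented.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": [55555, 44444, 33333, 55555, 44444],
})

# drop_duplicates keeps the first row of each (Fullname, Zip) pair by default,
# so John Joe's 20 and Betty White's 25 are simply discarded.
deduped = df.drop_duplicates(subset=["Fullname", "Zip"])
print(deduped)
```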
Answer
I think you want this:
df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')
So groupby groups by the Fullname and Zip columns, as you stated. We then call transform on the Amount column and compute the total by passing in the string 'sum'. This returns a Series whose index is aligned with the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.:
new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
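Putting the two steps together, here is a minimal runnable sketch of the answer's approach on similar data (again with invented Zip values). For comparison it also shows `groupby(...).sum()` with `as_index=False`, which collapses each group to a single row directly instead of broadcasting the sum back onto every row.

```python
import pandas as pd

# Hypothetical sample data; Zip values are invented for illustration.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": [55555, 44444, 33333, 55555, 44444],
})

# transform('sum') returns a Series aligned to df's index: every row
# receives the total for its (Fullname, Zip) group.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep one row per person; the Total column already holds the group sum.
new_df = df.drop_duplicates(subset=["Fullname", "Zip"])
print(new_df)

# Alternative: aggregate straight to one row per group (non-key columns are dropped).
totals = df.groupby(["Fullname", "Zip"], as_index=False)["Amount"].sum()
print(totals)
```

The transform route keeps the other columns (Date, State, ...) from the first row of each person, while the aggregate route returns only the key columns and the summed Amount.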