Summing values with similar row values
Question
I have a pandas data set that looks like this:
city difference
NY 6
SF 8
LA 8
NY 9
SF 10
I want to sum the values of the difference column based on the city column, so that my final data set looks like this:
city difference total difference
NY 6 15
NY 9
LA 8 8
SF 10 10
I tried
df['total difference'] = df.groupby('city')['difference'].sum()
but it didn't work. I even tried the approach from "How to sum values of particular rows in pandas?" but got NaN values for the new column. Please help!
Recommended answer
I think you need transform:
df['total difference'] = df.groupby('city')['difference'].transform(sum)
print (df)
city difference total difference
0 NY 6 15
1 SF 8 18
2 LA 8 8
3 NY 9 15
4 SF 10 18
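To illustrate why the direct assignment in the question produced NaN while transform works, here is a minimal sketch using the question's data. The names per_city, broken and via_map are illustrative only and not part of the original answer. The assumption being demonstrated is that groupby(...).sum() returns one row per city, indexed by city name, so assigning it to a frame indexed 0..4 finds no matching labels; transform instead returns a result aligned to the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "SF", "LA", "NY", "SF"],
    "difference": [6, 8, 8, 9, 10],
})

# One value per city, indexed by the city name (LA, NY, SF)
per_city = df.groupby("city")["difference"].sum()

# Column assignment aligns on index labels; the labels 0..4 do not
# appear in per_city's index, so every row becomes NaN
broken = df.copy()
broken["total difference"] = per_city

# transform returns a Series with the same index as df,
# so each row receives its own group's total
df["total difference"] = df.groupby("city")["difference"].transform("sum")

# An alternative with the same result: look up each city's total
via_map = df["city"].map(per_city)

print(broken)
print(df)
```

The map variant can be handy when the per-city totals are already computed for another purpose; otherwise transform does it in one step.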
If you also need to sort by the city column:
df['total difference'] = df.groupby('city')['difference'].transform('sum')
df = df.sort_values('city')
print (df)
city difference total difference
2 LA 8 8
0 NY 6 15
3 NY 9 15
1 SF 8 18
4 SF 10 18
I was curious whether the way the function is passed makes a difference; the timings turn out to be very similar:
import numpy as np
import pandas as pd

# [10000000 rows x 2 columns]
np.random.seed(100)
df = pd.DataFrame(np.random.randint(1000, size=(10000000,2)), columns=['city','difference'])
#print (df)
In [293]: %timeit (df.groupby('city')['difference'].transform('sum'))
1 loop, best of 3: 570 ms per loop
In [294]: %timeit (df.groupby('city')['difference'].transform(sum))
1 loop, best of 3: 567 ms per loop
In [295]: %timeit (df.groupby('city')['difference'].transform(np.sum))
1 loop, best of 3: 561 ms per loop
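The %timeit lines above only work inside IPython. As a rough sketch of the same comparison in a plain script, the standard timeit module can be used. The smaller frame size (100,000 rows) and the names grouped, variants and results are assumptions chosen so the example runs quickly; absolute numbers will differ from those quoted above. Note also that recent pandas releases may emit a FutureWarning when a callable such as sum or np.sum is passed, so the string 'sum' is the safer spelling.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)
# Assumed smaller size (100,000 rows) so the sketch finishes quickly
df = pd.DataFrame(np.random.randint(1000, size=(100_000, 2)),
                  columns=["city", "difference"])

grouped = df.groupby("city")["difference"]

# The three ways of passing the aggregation compared in the answer
variants = {"'sum'": "sum", "sum": sum, "np.sum": np.sum}

results = {}
for label, func in variants.items():
    # Total time for 3 calls, reported as an average per call
    seconds = timeit.timeit(lambda: grouped.transform(func), number=3)
    results[label] = grouped.transform(func)
    print(f"transform({label}): {seconds / 3:.4f} s per call")
```

All three variants should produce the same per-row totals; only the way the function is dispatched differs.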