Summing across rows of a Pandas DataFrame

Problem description

I have a DataFrame of records that looks something like this:

import pandas as pd

stocks = pd.Series(['A', 'A', 'B', 'C', 'C'], name='stock')
positions = pd.Series([100, 200, 300, 400, 500], name='positions')
same1 = pd.Series(['AA', 'AA', 'BB', 'CC', 'CC'], name='same1')
same2 = pd.Series(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'], name='same2')
diff = pd.Series(['A1', 'A2', 'B3', 'C1', 'C2'], name='different')
# Each Series becomes a row here, so transpose to turn them into columns
df = pd.DataFrame([stocks, same1, positions, same2, diff]).T
df

This gives a pandas DataFrame that looks like

  stock same1 positions same2 different
0     A    AA       100   AAA        A1
1     A    AA       200   AAA        A2
2     B    BB       300   BBB        B3
3     C    CC       400   CCC        C1
4     C    CC       500   CCC        C2

I'm not interested in the data in the 'different' column and want to sum the positions for each unique combination of the other columns. I am currently doing it with:

df.groupby(['stock','same1','same2'])['positions'].sum()

which gives:

stock  same1  same2
A      AA     AAA      300
B      BB     BBB      300
C      CC     CCC      900
Name: positions

The problem is that the result is a pd.Series (with a MultiIndex). Currently I iterate over it to build a DataFrame again. I am sure that I am missing a method. Basically I want to drop one column from a DataFrame and then "rebuild" it so that one column is summed and the rest of the fields (which are the same) stay in place.

This groupby method also breaks if there are empty positions, so I currently use an elaborate iteration over the DataFrame to build a new one. Is there a better approach?
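The question does not say what "empty positions" means. Assuming it refers to missing values (NaN), the sketch below shows how groupby treats them; the data and the placement of the gaps are made up for illustration. By default, NaN in the summed column is skipped and rows with a missing group key are dropped, while dropna=False (available from pandas 1.1) keeps those rows as their own group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in 'positions' (row 1)
# and one missing group key in 'same2' (row 4).
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "same2": ["AAA", "AAA", "BBB", "CCC", None],
    "positions": [100.0, np.nan, 300.0, 400.0, 500.0],
})

keys = ["stock", "same1", "same2"]

# Default: the row with a missing key is dropped from the result,
# and the NaN in 'positions' is ignored by sum().
default = df.groupby(keys)["positions"].sum().reset_index()

# dropna=False keeps the row with the missing key as a separate group.
kept = df.groupby(keys, dropna=False)["positions"].sum().reset_index()

print(default)
print(kept)
```

If the gaps are in the key columns and should not be lost, dropna=False (or filling the keys with a placeholder before grouping) is one option to consider.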

Solution

Step 1. Use [['positions']] instead of ['positions'], so that the result is a DataFrame rather than a Series:

In [30]: df2 = df.groupby(['stock','same1','same2'])[['positions']].sum()

In [31]: df2 
Out[31]: 

                   positions
stock same1 same2               
A     AA    AAA          300 
B     BB    BBB          300 
C     CC    CCC          900 

Step 2. Then use reset_index to move the index levels back into columns:

In [34]: df2.reset_index()
Out[34]: 
  stock same1 same2  positions
0     A    AA   AAA        300 
1     B    BB   BBB        300 
2     C    CC   CCC        900
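As a self-contained check of the two steps above, the following sketch rebuilds the example data from a dict (an assumption made here so that positions stays an integer column) and compares the single-bracket and double-bracket selections.

```python
import pandas as pd

# Same values as in the question, built from a dict for brevity.
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})

grouped = df.groupby(["stock", "same1", "same2"])

as_series = grouped["positions"].sum()    # single label selects a Series
as_frame = grouped[["positions"]].sum()   # list of labels selects a DataFrame

# Move the three index levels back into ordinary columns.
flat = as_frame.reset_index()

print(type(as_series).__name__, type(as_frame).__name__)
print(flat)
```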

EDIT

It seems my method is not so good.

Thanks to @Andy and @unutbu, you can achieve the same result more elegantly:

Method 1:

df.groupby(['stock', 'same1', 'same2'])['positions'].sum().reset_index()

Method 2:

df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
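A quick way to convince yourself that both methods produce the same table is to run them side by side. This is only a sketch on the question's example data, rebuilt here from a dict; the variable names m1 and m2 are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})

keys = ["stock", "same1", "same2"]

# Method 1: sum into a Series with a MultiIndex, then flatten the index.
m1 = df.groupby(keys)["positions"].sum().reset_index()

# Method 2: ask groupby to keep the keys as columns from the start.
m2 = df.groupby(keys, as_index=False)["positions"].sum()

print(m1)
print(m2)
print(m1.equals(m2))
```

The difference is mainly one of style: as_index=False avoids building the MultiIndex in the first place, whereas reset_index() flattens it afterwards.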
