Summing across rows of Pandas Dataframe
I have a DataFrame of records that looks something like this:
import pandas as pd

stocks = pd.Series(['A', 'A', 'B', 'C', 'C'], name='stock')
positions = pd.Series([100, 200, 300, 400, 500], name='positions')
same1 = pd.Series(['AA', 'AA', 'BB', 'CC', 'CC'], name='same1')
same2 = pd.Series(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'], name='same2')
diff = pd.Series(['A1', 'A2', 'B3', 'C1', 'C2'], name='different')
# Each Series becomes a row, so transpose to get one column per Series
df = pd.DataFrame([stocks, same1, positions, same2, diff]).T
df
This gives a pandas DataFrame that looks like
  stock same1 positions same2 different
0     A    AA       100   AAA        A1
1     A    AA       200   AAA        A2
2     B    BB       300   BBB        B3
3     C    CC       400   CCC        C1
4     C    CC       500   CCC        C2
I'm not interested in the data in the 'different' column and want to sum the positions over the unique combinations of the other columns. I am currently doing it with:
df.groupby(['stock','same1','same2'])['positions'].sum()
which gives:
stock  same1  same2
A      AA     AAA      300
B      BB     BBB      300
C      CC     CCC      900
Name: positions
The problem is that this is a pd.Series (with a MultiIndex). Currently I iterate over it to build a DataFrame again, but I am sure I am missing a method. Basically, I want to drop one column from the DataFrame and then "rebuild" it so that one column is summed and the rest of the fields (which are the same within each group) stay in place.
This groupby approach also breaks if there are empty positions, so I currently use an elaborate iteration over the DataFrame to build a new one. Is there a better approach?
Step 1. Use [['positions']] instead of ['positions']:
In [30]: df2 = df.groupby(['stock','same1','same2'])[['positions']].sum()
In [31]: df2
Out[31]:
                   positions
stock same1 same2
A     AA    AAA          300
B     BB    BBB          300
C     CC    CCC          900
Step 2. Then use reset_index to move the index levels back into columns:
In [34]: df2.reset_index()
Out[34]:
  stock same1 same2  positions
0     A    AA   AAA        300
1     B    BB   BBB        300
2     C    CC   CCC        900
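The two steps above can be sketched as one self-contained script. This is a minimal illustration, not the answerer's exact session: it assumes the frame is built from a dict (rather than from transposed Series) so that `positions` keeps an integer dtype, and the names `keys`, `as_series`, `as_frame` and `flat` are made up for the example.

```python
import pandas as pd

# Assumption: same data as in the question, built from a dict so that
# 'positions' is stored as integers rather than generic objects.
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})
keys = ["stock", "same1", "same2"]  # hypothetical helper name

# Single brackets select one column, so the aggregate is a Series.
as_series = df.groupby(keys)["positions"].sum()

# Double brackets select a list of columns, so the aggregate is a DataFrame.
as_frame = df.groupby(keys)[["positions"]].sum()

# reset_index turns the MultiIndex levels back into ordinary columns.
flat = as_frame.reset_index()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(flat)
```

The only difference between the two selections is the type of the result; the grouped sums themselves are the same.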
EDIT
It seems my method is not the best one.
Thanks to @Andy and @unutbu, you can achieve your goal in more elegant ways:
Method 1:
df.groupby(['stock', 'same1', 'same2'])['positions'].sum().reset_index()
Method 2:
df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
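As a quick check, the two one-liners can be run side by side. The sketch below again assumes a dict-built copy of the question's data; `keys`, `method1` and `method2` are illustrative names only.

```python
import pandas as pd

# Assumption: the question's data, rebuilt from a dict for a clean dtype.
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})
keys = ["stock", "same1", "same2"]  # hypothetical helper name

# Method 1: aggregate to a Series, then move the index levels into columns.
method1 = df.groupby(keys)["positions"].sum().reset_index()

# Method 2: ask groupby not to use the keys as the index in the first place.
method2 = df.groupby(keys, as_index=False)["positions"].sum()

print(method1)
print(method2)
```

Both calls return a flat DataFrame with the columns stock, same1, same2 and positions, and the 'different' column is dropped because it is neither a grouping key nor the selected column.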