Python Data Frame: cumulative sum of column until condition is reached and return the index
Question
I am new to Python and am currently facing an issue I can't solve. I really hope you can help me out. English is not my native language, so I am sorry if I am not able to express myself properly.
Say I have a simple data frame with two columns:
index Num_Albums Num_authors
0 10 4
1 1 5
2 4 4
3 7 1000
4 1 44
5 3 8
Num_Albums_tot = sum(Num_Albums) = 30
I need to do a cumulative sum of the data in Num_Albums until a certain condition is reached, register the index at which the condition is achieved, and get the corresponding value from Num_authors.
Example:
cumulative sum of Num_Albums until the sum equals 50% ± 1/15 of 30 (--> 15±2):
10 = 15±2? No, then continue;
10+1 = 15±2? No, then continue;
10+1+4 = 15±2? Yes, stop.
Condition reached at index 2. Then get Num_authors at that index: Num_authors(2) = 4
I would like to see if there's a function already implemented in pandas before I start thinking about how to do it with a while/for loop.
[I would like to specify the column from which I want to retrieve the value at the relevant index. This comes in handy when I have e.g. 4 columns and I want to sum the elements in column 1, check whether the condition is achieved, then get the corresponding value in column 2; then do the same with columns 3 and 4.]
Solution
Opt - 1:
You could compute the cumulative sum using cumsum. Then use np.isclose with its built-in tolerance parameter (atol) to check whether the values in this series lie within the specified threshold of 15 +/- 2. This returns a boolean array.
Through np.flatnonzero, return the ordinal positions at which the condition is True. We select the first instance of a True value.
Finally, use .iloc to retrieve the value of the column you require at the position computed earlier.
val = np.flatnonzero(np.isclose(df.Num_Albums.cumsum().values, 15, atol=2))[0]
df['Num_authors'].iloc[val] # for faster access, use .iat
4
The result of np.isclose on the cumsum series, after converting it to an array:
np.isclose(df.Num_Albums.cumsum().values, 15, atol=2)
array([False, False, True, False, False, False], dtype=bool)
Opt - 2:
Use pd.Index.get_indexer on the cumsum series; it supports a tolerance parameter with the nearest method. (Older pandas versions accepted the same arguments in pd.Index.get_loc; that form was removed in pandas 2.0.)
val = pd.Index(df.Num_Albums.cumsum()).get_indexer([15], method='nearest', tolerance=2)[0]
df['Num_authors'].iat[val]
4
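Two behaviours of the nearest lookup are worth checking before relying on it. The sketch below assumes a recent pandas where Index.get_indexer takes method and tolerance, and a strictly increasing running total (all album counts positive), since the nearest method needs a unique, monotonic index. The targets 18 and 12 are made up purely for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})

running = pd.Index(df["Num_Albums"].cumsum())  # 10, 11, 15, 22, 23, 26

# get_indexer returns positions; -1 means nothing lies within the tolerance
pos = running.get_indexer([15], method="nearest", tolerance=2)[0]
miss = running.get_indexer([18], method="nearest", tolerance=2)[0]
print(pos)   # 2
print(miss)  # -1

# "nearest" picks the closest running total, not the first one within the
# tolerance: for a target of 12, the total 10 (position 0) is already within
# +/- 2, but 11 (position 1) is closer and is what gets returned.
closest = running.get_indexer([12], method="nearest", tolerance=2)[0]
print(closest)  # 1
```

If "stop at the first row inside the band" is the intended rule, Opt - 1 or Opt - 3 match it more directly; for the 15 +/- 2 example all three agree.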
Opt - 3:
Use idxmax to find the first index of a True value in the boolean mask created by the sub, abs and le operations on the cumsum series:
df.at[df.Num_Albums.cumsum().sub(15).abs().le(2).idxmax(), 'Num_authors']
4
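One caveat with the idxmax approach: on a mask that is entirely False, idxmax still returns the first label, so a value would be reported even though the condition was never reached. A guarded variant might look like the following sketch; value_at_first_within is a hypothetical name, and the target of 18 is an invented example of a band that no running total falls into.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})

def value_at_first_within(frame, sum_col, value_col, target, tol):
    """Hypothetical helper: idxmax-based lookup with an explicit no-match check."""
    mask = frame[sum_col].cumsum().sub(target).abs().le(tol)
    if not mask.any():
        # Without this check, idxmax would silently point at the first row
        return None
    return frame.at[mask.idxmax(), value_col]

print(value_at_first_within(df, "Num_Albums", "Num_authors", 15, 2))  # 4
print(value_at_first_within(df, "Num_Albums", "Num_authors", 18, 2))  # None
```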