Dataframe manipulation - capturing a change in value
Problem description
I currently have a dataframe, shown below, in which each value is a change in position: add 1 unit (+1), subtract 1 unit (-1), or do nothing (0).
I'm looking to create a second dataframe holding the net position, which is either long (1) or flat (0), assuming a net short (-1) position is not possible.
So the logic is to start at 0, switch to 1 when the first +1 'change in position' occurs (any subsequent +1 is ignored), and switch back to 0 only when a -1 is seen.
Any thoughts on how to do this? The idea is to create df2 as shown below.
df.cumsum() would work if every +1 'change in position' were to count, but I only wish to capture 'long or flat', not the size of any accumulated long position.
Input dataframe:
Output dataframe:
Recommended answer
Here is a vectorized solution:
df['CiP'].where(df['CiP'].replace(0, float('nan')).ffill().diff().ne(0), 0).cumsum()
Explanation:
- The call to replace, combined with the forward fill, replaces each 0 by the preceding non-zero value.
- The result of diff is then non-zero only at actual changes in position.
- The call to where ensures that values that do not really change the position (a repeated +1 or a repeated -1) are replaced by 0.
- After this treatment, cumsum just works.
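To make the four steps concrete, here is a minimal sketch that prints each intermediate series. The sample values are hypothetical (the original input is not reproduced here), the column name 'CiP' follows the answer, and the forward fill is written as a separate ffill call with an explicit boolean test on the diff; the variable names held, flipped, effective and net are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'CiP' comes from the answer.
df = pd.DataFrame({'CiP': [0, 1, 0, 1, -1, 0, -1, 1]})
s = df['CiP']

# Step 1: turn zeros into NaN and carry the last non-zero change forward.
held = s.replace(0, float('nan')).ffill()

# Step 2: a non-zero diff marks a real flip in direction.
# Rows up to the first non-zero change have a NaN diff; NaN != 0 is True,
# so those rows keep their original value (leading zeros stay zero).
flipped = held.diff().ne(0)

# Step 3: zero out repeated signals that do not change the position.
effective = s.where(flipped, 0)

# Step 4: accumulate the remaining +1 / -1 changes.
net = effective.cumsum()

print(pd.DataFrame({'CiP': s, 'held': held, 'flipped': flipped,
                    'effective': effective, 'net': net}))
```

For this sample the net column comes out as 0, 1, 1, 1, 0, 0, 0, 1: the second +1 and the second -1 are both ignored.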
Edit: If you have multiple columns, define a function as above and apply it.
def position(series):
    # Carry the last non-zero change forward, keep only real flips, then accumulate.
    held = series.replace(0, float('nan')).ffill()
    return series.where(held.diff().ne(0), 0).cumsum()

df[list_of_columns].apply(position)
This could be slightly faster than explicitly looping over the columns.
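As a sanity check on the multi-column version, the sketch below applies it to a small two-column frame and compares the result with a plain-Python state machine that follows the rules from the question. Everything here beyond position and list_of_columns is an assumption for illustration: the column names 'A' and 'B', the sample values, and the helper position_loop are hypothetical.

```python
import pandas as pd

def position(series):
    """Vectorized long/flat position, following the approach in the answer."""
    held = series.replace(0, float('nan')).ffill()
    return series.where(held.diff().ne(0), 0).cumsum()

def position_loop(series):
    """Hypothetical reference: go long on +1, go flat on -1, never go short."""
    state, out = 0, []
    for change in series:
        if change == 1:
            state = 1
        elif change == -1:
            state = 0
        out.append(state)
    return pd.Series(out, index=series.index)

# Made-up input with two columns of position changes.
df = pd.DataFrame({
    'A': [0, 1, 1, 0, -1, -1, 1],
    'B': [1, 0, -1, 0, 1, 1, -1],
})
list_of_columns = ['A', 'B']

df2 = df[list_of_columns].apply(position)
reference = df[list_of_columns].apply(position_loop)

print(df2)
print((df2.values == reference.values).all())
```

The two versions agree on this data. They differ if a column's first non-zero change is -1: the loop stays flat at 0, while the vectorized version would report -1, so it relies on the question's assumption that a short position never occurs.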