How to get rolling pandas DataFrame subsets

Question

I would like to get DataFrame subsets in a "rolling" manner. I tried several things without success. Here is an example of what I would like to do. Consider the following DataFrame:

df
     var1      var2
0    43         74
1    44         74
2    45         66
3    46        268
4    47         66

I would like to create a new column with the following function which performs a conditional sum:

def func(x):
    tmp = (x["var1"] * (x["var2"] == 74)).sum()
    return tmp

and call it like this:

df["newvar"] = df.rolling(2, min_periods=1).apply(func)

That is, the function would be applied to each rolling sub-DataFrame as a whole, not to each row or column separately.

It would return:

     var1      var2      newvar
0    43         74         43          # 43
1    44         74         87          # 43 * 1 + 44 * 1
2    45         66         44          # 44 * 1 + 45 * 0
3    46        268         0           # 45 * 0 + 46 * 0
4    47         66         0           # 46 * 0 + 47 * 0
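For reference, the expected newvar column can be reproduced without rolling at all, by slicing each two-row window out of the DataFrame with iloc and calling func on it directly. This is a plain-Python sketch (a list comprehension, not a rolling method), assuming the five-row df shown above; window is an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({"var1": [43, 44, 45, 46, 47], "var2": [74, 74, 66, 268, 66]})

def func(x):
    # x is a real sub-DataFrame here, so both columns are available.
    return (x["var1"] * (x["var2"] == 74)).sum()

window = 2  # illustrative window length, matching rolling(2, min_periods=1)
df["newvar"] = [
    # Rows i - window + 1 .. i, truncated at the top (like min_periods=1).
    func(df.iloc[max(0, i - window + 1): i + 1])
    for i in range(len(df))
]
print(df)
```

This is easy to read and works for any condition on the sub-DataFrame, at the cost of a Python-level loop over the rows.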

Is there a pythonic way to do this? This is just an example; the real condition (always based on the sub-DataFrame values) depends on more than 2 columns.

Recommended answer

Updated comment

@unutbu posted a great answer to a very similar question, but it appears that his answer is based on pd.rolling_apply, which passes the index to the function. I'm not sure how to replicate this with the current DataFrame.rolling.apply method.
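One way to approximate the index-passing idea with the current rolling(...).apply API is to roll over a Series of integer row positions instead of the data itself, and use each window of positions to pull the matching sub-DataFrame with iloc. This is a minimal sketch under that assumption, not @unutbu's actual code; rolling_frame_apply is a hypothetical helper name, and the code assumes a pandas version whose Rolling.apply accepts the raw argument.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [43, 44, 45, 46, 47], "var2": [74, 74, 66, 268, 66]})

def func(x):
    # x is a genuine sub-DataFrame here, so any columns can be combined.
    return (x["var1"] * (x["var2"] == 74)).sum()

def rolling_frame_apply(frame, window, f):
    """Hypothetical helper: apply f to each rolling sub-DataFrame of frame."""
    # Roll over row positions (stored as floats, which rolling handles natively).
    positions = pd.Series(np.arange(len(frame), dtype=float), index=frame.index)
    return positions.rolling(window, min_periods=1).apply(
        # With raw=True each window is an ndarray of positions; cast to int
        # and use them to select the matching rows of the original frame.
        lambda pos: f(frame.iloc[pos.astype(int)]),
        raw=True,
    )

df["newvar"] = rolling_frame_apply(df, 2, func)
print(df)
```

The function still runs once per window in Python, so it is slower than a vectorized expression, but it gives func full access to every column of each window.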

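It appears that the object passed to the function through apply is a NumPy array holding one column's window at a time, not a DataFrame, so unfortunately you do not have access to any of the other columns. (In pandas versions where apply takes a raw argument, raw=False passes a Series instead of an array, but still only one column at a time.)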

What you can do instead is use boolean logic to temporarily build a column that equals var1 where var2 is 74 and 0 elsewhere, and then apply the rolling method to that column.

df['new_var'] = df.var2.eq(74).mul(df.var1).rolling(2, min_periods=1).sum()

   var1  var2  new_var
0    43    74     43.0
1    44    74     87.0
2    45    66     44.0
3    46   268      0.0
4    47    66      0.0

The temporary column is based on the first half of the code above.

df.var2.eq(74).mul(df.var1)
# or equivalently with operators
# (df['var2'] == 74) * df['var1']

0    43
1    44
2     0
3     0
4     0
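Since the question says the real condition depends on more than two columns, the same vectorized pattern extends by combining per-row boolean masks with &. A sketch, where var3 is a hypothetical extra column invented purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"var1": [43, 44, 45, 46, 47], "var2": [74, 74, 66, 268, 66]})
# Hypothetical extra column, present only to show a multi-column condition.
df["var3"] = [1, 0, 1, 1, 1]

# Combine per-row conditions across several columns into one boolean mask.
mask = (df["var2"] == 74) & (df["var3"] == 1)

# Keep var1 where the mask holds, 0 elsewhere, then take the rolling sum.
df["new_var"] = df["var1"].where(mask, 0).rolling(2, min_periods=1).sum()
print(df)
```

This stays fully vectorized, so it only applies when the condition can be evaluated row by row before rolling; a condition that depends on the window as a whole needs a per-window approach instead.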

Finding the type of the variable passed to apply

It's very important to know what is actually being passed to the function given to apply, and I can't always remember what that is. When I'm unsure, I print out the variable along with its type so that it's clear what object I'm dealing with. See this example with your original DataFrame:

def foo(x):
    print(x)
    print(type(x))
    return x.sum()

df.rolling(2, min_periods=1).apply(foo)

Output

[ 43.]
<class 'numpy.ndarray'>
[ 43.  44.]
<class 'numpy.ndarray'>
[ 44.  45.]
<class 'numpy.ndarray'>
[ 45.  46.]
<class 'numpy.ndarray'>
[ 46.  47.]
<class 'numpy.ndarray'>
[ 74.]
<class 'numpy.ndarray'>
[ 74.  74.]
<class 'numpy.ndarray'>
[ 74.  66.]
<class 'numpy.ndarray'>
[  66.  268.]
<class 'numpy.ndarray'>
[ 268.   66.]
<class 'numpy.ndarray'>
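In pandas versions whose Rolling.apply accepts the raw argument, the same check can be made for both settings. This sketch records the types in sets instead of printing every window; record_raw and record_series are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [43, 44, 45, 46, 47], "var2": [74, 74, 66, 268, 66]})

raw_types = set()
series_types = set()

def record_raw(x):
    raw_types.add(type(x))  # remember what rolling handed to the function
    return x.sum()

def record_series(x):
    series_types.add(type(x))
    return x.sum()

# raw=True: each window arrives as a NumPy array, one column at a time.
df.rolling(2, min_periods=1).apply(record_raw, raw=True)
# raw=False: each window arrives as a Series, still one column at a time.
df.rolling(2, min_periods=1).apply(record_series, raw=False)

print(raw_types)
print(series_types)
```

The ndarray output shown above matches the raw=True behavior; in neither case does the function see a whole DataFrame.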
