How to perform a windowed operation on a dataframe?


Problem description

Given

import pandas as pd
import numpy as np

ssss = pd.DataFrame(np.arange(6))

ssss:

   0
0  0
1  1
2  2
3  3
4  4
5  5

I want to perform a sliding window operation on the dataframe.

I want to apply a general function (in this case the mean, but it could be another function and could involve more than one input column) over a sliding window of arbitrary size with an arbitrary stride.

In this case, the window size is 2, and the stride length is also 2.

Does pandas support this kind of operation?

Desired result (in a new column res):

   0 res
0  0 0.5
1  1 0.5
2  2 2.5
3  3 2.5
4  4 4.5
5  5 4.5


It seems like groupby is not what I am looking for.

I could fall back to a NumPy solution, but even then I am not sure what the standard approach is. I would expect pandas to support something like this, but I couldn't find any method that does it.
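For reference, one NumPy approach (not from the original thread; it assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view was added) is to build every window of the chosen size as a view and then keep every stride-th window. This is a sketch for a single numeric column:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.arange(6)   # the question's single column
size, stride = 2, 2     # window size and stride from the question

# One read-only view per start position (no copying), shape (len - size + 1, size)
all_windows = sliding_window_view(values, size)

# Keep only the windows that start at 0, stride, 2*stride, ...
windows = all_windows[::stride]
means = windows.mean(axis=1)

print(windows.tolist())  # [[0, 1], [2, 3], [4, 5]]
print(means.tolist())    # [0.5, 2.5, 4.5]
```

Setting stride to 1 keeps every overlapping window. When size equals stride and the length is a multiple of size, np.repeat(means, size) spreads the per-window results back over the rows, giving the res column above.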


Edit:

ssss:

Assume the values in column 1 are strings:

   0 1   2
0  0 "5" a
1  1 "4" b
2  2 "3" c
3  3 "2" d
4  4 "1" e
5  5 "0" f

As a very general example, I would like to use

def row_reduce(col0, col1):
    return str(2 * col0) + col1

def col_reduce(rows_data):
    return ",".join(rows_data)

to obtain (while ignoring column 2)

   0 1   2 res
0  0 "5" a "05,24"
1  1 "4" b "05,24"
2  2 "3" c "43,62"
3  3 "2" d "43,62"
4  4 "1" e "81,100"
5  5 "0" f "81,100"

This first performs the row reduction using the custom function, then performs a windowed column reduction.
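When the row reduction is simple string arithmetic, as here, both steps can be written without a per-row Python call. The following is a sketch on the question's edited data (column 1 as strings); the vectorized expression is an illustrative rewrite of row_reduce, and it assumes non-overlapping windows of 2 rows:

```python
import numpy as np
import pandas as pd

# The edited example: column 1 holds strings, column 2 is ignored
ssss = pd.DataFrame({0: range(6), 1: ["5", "4", "3", "2", "1", "0"], 2: list("abcdef")})

# Step 1, row reduction, vectorized: str(2 * col0) + col1 for every row at once
per_row = ssss[0].mul(2).astype(str) + ssss[1]

# Step 2, column reduction over non-overlapping windows of 2 rows,
# broadcast back so every row in a window gets the same joined string
ssss["res"] = per_row.groupby(np.arange(len(ssss)) // 2).transform(lambda s: ",".join(s))
print(ssss)
```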

Solution

If the windows do not overlap, you can use groupby.
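When the windows do overlap (stride smaller than size), groupby no longer fits, because a row would need to belong to two groups. A sketch for built-in aggregations, not from the original answer: take the trailing rolling result and keep every stride-th window, starting from the first complete one. The values of size and stride are illustrative:

```python
import numpy as np
import pandas as pd

ssss = pd.DataFrame(np.arange(6))
size, stride = 3, 2  # illustrative: stride < size, so consecutive windows share a row

# rolling(size) gives one trailing window ending at each row (NaN until the first full window).
# The first full window ends at position size - 1; stepping by `stride` from there
# selects the windows that start at positions 0, stride, 2 * stride, ...
window_means = ssss[0].rolling(size).mean().iloc[size - 1::stride]
print(window_means)
```

Each result is labelled with the row where its window ends: here rows 2 and 4, holding the means of rows 0 to 2 and rows 2 to 4. For a custom numeric function, .rolling(size).apply(func, raw=True) can replace .mean().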

I think you need GroupBy.transform with integer division of the row positions by the window size, which gives each row's window number:

# if the DataFrame has the default RangeIndex
ssss['res'] = ssss.groupby(ssss.index // 2)[0].transform('mean')
# any index: use a helper array of row positions
ssss['res'] = ssss.groupby(np.arange(len(ssss)) // 2)[0].transform('mean')
print (ssss)
   0  res
0  0  0.5
1  1  0.5
2  2  2.5
3  3  2.5
4  4  4.5
5  5  4.5
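To see why the helper array matters, here is a sketch with a hypothetical string index, where ssss.index // 2 would raise an error, and a window size of 3 instead of 2. The index labels and the size are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default (string) index, where df.index // 3 would fail
df = pd.DataFrame({0: np.arange(6)}, index=list("uvwxyz"))
size = 3  # window size equal to the stride, so the windows do not overlap

# Row positions // size give the window number of each row: [0, 0, 0, 1, 1, 1]
keys = np.arange(len(df)) // size
df["res"] = df.groupby(keys)[0].transform("mean")
print(df)
```

If the number of rows is not a multiple of size, the last group is simply shorter, so its result is computed over fewer rows.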

EDIT:

df = pd.DataFrame({0: range(6), 1: range(5, -1, -1), 2: list('abcdef')})
print (df)
   0  1  2
0  0  5  a
1  1  4  b
2  2  3  c
3  3  2  d
4  4  1  e
5  5  0  f

def row_reduce(col0, col1):
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    return ",".join(rows_data)


df['res'] = (df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
               .groupby(df.index // 2)
               .transform(col_reduce))
print (df)
   0  1  2     res
0  0  5  a   05,24
1  1  4  b   05,24
2  2  3  c   43,62
3  3  2  d   43,62
4  4  1  e  81,100
5  5  0  f  81,100
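For overlapping windows with custom reducers like these, a plain loop over window start positions is one straightforward option. This is a sketch, not from the original answer; windowed is a hypothetical helper name, and it returns one value per window (indexed by start position) rather than one per row, since with overlap a row belongs to more than one window:

```python
import pandas as pd

# Same data as printed in the EDIT above, with column 1 as integers
df = pd.DataFrame({0: range(6), 1: range(5, -1, -1), 2: list("abcdef")})

def row_reduce(col0, col1):
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    return ",".join(rows_data)

def windowed(series, size, stride, func):
    """Hypothetical helper: apply func to each block of `size` consecutive values,
    with windows starting every `stride` positions; incomplete trailing windows are skipped."""
    starts = range(0, len(series) - size + 1, stride)
    return pd.Series([func(series.iloc[s:s + size]) for s in starts], index=starts)

# Row reduction first, then the windowed column reduction
per_row = pd.Series([row_reduce(a, b) for a, b in zip(df[0], df[1])], index=df.index)
print(windowed(per_row, size=2, stride=2, func=col_reduce))  # non-overlapping, as above
print(windowed(per_row, size=3, stride=2, func=col_reduce))  # overlapping windows
```

With size=3 and stride=2 the two windows are "05,24,43" and "43,62,81"; row 2 contributes to both, which is why the per-row transform used above only fits the non-overlapping case.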
