Get the previous and next three values from a column of a pandas DataFrame


Problem description

I am new to Python and pandas. I have a DataFrame with two columns.

Offset       predictedFeature
 0              2
 5              2
 11             0
 21             22
 28             22
 32              0
 38             21
 42             21
 52             21
 55              0
 58              0
 62              1
 66              1
 70              1
 73              0
 78              1
 79              1

In this df, for every row where predictedFeature is 0, I am trying to get the previous three values and the next three values from the predictedFeature column. For example, the 3rd row has the value 0, so the previous values are [2, 2] (only two rows precede it) and the next three are [22, 22, 0]. I want to do this for every 0 in the predictedFeature column, so that I get a df with these two as new columns: previous and next values.

Offset   feature   previous      Next          NewFeature
 0        2        -             -             2
 5        2        -             -             2
 11       0        [2,2]         [22,22,0]     0
 21       22       -             -             22
 28       22       -             -             22
 32       0        [22,22,0]     [21,21,21]    0
 38       21       -             -             21
 42       21       -             -             21
 52       21       -             -             21
 55       0        [21,21,21]    [0,1,1]       0
 58       0        [0,21,21]     [1,1,1]       0
 62       1        -             -             1
 66       1        -             -             1
 70       1        -             -             1
 73       0        [1,1,1]       [1,1]         1
 78       1        -             -             1
 79       1        -             -             1

Recommended answer

You can apply a windowed view on the array via numpy.lib.stride_tricks.as_strided. Here is a function I wrote some time ago for exactly that purpose. It's a bit tricky to understand: essentially, the function sets the memory strides along the newly created axis so that each row is the previous row shifted by one element.

import numpy as np
import pandas as pd


def windowed_view(x, window_size):
    """Create a 2d windowed view of a 1d array.

    `x` must be a 1d numpy array.

    `numpy.lib.stride_tricks.as_strided` is used to create the view.
    The data is not copied. You should never write to a windowed view.

    Example:

    >>> x = np.array([1, 2, 3, 4, 5, 6])
    >>> windowed_view(x, 3)
    array([[1, 2, 3],
           [2, 3, 4],
           [3, 4, 5],
           [4, 5, 6]])
    """
    assert window_size <= x.size, "window_size (%s) must be <= x.size (%s)" % (window_size, x.size)
    return np.lib.stride_tricks.as_strided(
        x,
        shape=(x.size - window_size + 1, window_size),
        # Same stride on both axes: each row starts one element after the previous one.
        strides=(x.strides[0], x.strides[0]),
    )


df = pd.DataFrame({'predictedFeature': [2000,2000,0,2200,2200,0,2100,2100,2100,0,0,100,100,100,0,100,100]})
# windowed_view expects a NumPy array, so convert the Series first.
# A window of 7 holds the centre value plus 3 values on each side.
w = windowed_view(df.predictedFeature.to_numpy(), 7)
print(w)

[[2000 2000    0 2200 2200    0 2100]
 [2000    0 2200 2200    0 2100 2100]
 [   0 2200 2200    0 2100 2100 2100]
 [2200 2200    0 2100 2100 2100    0]
 [2200    0 2100 2100 2100    0    0]
 [   0 2100 2100 2100    0    0  100]
 [2100 2100 2100    0    0  100  100]
 [2100 2100    0    0  100  100  100]
 [2100    0    0  100  100  100    0]
 [   0    0  100  100  100    0  100]
 [   0  100  100  100    0  100  100]]

However, you only want the rows where the 0 is in the middle (column index 3):

w[w[:,3]==0,:]

[[   0 2200 2200    0 2100 2100 2100]
 [2100 2100 2100    0    0  100  100]
 [2100 2100    0    0  100  100  100]]
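
To get from these centred windows to separate "previous" and "next" values, the columns on either side of the middle one can be sliced off. The following is only a sketch under a few assumptions: it uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) in place of the hand-written windowed_view, and the names values, half, previous and following are illustrative, not part of the original answer.

```python
import numpy as np

values = np.array([2000, 2000, 0, 2200, 2200, 0, 2100, 2100, 2100,
                   0, 0, 100, 100, 100, 0, 100, 100])
half = 3  # number of neighbours wanted on each side

# Read-only view with one row per full window of 2 * half + 1 values.
w = np.lib.stride_tricks.sliding_window_view(values, 2 * half + 1)

# Keep only the windows whose middle element is 0.
mask = w[:, half] == 0
centered = w[mask]

previous = centered[:, :half]       # the three values before each 0
following = centered[:, half + 1:]  # the three values after each 0

# Position of each selected 0 in the original array.
positions = np.flatnonzero(mask) + half

for pos, prev, nxt in zip(positions, previous, following):
    print(pos, prev.tolist(), nxt.tolist())
```

Here previous keeps the original row order (oldest value first); reverse each row with [::-1] if the nearest value should come first.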

The only problem is the beginning and end of the series, since the view only contains rows with a full window. Zeros within three positions of either end (here the ones at 0-based positions 2 and 14) are therefore missing. However, you could iterate over the first and last row of w and handle these cases separately. Hope this helps so far.
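
One possible way to cover the edge cases without special handling is to pad the array with NaN on both sides before building the windows, and then drop the padding again. This is a sketch of that alternative, not the approach described in the answer; the function neighbours_of_zeros and its half parameter are hypothetical names, and it relies on numpy.pad and numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later).

```python
import numpy as np
import pandas as pd


def neighbours_of_zeros(values, half=3):
    """Return up to `half` previous and next values for every 0 in `values`."""
    values = np.asarray(values, dtype=float)
    # Pad with NaN so that every element, including the edges, has a full window.
    padded = np.pad(values, half, mode="constant", constant_values=np.nan)
    # After padding, windows[i] is centred on values[i].
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    positions = np.flatnonzero(values == 0)

    def drop_padding(a):
        # Remove the NaN padding and convert back to plain Python ints.
        return a[~np.isnan(a)].astype(int).tolist()

    return pd.DataFrame(
        {
            "previous": [drop_padding(w[:half]) for w in windows[positions]],
            "next": [drop_padding(w[half + 1:]) for w in windows[positions]],
        },
        index=positions,
    )


df = pd.DataFrame({"predictedFeature": [2000, 2000, 0, 2200, 2200, 0, 2100, 2100,
                                        2100, 0, 0, 100, 100, 100, 0, 100, 100]})
result = neighbours_of_zeros(df["predictedFeature"].to_numpy())
print(result)
```

The result is indexed by the position of each 0, so it can be joined back onto df if the lists are wanted as new columns. Converting to float is only needed so that NaN can serve as the padding marker; this assumes the column itself contains no NaN.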
