python - stumped by pandas conditionals and/or boolean indexing

Views: 95  Published: 2020/9/22 4:05:54  Tags: python pandas indexing boolean conditional

Problem description

I am having trouble with conditionals / boolean indexing. I am trying to populate a dataframe (dfp) with logic that is conditional on data from a similarly shaped dataframe (dfs) plus the previous row of dfp itself. This is my latest failed attempt...

import pandas as pd
dfs = pd.DataFrame({'a':[1,0,-1,0,1,0,0,-1,0,0],'b':[0,1,0,0,-1,0,1,0,-1,0]})

In [171]: dfs
Out[171]: 
       a  b
    0  1  0
    1  0  1
    2 -1  0
    3  0  0
    4  1 -1
    5  0  0
    6  0  1
    7 -1  0
    8  0 -1
    9  0  0

dfp = pd.DataFrame(index=dfs.index,columns=dfs.columns)

dfp[(dfs==1)|((dfp.shift(1)==1)&(dfs!=-1))] = 1

In [166]: dfp.fillna(0)
Out[166]: 
     a    b
0  1.0  0.0
1  0.0  1.0
2  0.0  0.0
3  0.0  0.0
4  1.0  0.0
5  0.0  0.0
6  0.0  1.0
7  0.0  0.0
8  0.0  0.0
9  0.0  0.0
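
A likely reason the attempt above only reproduces the first condition: the whole mask is computed once, before anything is assigned, so `dfp.shift(1) == 1` is evaluated on a dfp that is still entirely NaN. The sketch below (the names `prev_is_one` and `mask` are introduced here for illustration only) checks that the mask collapses to `dfs == 1`:

```python
import pandas as pd

dfs = pd.DataFrame({'a': [1, 0, -1, 0, 1, 0, 0, -1, 0, 0],
                    'b': [0, 1, 0, 0, -1, 0, 1, 0, -1, 0]})
dfp = pd.DataFrame(index=dfs.index, columns=dfs.columns)

# dfp is all NaN at this point, so no "previous row" can equal 1.
prev_is_one = dfp.shift(1) == 1
print(prev_is_one.any().any())

# The combined mask is therefore identical to the first condition alone.
mask = (dfs == 1) | (prev_is_one & (dfs != -1))
print((mask == (dfs == 1)).all().all())
```

Because each row depends on the result for the row before it, a single boolean-indexing assignment cannot express the rule directly.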

So I would like dfp to have a 1 in row n if either of two conditions is met:

1) dfs in the same row == 1, or
2) dfp in the previous row == 1 and dfs in the same row != -1

I would like my final output to look like this:

UPDATE: Sometimes a visual is more helpful; below is how it would map out in Excel.

Thanks in advance, very grateful for your time.

Recommended answer

Let's summarize the invariants:

If the dfs value is 1, then the dfp value is 1.
If the dfs value is -1, then the dfp value is 0.
If the dfs value is 0, then the dfp value is 1 if the previous dfp value is 1; otherwise it is 0.

Or, to formulate it another way:

dfp starts with 1 if the first dfs value is 1, otherwise with 0.
The dfp values stay 0 until there is a 1 in dfs.
The dfp values stay 1 until there is a -1 in dfs.
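
This second formulation can also be read as "carry the most recent non-zero dfs value forward". As a minimal sketch of that reading (an alternative using pandas forward fill, not the method this answer goes on to use; the intermediate names `events` and `state` are assumptions made for illustration):

```python
import pandas as pd

dfs = pd.DataFrame({'a': [1, 0, -1, 0, 1, 0, 0, -1, 0, 0],
                    'b': [0, 1, 0, 0, -1, 0, 1, 0, -1, 0]})

# Keep only the switching events (1 and -1); zeros become NaN.
events = dfs.where(dfs != 0)
# Carry the most recent event forward; rows before the first event stay NaN.
state = events.ffill()
# NaN (no event yet) and -1 (switched off) both map to 0; 1 stays 1.
dfp = state.fillna(0).clip(lower=0).astype(int)
print(dfp)
```

This processes every column at once, at the cost of a few intermediate DataFrames.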

This is very easy to formulate in Python:

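import numpy as np

def create_new_column(dfs_col):
    # Output array with the same shape and dtype as the input column.
    newcol = np.zeros_like(dfs_col)
    # State carried from row to row: the previous dfp value (0 before the first row).
    last = 0
    for idx, val in enumerate(dfs_col):
        if last == 1 and val == -1:
            last = 0  # a -1 switches the state off
        if last == 0 and val == 1:
            last = 1  # a 1 switches the state on
        newcol[idx] = last

    return newcol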

Test:

>>> create_new_column(dfs.a)
array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int64)
>>> create_new_column(dfs.b)
array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0], dtype=int64)
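
To get the full dfp the question asks for, the function can be applied column by column. The following is a sketch under that assumption; the function is repeated in a compact but equivalent form only so that the snippet runs on its own:

```python
import numpy as np
import pandas as pd

def create_new_column(dfs_col):
    # Same state machine as above: 1 switches on, -1 switches off, 0 keeps the state.
    values = np.asarray(dfs_col)
    newcol = np.zeros_like(values)
    last = 0
    for idx, val in enumerate(values):
        if val == 1:
            last = 1
        elif val == -1:
            last = 0
        newcol[idx] = last
    return newcol

dfs = pd.DataFrame({'a': [1, 0, -1, 0, 1, 0, 0, -1, 0, 0],
                    'b': [0, 1, 0, 0, -1, 0, 1, 0, -1, 0]})

# Build dfp one column at a time, keeping the index of dfs.
dfp = pd.DataFrame({col: create_new_column(dfs[col]) for col in dfs.columns},
                   index=dfs.index)
print(dfp)
```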

However, this is very inefficient in pure Python, because iterating over NumPy arrays (and pandas Series/DataFrames) element by element is slow, and Python for-loops are inefficient as well.

However, if you have numba or Cython you can compile this function, and it will (probably) be faster than any NumPy solution could be, because NumPy would require several rolling and/or accumulate operations.
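
For comparison, here is one way such a NumPy solution built on an accumulate operation might look. This is a sketch, not part of the original answer; `numpy_version` is a hypothetical name, and no timing claim is made for it:

```python
import numpy as np

def numpy_version(col):
    col = np.asarray(col)
    # Position of each non-zero entry; zeros are mapped to position 0.
    idx = np.where(col != 0, np.arange(len(col)), 0)
    # Running maximum gives the position of the most recent non-zero entry
    # (or 0 if there has been none yet).
    last = np.maximum.accumulate(idx)
    # The state is 1 exactly where the most recent non-zero entry was a 1.
    return (col[last] == 1).astype(col.dtype)

a = np.array([1, 0, -1, 0, 1, 0, 0, -1, 0, 0])
b = np.array([0, 1, 0, 0, -1, 0, 1, 0, -1, 0])
print(numpy_version(a))
print(numpy_version(b))
```

It needs several temporary arrays of the same length as the input, which is the overhead the compiled loop avoids.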

For example, using numba:

>>> import numba
>>> numba_version = numba.njit(create_new_column)  # compilation step

>>> numba_version(np.asarray(dfs.a))  # need cast to np.array
array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int64)
>>> numba_version(np.asarray(dfs.b))  # need cast to np.array
array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0], dtype=int64)

Even if dfs has millions of rows, the numba solution will take only milliseconds:

>>> dfs = pd.DataFrame({'a':np.random.randint(-1, 2, 1000000),'b':np.random.randint(-1, 2, 1000000)})
>>> %timeit numba_version(np.asarray(dfs.b))
100 loops, best of 3: 9.37 ms per loop
