Are the outcomes of the numpy.where method on a pandas dataframe calculated on the full array or the filtered array?

Published: 2020/9/25 1:46:08 · Tags: python, arrays, pandas, numpy


Question


I want to use numpy.where on a pandas DataFrame to check whether a certain string is present in a column. If the string is present, apply a split function and take the second list element; otherwise, just take the first character. However, the following code doesn't work: it throws an IndexError: list index out of range, because the first entry contains no underscore:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['a', 'a_1', 'b_', 'b_2_3']})
# Raises IndexError: 'a'.split('_') is ['a'], which has no element [1]
df["B"] = np.where(df.A.str.contains('_'),
                   df.A.apply(lambda x: x.split('_')[1]),
                   df.A.str[0])


Calling np.where with only the condition returns the indices (as a tuple of arrays) for which the condition holds true, so I was under the impression that the split command would only be applied to that subset of the data:

np.where(df.A.str.contains('_'))
Out[14]: (array([1, 2, 3], dtype=int64),)
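The one-argument form behaves differently from the three-argument form used in the first snippet. A minimal NumPy-only sketch (cond, x and y are made-up illustration data) shows the contrast: with three arguments, x and y must already be complete arrays, and np.where merely picks element i from one or the other.

```python
import numpy as np

cond = np.array([False, True, True, True])

# One-argument form: only reports where cond is True (like np.nonzero(cond))
indices = np.where(cond)
print(indices)

# Three-argument form: x and y are full arrays that already exist;
# np.where takes x[i] where cond[i] is True, otherwise y[i]
x = np.array([10, 11, 12, 13])
y = np.array([20, 21, 22, 23])
picked = np.where(cond, x, y)
print(picked)  # [20 11 12 13]
```

Nothing in the three-argument call can prevent y[1], y[2] and y[3] (or x[0]) from being computed; they are simply discarded afterwards.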


But apparently the split command is applied to the entire unfiltered array, which seems odd to me: it looks like a potentially large number of unnecessary operations that would slow down the calculation.


I'm merely wondering if this is an expected outcome or an issue with either pandas or numpy.

Recommended answer


Python isn't a "lazy" language, so code is evaluated immediately: every argument of a function call is computed before the function itself runs. Generators/iterators do introduce some laziness, but that doesn't apply here.
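A minimal, pandas-free sketch of the difference (the helper names first_arg_only and noisy are hypothetical, chosen only for illustration): an argument is evaluated even when the called function never uses it, whereas a generator expression defers its work until it is iterated.

```python
def first_arg_only(a, b):
    # b is never used inside the function...
    return a

def noisy(label, log):
    # Record that this expression was actually evaluated
    log.append(label)
    return label

# Eager: both argument expressions run before first_arg_only is entered
log = []
result = first_arg_only(noisy('a', log), noisy('b', log))
print(log)  # ['a', 'b'] even though 'b' was never needed

# Lazy: a generator expression runs nothing until it is iterated
lazy_log = []
gen = (noisy(x, lazy_log) for x in ['x', 'y'])
print(lazy_log)  # [] nothing has run yet
first = next(gen)
print(lazy_log)  # ['x'] only the first item has been produced
```

The arguments in the question are ordinary expressions rather than generators, so they fall under the eager case.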


If we split your line of code into its arguments, we get the following three expressions:


df.A.str.contains('_')
df.A.apply(lambda x: x.split('_')[1])
df.A.str[0]


Python has to evaluate all three expressions before it can pass them as arguments to np.where, so the split is attempted on every row, including 'a'.
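Evaluating the three arguments one at a time, in the left-to-right order Python uses, shows that the IndexError comes from the apply call and that np.where is never reached. A minimal sketch using the question's data; the variable names and the reached_where flag are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a_1', 'b_', 'b_2_3']})

error = None
reached_where = False
try:
    # Argument 1: one boolean per row
    cond = df.A.str.contains('_')
    # Argument 2: apply() runs the lambda on every row, so
    # 'a'.split('_')[1] raises IndexError right here
    second = df.A.apply(lambda x: x.split('_')[1])
    # Argument 3 and the np.where call itself are never reached
    fallback = df.A.str[0]
    reached_where = True
    result = np.where(cond, second, fallback)
except IndexError as exc:
    error = exc

print(type(error).__name__)  # IndexError
```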


To see all this happening, we can rewrite the above as small functions that print some output:

def fn_contains(x):
    print('contains', x)
    return '_' in x

def fn_split(x):
    s = x.split('_')
    print('split', x, s)
    # guard against entries without '_' (e.g. 'a'), which would otherwise raise IndexError
    if len(s) > 1:
        return s[1]
    return None

def fn_first(x):
    print('first', x)
    return x[0]


Then you can run them on your data with:

import numpy as np
import pandas as pd

s = pd.Series(['a', 'a_1', 'b_', 'b_2_3'])
np.where(
    s.apply(fn_contains),
    s.apply(fn_split),
    s.apply(fn_first)
)


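You'll see everything being executed in order: fn_contains on every element, then fn_split on every element (including 'a'), then fn_first on every element. Only after that does np.where pick values from the three finished arrays. This is basically what's happening "inside" numpy/pandas when you run your original line.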
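If the goal is to avoid the unnecessary work, one option (not part of the original answer) is to fill in the fallback for every row and then run the split only on the rows selected by a boolean mask with .loc. The sketch below assumes the question's example data; column C is a hypothetical extra column showing a second variant, in which .str[1] yields NaN instead of raising for lists that are too short, so all three np.where arguments can safely be computed up front.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a_1', 'b_', 'b_2_3']})

mask = df.A.str.contains('_', regex=False)

# Variant 1: start with the fallback (first character) for every row...
df['B'] = df.A.str[0]
# ...then overwrite only the masked rows; apply() sees just this subset,
# so 'a' is never split
df.loc[mask, 'B'] = df.loc[mask, 'A'].apply(lambda x: x.split('_')[1])

# Variant 2 (hypothetical column C): .str[1] gives NaN for 'a' instead of
# raising, and np.where then discards that NaN in favour of the fallback
df['C'] = np.where(mask, df.A.str.split('_').str[1], df.A.str[0])

print(df['B'].tolist())  # ['a', '1', '', '2']
```

Variant 1 skips the split for rows without an underscore; variant 2 still processes every row but no longer fails. In both, 'b_' maps to an empty string, because the element after its underscore is empty.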
