Why is pandas '==' different than '.eq()'

Question

Consider the series s:

s = pd.Series([(1, 2), (3, 4), (5, 6)])

This works as expected:

s == (3, 4)

0    False
1     True
2    False
dtype: bool

This does not:

s.eq((3, 4))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

ValueError: Lengths must be equal

I was under the assumption that they were the same. What is the difference between them?

What do the docs say?

Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.

This seems to imply that they should work the same, hence the confusion.
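To make the documented difference concrete, here is a minimal sketch of what fill_value does. The series a and b are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `a` has a missing value where `b` holds 0.0.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, 0.0, 3.0])

# Plain comparison: NaN never compares equal, so position 1 is False.
plain = a == b

# Flexible comparison: the NaN in `a` is replaced by 0 before comparing.
filled = a.eq(b, fill_value=0)

print(plain.tolist())
print(filled.tolist())
```

Apart from fill_value, the two calls agree on this input, which is why the tuple case in the question is surprising.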

Recommended answer

What you encounter is actually a special case that makes it easier to compare a pandas.Series or numpy.ndarray with normal Python constructs. The source code reads:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # validate axis
    if axis is not None:
        self._get_axis_number(axis)
    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            # ---------------------------------------
            # you never reach the `==` path because you get into this.
            # ---------------------------------------
            raise ValueError('Lengths must be equal')  
        return self._binop(self._constructor(other, self.index), op,
                           level=level, fill_value=fill_value)
    else:
        if fill_value is not None:
            self = self.fillna(fill_value)

        return self._constructor(op(self, other),
                                 self.index).__finalize__(self)

You're hitting the ValueError because, for .eq, pandas assumes that an array, list or tuple argument should be converted to a numpy.ndarray or pandas.Series and compared element by element, instead of being compared as a single value against each element. For example, if you have:

s = pd.Series([1,2,3])
s.eq([1,2,3])

you wouldn't want it to compare each element to [1,2,3].
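A short sketch of that list-like branch in action. The name s_int and the values are illustrative only, and the length check assumes a pandas version that still contains the branch quoted above.

```python
import pandas as pd

s_int = pd.Series([1, 2, 3])

# A list of the same length is compared position by position.
matches = s_int.eq([1, 2, 4])
print(matches.tolist())

# A list-like of a different length is rejected by the flexible method.
try:
    s_int.eq([1, 2])
except ValueError as exc:
    print("ValueError:", exc)
```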

The problem is that object arrays (as with dtype=uint) often slip through the cracks or are neglected on purpose. A simple if self.dtype != 'object' branch inside that method could resolve this issue. But maybe the developers had strong reasons to make this case different. I would advise asking for clarification on their bug tracker.
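The idea behind such a branch can be sketched outside pandas. The helper below, eq_scalar_aware, is hypothetical and not part of the pandas API; it only illustrates treating a tuple as a single value when the series has object dtype.

```python
import pandas as pd


def eq_scalar_aware(series, other):
    """Hypothetical helper, not part of pandas."""
    if series.dtype == object and isinstance(other, tuple):
        # Compare every element against the tuple as a whole.
        return series.map(lambda value: value == other)
    # Otherwise defer to the regular flexible comparison.
    return series.eq(other)


s = pd.Series([(1, 2), (3, 4), (5, 6)])
result = eq_scalar_aware(s, (3, 4))
print(result.tolist())
```

Whether pandas itself should behave this way is a design question for the maintainers; the sketch only shows that the two cases can be told apart.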

You haven't asked how to make it work correctly, but for completeness I'll include one possibility (according to the source code, it seems likely that you need to wrap the value as a pandas.Series yourself):

>>> s.eq(pd.Series([(1, 2)]))
0     True
1    False
2    False
dtype: bool
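Note that a one-element Series only lines up with index 0, so the other rows are compared against missing values. To test every row against the tuple from the question, one option is to repeat the tuple along the index of s. This is a sketch rather than part of the original answer, and target and mask are illustrative names.

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])

# Build an object Series that holds (3, 4) at every index label of `s`.
target = pd.Series([(3, 4)] * len(s), index=s.index)

# Both sides are now aligned Series, so .eq compares tuple with tuple.
mask = s.eq(target)
print(mask.tolist())
```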
