Why is pandas '==' different than '.eq()'
Question
Consider the series s:
s = pd.Series([(1, 2), (3, 4), (5, 6)])
This works as expected:
s == (3, 4)
0 False
1 True
2 False
dtype: bool
This does not:
s.eq((3, 4))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
ValueError: Lengths must be equal
I was under the assumption that they were the same. What is the difference between them?
The documentation says:
Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.
This seems to imply that they should work the same, hence the confusion.
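For reference, the fill_value behaviour that the documentation mentions can be sketched as follows. The series names and values here are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the left series has one missing value.
left = pd.Series([1.0, np.nan, 3.0])
right = pd.Series([1.0, 0.0, 3.0])

# Plain ==: NaN never compares equal to anything.
plain = left == right

# .eq with fill_value: a value missing on one side only is
# replaced by fill_value before the comparison.
filled = left.eq(right, fill_value=0)

print(plain.tolist())   # [True, False, True]
print(filled.tolist())  # [True, True, True]
```

In this case the two forms differ only where data is missing, which is the difference the documentation describes.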
Recommended answer
What you encounter is actually a special case that makes it easier to compare a pandas.Series or numpy.ndarray with normal Python constructs. The source code reads:
def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # validate axis
    if axis is not None:
        self._get_axis_number(axis)
    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            # ---------------------------------------
            # you never reach the `==` path because you get into this.
            # ---------------------------------------
            raise ValueError('Lengths must be equal')
        return self._binop(self._constructor(other, self.index), op,
                           level=level, fill_value=fill_value)
    else:
        if fill_value is not None:
            self = self.fillna(fill_value)
        return self._constructor(op(self, other),
                                 self.index).__finalize__(self)
You're hitting the ValueError because, for .eq, pandas assumes that you wanted the value converted to a numpy.ndarray or pandas.Series (if you give it an array, list or tuple) instead of actually comparing it to the tuple. For example, if you have:
s = pd.Series([1,2,3])
s.eq([1,2,3])
you wouldn't want it to compare each element to [1,2,3].
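A minimal sketch of that list-like branch, assuming a pandas version whose .eq still follows the source shown above. The variable names are illustrative only.

```python
import pandas as pd

nums = pd.Series([1, 2, 3])

# A list of the same length is treated as a sequence of values
# and compared element by element.
same_length = nums.eq([1, 2, 3])
print(same_length.tolist())  # [True, True, True]

# A list-like of a different length is rejected before any
# comparison takes place.
try:
    nums.eq([1, 2])
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
```

The tuple (3, 4) from the question falls into the same branch: it is seen as a list-like of length 2 rather than as a single value to compare against every element.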
The problem is that object arrays (as with dtype=uint) often slip through the cracks or are neglected on purpose. A simple if self.dtype != 'object' branch inside that method could resolve this issue. But maybe the developers had strong reasons to actually make this case different. I would advise asking for clarification by posting on their bug tracker.
You haven't asked how you can make it work correctly, but for completeness I'll include one possibility (according to the source code, it seems likely you need to wrap it as a pandas.Series yourself):
>>> s.eq(pd.Series([(1, 2)]))
0 True
1 False
2 False
dtype: bool
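To compare every element against the same tuple, as in the question, two possible approaches are sketched below. Neither comes from the original answer; the names target, other, via_eq and via_map are illustrative.

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])
target = (3, 4)

# Option 1: repeat the tuple once per row so that .eq receives
# a Series with the same length and index as s.
other = pd.Series([target] * len(s), index=s.index)
via_eq = s.eq(other)

# Option 2: fall back to plain Python equality per element.
via_map = s.map(lambda item: item == target)

print(via_eq.tolist())   # [False, True, False]
print(via_map.tolist())  # [False, True, False]
```

Option 1 stays within the .eq API, so arguments such as fill_value remain available. Option 2 is simpler but runs a Python-level function for each element.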