Strange isinstance behaviour in pandas int and np.int64


Question

I have a Series of np.int64 values, but for some reason isinstance() gives different answers depending on how I access the elements.

As the attached image shows, checking the type of an individual element gives numpy.int64, so isinstance on that particular element works out correctly.

When I use apply, however, the opposite happens and I get different results. Is this because apply changes the type somehow?

In more detail, the original series is defined with:

sample_series = pd.Series([np.int64(1), np.int64(25), np.int64(50), np.int64(75)])

When I check the type of one element with type(sample_series.loc[0]), I get the output numpy.int64.

Using isinstance on that element gives the expected answers: isinstance(sample_series.loc[0], int) outputs False, and isinstance(sample_series.loc[0], np.int64) outputs True.

On the other hand, sample_series.apply(lambda x: isinstance(x, int)) gives the output:

0    True
1    True
2    True
3    True
dtype: bool

while sample_series.apply(lambda x: isinstance(x, np.int64)) gives the output:

0    False
1    False
2    False
3    False
dtype: bool

So the results seem inconsistent.

Thanks!

Recommended answer

It appears that DataFrame.apply and Series.apply are slightly different under the hood. For instance:

sample_series = pd.Series([np.int64(1), np.int64(50), np.int64(75)])
#0     1
#1    50
#2    75
#dtype: int64

sample_series.apply(lambda x: type(x))
#0    <class 'int'>
#1    <class 'int'>
#2    <class 'int'>
#dtype: object

But

df = pd.DataFrame({'val': sample_series})
df.dtypes
#val    int64
#dtype: object

df.apply(lambda row: type(row.val), axis=1)
#0    <class 'numpy.int64'>
#1    <class 'numpy.int64'>
#2    <class 'numpy.int64'>
#dtype: object

If you look into the Series.apply code, the odd behaviour appears to come from this branch:

# row-wise access
if is_extension_type(self.dtype):
    mapped = self._values.map(f)
else:
    values = self.asobject
    mapped = lib.map_infer(values, f, convert=convert_dtype)

It takes your series, builds values, which is array([1, 50, 75], dtype=object), and passes that to another function in pandas._libs, which applies your function f = lambda x: isinstance(x, np.int64). Because values is an object array, f receives plain Python int objects rather than np.int64 scalars, which matches the <class 'int'> output above.

On the other hand, DataFrame.apply with axis=1 works as expected, because it defines values with values = self.values, which gives you values = array([ 1, 50, 75], dtype=int64).
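The difference between the two conversions can be reproduced with NumPy alone. The following is a minimal sketch, not the pandas implementation itself: the names int64_values and object_values are made up for illustration, and astype(object) is used here as a stand-in for what the asobject step is described as producing.

```python
import numpy as np

# Stand-in for `self.values`: the elements stay NumPy int64 scalars.
int64_values = np.array([1, 50, 75], dtype=np.int64)

# Stand-in for the object conversion: NumPy boxes each element
# as a plain Python int when casting to dtype=object.
object_values = int64_values.astype(object)

int64_types = [type(v) for v in int64_values]
object_types = [type(v) for v in object_values]

print(int64_types)   # NumPy scalar types
print(object_types)  # built-in int

# The same isinstance check therefore flips depending on the array dtype.
print([isinstance(v, np.int64) for v in int64_values])
print([isinstance(v, np.int64) for v in object_values])
```

Any function mapped over the object array sees Python ints, so isinstance(x, np.int64) fails there even though the original data was int64.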

In fact, if you were to change the underlying pandas Series.apply code to use values = self.values, you would get the output you expect.
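Patching pandas is rarely practical, so one possible workaround (an assumption on my part, not something the answer above proposes) is to write the check so that it accepts both Python and NumPy integers, or to test the dtype of the whole Series instead of each element. The helper name is_integer_like is hypothetical.

```python
import numpy as np
import pandas as pd

sample_series = pd.Series([np.int64(1), np.int64(25), np.int64(50), np.int64(75)])


def is_integer_like(x):
    """Return True for built-in ints and any NumPy integer scalar.

    Note that bool is a subclass of int, so True/False also pass.
    """
    return isinstance(x, (int, np.integer))


# Element-wise check that does not depend on whether apply hands the
# function Python ints or NumPy scalars.
elementwise = sample_series.apply(is_integer_like)
print(elementwise)

# Column-level check on the dtype, with no per-element call at all.
dtype_is_integer = pd.api.types.is_integer_dtype(sample_series)
print(dtype_is_integer)
```

The dtype check is usually the cheaper option when the question is really "is this column integer-typed?", since it never looks at individual elements.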
