Strange isinstance behaviour in pandas int and np.int64
Question
I have a series of np.int64, but for some reason using isinstance() in different cases yields different answers.
You can see in the attached image that if I check the type of the individual element, I get numpy.int64, and so the isinstance on this particular element works out correctly.
When I use apply, however, the opposite behavior happens, and I get different results. Is this because apply changes the type somehow?
In more detail, the original series is defined with:
sample_series = pd.Series([np.int64(1), np.int64(25), np.int64(50), np.int64(75)])
When I check the type of one element, type(sample_series.loc[0]), I get the output numpy.int64.
Now using isinstance gives me the following (expected) answers: isinstance(sample_series.loc[0], int) outputs False, and isinstance(sample_series.loc[0], np.int64) outputs True.
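Those two answers follow from the class hierarchy: on Python 3, NumPy's fixed-width integer scalars such as np.int64 are not subclasses of the built-in int, while a value converted to a plain Python int is. A minimal check, assuming only that NumPy is installed:

```python
import numpy as np

x = np.int64(1)

# np.int64 does not inherit from Python's built-in int on Python 3
print(issubclass(np.int64, int))   # False
print(isinstance(x, np.int64))     # True

# Once the value is converted to a plain Python int, the answers flip
y = int(x)
print(isinstance(y, int))          # True
print(isinstance(y, np.int64))     # False
```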
On the other hand, sample_series.apply(lambda x: isinstance(x, int)) gives the output:
0 True
1 True
2 True
3 True
dtype: bool
while sample_series.apply(lambda x: isinstance(x, np.int64)) gives the output:
0 False
1 False
2 False
3 False
dtype: bool
So it seems that the results are inconsistent.
Thanks!
Answer
It appears that DataFrame.apply and Series.apply are slightly different under the hood. For instance:
import numpy as np
import pandas as pd

sample_series = pd.Series([np.int64(1), np.int64(50), np.int64(75)])
#0 1
#1 50
#2 75
#dtype: int64
sample_series.apply(lambda x: type(x))
#0 <class 'int'>
#1 <class 'int'>
#2 <class 'int'>
#dtype: object
But:
df = pd.DataFrame({'val': sample_series})
df.dtypes
#val int64
#dtype: object
df.apply(lambda row: type(row.val), axis=1)
#0 <class 'numpy.int64'>
#1 <class 'numpy.int64'>
#2 <class 'numpy.int64'>
#dtype: object
If you look into the Series.apply code, it looks like the weird behavior comes about here:
# row-wise access
if is_extension_type(self.dtype):
    mapped = self._values.map(f)
else:
    values = self.asobject
    mapped = lib.map_infer(values, f, convert=convert_dtype)
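It takes your series, creates values, which is array([1, 50, 75], dtype=object), and then passes that to another function in pandas._libs (lib.map_infer) to apply your function f = lambda x: isinstance(x, np.int64). Converting the int64 data to an object array boxes every element as a plain Python int, so f never sees an np.int64 and returns False each time.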
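The boxing step can be reproduced with NumPy alone. This is a sketch, not pandas' actual code path: the quoted excerpt comes from an older pandas release (asobject has since been deprecated), so it assumes that astype(object) behaves like self.asobject here, and it uses a plain list comprehension as a stand-in for the internal lib.map_infer:

```python
import numpy as np

def f(x):
    # Same predicate as the question's lambda
    return isinstance(x, np.int64)

int_values = np.array([1, 50, 75], dtype=np.int64)

# Casting to object dtype stores each element as a built-in Python int
obj_values = int_values.astype(object)
print(type(obj_values[0]))            # <class 'int'>

# Stand-in for lib.map_infer: call f on each boxed element
print([f(v) for v in obj_values])     # [False, False, False]

# Iterating the original int64 array yields np.int64 scalars instead
print([f(v) for v in int_values])     # [True, True, True]
```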
On the other hand, DataFrame.apply with axis=1 works as expected, because it defines values with values = self.values (see the corresponding DataFrame.apply source), which gives you values = array([ 1, 50, 75], dtype=int64). Each row is built from that int64 array, so row.val is an np.int64.
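One way to confirm this, sketched under the assumption of a reasonably recent pandas, is to look at the dtype of df.values and run the question's np.int64 check row by row:

```python
import numpy as np
import pandas as pd

sample_series = pd.Series([np.int64(1), np.int64(50), np.int64(75)])
df = pd.DataFrame({"val": sample_series})

# With a single int64 column, the 2-D values array keeps dtype int64
print(df.values.dtype)                # int64

# Each row Series passed to the function is int64, so the scalar is np.int64
checks = df.apply(lambda row: isinstance(row["val"], np.int64), axis=1)
print(checks.tolist())                # [True, True, True]
```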
In fact, if you were to change the underlying pandas Series.apply code to values = self.values, you would get the output you expect.
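If patching pandas is not an option, one workaround (not part of the original answer) is to test against both integer families, or to check the Series dtype once instead of inspecting each element. A sketch; is_integer_value is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def is_integer_value(x):
    # Hypothetical helper: accepts built-in ints (what Series.apply passes)
    # and NumPy integer scalars such as np.int64 (what DataFrame.apply passes).
    # bool is a subclass of int, so True/False also pass this check.
    return isinstance(x, (int, np.integer))

sample_series = pd.Series([np.int64(1), np.int64(25), np.int64(50), np.int64(75)])
df = pd.DataFrame({"val": sample_series})

series_result = sample_series.apply(is_integer_value)
frame_result = df.apply(lambda row: is_integer_value(row["val"]), axis=1)
print(series_result.all(), frame_result.all())   # True True

# Alternatively, check the dtype of the whole Series once
print(pd.api.types.is_integer_dtype(sample_series))   # True
```

Checking the dtype avoids per-element type tests entirely, which suits a homogeneous int64 Series; the element-wise helper is only needed when values of mixed types may appear.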