Pandas gives incorrect result when asking if Timestamp column values have attr astype
Question
With a column containing Timestamp values, I am getting inconsistent results about whether the elements have the attribute astype:
In [30]: o.head().datetime.map(lambda x: hasattr(x, 'astype'))
Out[30]:
0 False
1 False
2 False
3 False
4 False
Name: datetime, dtype: bool
In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]
In [32]: o.datetime.dtype
Out[32]: dtype('<M8[ns]')
In [33]: o.datetime.head()
Out[33]:
0 2012-09-30 22:00:15.003000
1 2012-09-30 22:00:16.203000
2 2012-09-30 22:00:18.302000
3 2012-09-30 22:03:37.304000
4 2012-09-30 22:05:17.103000
Name: datetime, dtype: datetime64[ns]
If I pick off the first element (or any single element) and ask whether it has the attribute astype, I see that it does, and I can even convert it to other formats.
But if I try to do this to the entire column in one go, with Series.map, I get an error claiming that Timestamp objects do not have the attribute astype (though they clearly do).
How can I map the operation over the column with Pandas? Is this a known error?
Version: pandas 0.13.0, numpy 1.8
Added
It appears to be some sort of implicit casting on the part of either pandas or numpy:
In [50]: hasattr(o.head().datetime[0], 'astype')
Out[50]: False
In [51]: hasattr(o.head().datetime.values[0], 'astype')
Out[51]: True
Recommended answer
Timestamps do not have an astype method, but numpy.datetime64 objects do.
NDFrame.values returns a numpy array. o.head().datetime.values returns a numpy array of dtype numpy.datetime64, which is why
In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]
Note that Series.__iter__ is defined this way:
def __iter__(self):
    if com.is_categorical_dtype(self.dtype):
        return iter(self.values)
    elif np.issubdtype(self.dtype, np.datetime64):
        return (lib.Timestamp(x) for x in self.values)
    elif np.issubdtype(self.dtype, np.timedelta64):
        return (lib.Timedelta(x) for x in self.values)
    else:
        return iter(self.values)
So, when the dtype of the Series is np.datetime64, iteration over the Series returns Timestamps. This is where the implicit conversion takes place.
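As a rough illustration of the difference described above, the sketch below builds a small stand-in Series (the name `s` and the sample values are assumptions, not the asker's data) and compares what iteration yields with what the underlying NumPy array yields. It also shows two possible ways to convert the column: working on the NumPy array directly, or calling `Timestamp.to_datetime64()` on each element. It is meant as a minimal example, not a statement of how any particular pandas version behaves internally.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `o.datetime` column.
s = pd.Series(
    pd.to_datetime(["2012-09-30 22:00:15.003", "2012-09-30 22:00:16.203"]),
    name="datetime",
)

# Iterating over the Series hands back pandas Timestamp objects.
first_from_iter = next(iter(s))

# Indexing the underlying NumPy array yields numpy.datetime64 scalars.
first_from_values = s.values[0]

print(type(first_from_iter))
print(type(first_from_values))

# Option 1: convert the whole column at the NumPy level.
as_seconds = s.values.astype("datetime64[s]")

# Option 2: convert element by element through the Timestamp API.
as_dt64 = [ts.to_datetime64() for ts in s]
```

Working on `s.values` avoids the per-element conversion to Timestamp altogether, which is usually the simpler route when the goal is a NumPy-level cast.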