Python/Numpy: problems with type conversion in vectorize and item


Problem description


I am writing a function to extract values from datetimes over arrays. I want the function to operate on a Pandas DataFrame or a numpy ndarray.


The values should be returned in the same way as the Python datetime properties, e.g.

from datetime import datetime
dt = datetime(2016, 10, 12, 13)
dt.year
  => 2016
dt.second
  => 0


For a DataFrame this is reasonably easy to handle using applymap() (although there may well be a better way). I tried the same approach for numpy ndarrays using vectorize(), and I'm running into problems. Instead of the values I was expecting, I end up with very large integers, sometimes negative.
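For a DataFrame, one possible alternative to applymap() is the pandas Series.dt accessor. It exposes the datetime properties (year, month, day, hour and so on) on a whole datetime column at once. The sketch below assumes the dates sit in a single datetime column named 'dts', as in the minimal example further down:

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'dts': [datetime(1970, 1, 1, 1), datetime(1986, 1, 15, 12),
                           datetime(2016, 7, 15, 23)]})

# .dt mirrors the datetime attributes and returns one integer per row,
# computed column-wise rather than by calling a Python function per element.
days = df['dts'].dt.day
hours = df['dts'].dt.hour

print(days.tolist())   # [1, 15, 15]
print(hours.tolist())  # [1, 12, 23]
```

This only applies to a Series with a datetime dtype, so a DataFrame with several datetime columns would need it applied column by column.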


This was pretty baffling at first, but I figured out what is happening: the vectorized function is using item() instead of __getitem__ to get the values out of the ndarray. This seems to automatically convert each datetime64 object to a long:

nd[1][0]
  => numpy.datetime64('1986-01-15T12:00:00.000000000')
nd[1].item()
  => 506174400000000000L


The long seems to be the number of nanoseconds since epoch (1970-01-01T00:00:00). Somewhere along the line the values are converted to integers and they overflow, hence the negative numbers.
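The unit dependence behind this can be checked directly. The array below is built for illustration from one of the question's dates. Python's datetime.datetime only has microsecond resolution, so NumPy can hand back a datetime.datetime only for datetime64 values at microsecond or coarser precision. At nanosecond precision, .item() and a cast to object dtype both return the raw count since the epoch as a Python int:

```python
import datetime

import numpy as np

ns = np.array(['1986-01-15T12:00'], dtype='datetime64[ns]')
us = ns.astype('datetime64[us]')

# At nanosecond precision there is no datetime.datetime equivalent,
# so NumPy returns the nanoseconds since 1970-01-01 as a Python int.
raw = ns[0].item()
print(type(raw).__name__, raw)               # int 506174400000000000
print(type(ns.astype(object)[0]).__name__)   # int

# At microsecond (or coarser) precision the value converts to datetime.datetime.
conv = us[0].item()
print(isinstance(conv, datetime.datetime), conv)  # True 1986-01-15 12:00:00
```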


So that's the problem. Please can someone help me fix it? The only thing I can think of is doing the conversion manually, but this would effectively mean reimplementing a chunk of the datetime module.


Is there some alternative to vectorize that doesn't use item()?

Thanks!

Minimal code example:

## DataFrame works fine
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'dts': [datetime(1970, 1, 1, 1), datetime(1986, 1, 15, 12),
                         datetime(2016, 7, 15, 23)]})
exp = pd.DataFrame({'dts': [1, 15, 15]})

df_func = lambda x: x.day    
out = df.applymap(df_func)

assert out.equals(exp)

## numpy ndarray is more difficult
from numpy import datetime64 as dt64, timedelta64 as td64, vectorize  # for brevity

# The unary function is a little more complex, especially for days and months where the minimum value is 1
nd_func = lambda x: int((dt64(x, 'D') - dt64(x, 'M') + td64(1, 'D')) / td64(1, 'D'))

nd = df.to_numpy()    # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
exp = exp.to_numpy()
  => array([[ 1],
            [15],
            [15]])

# The function works as expected on a single element...
assert nd_func(nd[1][0]) == 15

# ...but not on an ndarray
nd_vect = vectorize(nd_func)
out = nd_vect(nd)
  => array([[    -105972749999999],
            [ 3546551532709551616],
            [-6338201187830896640]])

Recommended answer

In Py3 the error is OverflowError: Python int too large to convert to C long (here dts is a 1d datetime64[ns] array holding the three test dates).

In [215]: f=np.vectorize(nd_func,otypes=[int])
In [216]: f(dts)
... 
OverflowError: Python int too large to convert to C long


But if I change the datetime units, it runs fine:

In [217]: f(dts.astype('datetime64[ms]'))
Out[217]: array([ 1, 15, 15])


We could dig into this in more depth, but this seems to be the simplest solution.
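The unit change probably helps for the following reason. np.vectorize is built on np.frompyfunc and converts its inputs to object dtype before calling the function. At millisecond or microsecond resolution, each element therefore arrives as a datetime.datetime, which dt64(x, 'D') understands. At nanosecond resolution it arrives as a plain int. dt64(x, 'D') reads an int as a count of days since 1970-01-01, which is the likely source of the overflowing values. The sketch below applies the fix to a 2d array shaped like the question's nd, rebuilt here without pandas. It uses microsecond units; milliseconds behave the same way for these dates.

```python
from datetime import datetime

import numpy as np
from numpy import datetime64 as dt64, timedelta64 as td64


def nd_func(x):
    """Day of month: (value floored to day - value floored to month) + 1 day, in days."""
    return int((dt64(x, 'D') - dt64(x, 'M') + td64(1, 'D')) / td64(1, 'D'))


# A (3, 1) datetime64[ns] array, like the one built from the DataFrame in the question.
nd = np.array([[datetime(1970, 1, 1, 1)],
               [datetime(1986, 1, 15, 12)],
               [datetime(2016, 7, 15, 23)]], dtype='datetime64[ns]')

nd_vect = np.vectorize(nd_func, otypes=[int])

# Casting to microseconds first means each element reaches nd_func as a
# datetime.datetime rather than as a raw nanosecond count.
out = nd_vect(nd.astype('datetime64[us]'))
print(out.tolist())  # [[1], [15], [15]]
```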


Keep in mind that vectorize is a convenience function; it makes iterating over multiple dimensions easier. But for a 1d array it is basically:

np.array([nd_func(i) for i in dts])
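This equivalence is only approximate, and the difference matters here. Iterating over a datetime64 array yields numpy.datetime64 scalars, not the object-dtype values np.vectorize works with. The sketch below rebuilds dts from the question's three dates. It suggests that the plain list comprehension gives the right days even at nanosecond resolution:

```python
from datetime import datetime

import numpy as np
from numpy import datetime64 as dt64, timedelta64 as td64


def nd_func(x):
    """The question's day-of-month function."""
    return int((dt64(x, 'D') - dt64(x, 'M') + td64(1, 'D')) / td64(1, 'D'))


dts = np.array([datetime(1970, 1, 1, 1), datetime(1986, 1, 15, 12),
                datetime(2016, 7, 15, 23)], dtype='datetime64[ns]')

# Iteration yields numpy.datetime64 scalars, which nd_func handles directly,
# so no unit conversion is needed for the explicit loop.
print(type(dts[0]).__name__)                # datetime64
days = np.array([nd_func(i) for i in dts])
print(days)                                 # [ 1 15 15]
```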


But note that we don't have to use iteration:

In [227]: ((dts.astype('datetime64[D]') - dts.astype('datetime64[M]') + td64(1,'D')) / td64(1,'D')).astype(int)
Out[227]: array([ 1, 15, 15])
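The same truncation trick extends to the other datetime properties the question asks about. This is a sketch that goes beyond the original answer, and datetime_fields is a hypothetical helper name. Each field is the difference between the value truncated to two adjacent units, counted in the finer unit. The sketch assumes the input can be cast to datetime64[ns].

```python
import numpy as np


def datetime_fields(a):
    """Return year, month, day, hour, minute and second arrays for a datetime64 array."""
    a = np.asarray(a, dtype='datetime64[ns]')
    # Casting to a coarser datetime64 unit truncates to that unit.
    Y = a.astype('datetime64[Y]')
    M = a.astype('datetime64[M]')
    D = a.astype('datetime64[D]')
    h = a.astype('datetime64[h]')
    m = a.astype('datetime64[m]')
    s = a.astype('datetime64[s]')
    return {
        'year': Y.astype(int) + 1970,       # datetime64[Y] counts years since 1970
        'month': (M - Y).astype(int) + 1,   # months since January, 1-based
        'day': (D - M).astype(int) + 1,     # days since the 1st, 1-based
        'hour': (h - D).astype(int),
        'minute': (m - h).astype(int),
        'second': (s - m).astype(int),
    }


dts = np.array(['1970-01-01T01', '1986-01-15T12', '2016-07-15T23'],
               dtype='datetime64[ns]')
fields = datetime_fields(dts)
print(fields['day'])    # [ 1 15 15]
print(fields['year'])   # [1970 1986 2016]
```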
