Apache Drill doesn't understand Pandas datetime64[ns]


Question

I'm using pyarrow and pyarrow.parquet together with Pandas. When I write a Pandas datetime64[ns] series to a Parquet file and load it again through a Drill query, the query shows an integer such as 1467331200000000, which does not look like a regular UNIX timestamp.

The query looks like this:

SELECT workspace.id-column AS id-column,workspace.date-column AS date-column

When I open the file in Python again, it loads correctly and the column still has its datetime64[ns] type.

Any idea what's going wrong and how to solve it? I would like this value to be shown as a regular date.

Recommended answer

OK, I found a solution a few days ago that I would like to share. I think I missed something at first. Before writing the dataframe to Parquet, it is important to downcast the timestamps to [ms] and to allow them to be truncated, so that Drill can open the file without issues:

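import pyarrow.parquet as pq

# `table` is a pyarrow.Table and `name` is the output file stem, both defined elsewhere.
pq.write_table(table, f'{name}.parquet',
               coerce_timestamps='ms',
               allow_truncated_timestamps=True)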

When I then define a view in Drill, I can cast that column to a date or a timestamp as required.
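As a sanity check on the value from the question, the integer 1467331200000000 appears to be a count of microseconds since the Unix epoch rather than seconds or milliseconds. That interpretation is an assumption based on the size of the number, not something stated in the original post. A minimal sketch using only the standard library:

```python
from datetime import datetime, timedelta, timezone

raw = 1467331200000000  # integer shown by Drill in the question

# Assumption: the value counts microseconds since 1970-01-01 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
as_micros = epoch + timedelta(microseconds=raw)

print(as_micros.isoformat())  # 2016-07-01T00:00:00+00:00
```

Read this way, the value lands on a plausible calendar date. That fits a timestamp column that was stored in a unit Drill did not interpret and therefore displayed as a plain integer.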
