How to record created_at and updated_at timestamps in Hive?


Question

MySQL can automatically record created_at and updated_at timestamps. Does Hive provide a similar mechanism? If not, what is the best way to achieve this?

Recommended answer

Hive does not provide such a mechanism. You can achieve this by calling a UDF in your SELECT: from_unixtime(unix_timestamp()) as created_at. Note that this expression is evaluated in each mapper or reducer and may return different values. If you need the same value for the whole dataset (on Hive versions before 1.2.0), pass a variable to the script and use it inside as: '${hiveconf:created_at}' as created_at
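As a rough sketch of the two approaches above, the queries might look like this. The table names (staging_events), the columns (id, payload), the script name (load_events.hql) and the literal date are hypothetical and only there for illustration.

```sql
-- Hypothetical source table: staging_events(id, payload).

-- Approach 1: the UDF is evaluated in each mapper or reducer,
-- so different rows may end up with slightly different values.
SELECT id,
       payload,
       from_unixtime(unix_timestamp()) AS created_at
FROM staging_events;

-- Approach 2 (Hive before 1.2.0): pass one value in from outside, e.g.
--   hive --hiveconf created_at='2024-01-01 00:00:00' -f load_events.hql
-- Hive substitutes the variable before the query runs, so every row
-- receives the same string literal.
SELECT id,
       payload,
       '${hiveconf:created_at}' AS created_at
FROM staging_events;
```

The second query yields a string column; wrapping the literal in CAST(... AS TIMESTAMP) is one way to get a timestamp type if the target column needs it.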

Update: current_timestamp returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls to current_timestamp within the same query return the same value. By contrast, unix_timestamp() with no arguments returns the current Unix timestamp in seconds; it is non-deterministic and prevents proper optimization of queries, and it has been deprecated since Hive 2.0 in favour of the CURRENT_TIMESTAMP constant. So within a query, current_timestamp behaves as a constant, not as a function. See the docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
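A minimal way to observe this behaviour, assuming Hive 1.2.0 or later (the column aliases are made up for the example):

```sql
-- Both expressions are fixed at the start of query evaluation,
-- so the two columns should hold the same value.
SELECT current_timestamp AS first_call,
       current_timestamp AS second_call;
```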

For Hive queries, CURRENT_TIMESTAMP is preferable when you overwrite tables or partitions or insert into them, because these operations write whole files rather than individual records, so every row written by the query should carry the same created_at timestamp.
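A sketch of such a partition rewrite, assuming Hive 1.2.0 or later and two hypothetical tables: events(id, payload, created_at) partitioned by dt, and staging_events(id, payload) with the same partition column.

```sql
-- Rewrites one partition; every row written by this query gets the
-- same created_at value because current_timestamp is fixed per query.
INSERT OVERWRITE TABLE events PARTITION (dt = '2024-01-01')
SELECT id,
       payload,
       current_timestamp AS created_at
FROM staging_events
WHERE dt = '2024-01-01';
```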
