How to record created_at and updated_at timestamps in Hive?


Problem description

MySQL can automatically record created_at and updated_at timestamps. Does Hive provide similar mechanisms? If not, what would be the best way to achieve this functionality?

Solution

Hive does not provide such a mechanism. You can achieve this by using a UDF in your select: from_unixtime(unix_timestamp()) as created_at. Note that this is executed in each mapper or reducer, so different tasks may return different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass a variable to the script and use it inside as: '${hiveconf:created_at}' as created_at
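Both variants can be sketched as follows. This is only an illustration: the tables staging_events and events, their columns, the script name load_events.hql and the sample timestamp value are hypothetical, and created_at is assumed to be a STRING column.

```sql
-- Hypothetical tables: staging_events (source) and events (target),
-- where events.created_at is assumed to be a STRING column.

-- Variant 1: per-task evaluation. Each mapper or reducer calls
-- unix_timestamp() itself, so rows written by different tasks
-- may carry slightly different created_at values.
INSERT INTO TABLE events
SELECT id,
       payload,
       from_unixtime(unix_timestamp()) AS created_at
FROM staging_events;

-- Variant 2: one value for the whole dataset (useful before Hive 1.2.0).
-- The caller supplies the timestamp when launching the script, e.g.
--   hive --hiveconf created_at="2024-01-01 00:00:00" -f load_events.hql
-- and Hive substitutes it textually before running the query.
INSERT INTO TABLE events
SELECT id,
       payload,
       '${hiveconf:created_at}' AS created_at
FROM staging_events;
```

Because the hiveconf value is substituted as plain text, it has to be quoted in the query, as shown above.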

Update: current_timestamp returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value. By contrast, the documentation describes unix_timestamp() like this: it gets the current Unix timestamp in seconds; the function is non-deterministic and prevents proper optimization of queries, and it has been deprecated since 2.0 in favour of the CURRENT_TIMESTAMP constant. So CURRENT_TIMESTAMP is not a function, it's a constant! See the docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
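A minimal sketch of that behaviour, assuming Hive 1.2.0 or later and a hypothetical source table staging_events with columns id and payload:

```sql
-- Requires Hive 1.2.0 or later.
-- current_timestamp is fixed at the start of query evaluation, so both
-- columns hold the same value in every row, regardless of which task
-- produced the row.
SELECT id,
       payload,
       current_timestamp AS created_at,
       current_timestamp AS updated_at
FROM staging_events;
```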

For Hive queries, CURRENT_TIMESTAMP is preferable when you rewrite tables or partitions, or insert into them, because whole files are rewritten rather than individual records, so the created_at timestamp should be the same for all of them.
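As an illustration, a partition rewrite could look like the sketch below. The table events (assumed to be partitioned by a string column dt, with a TIMESTAMP column created_at), the source table staging_events and the partition value are all hypothetical.

```sql
-- Requires Hive 1.2.0 or later for current_timestamp.
-- The whole partition is replaced in one query, so every row written
-- here receives the same created_at value.
INSERT OVERWRITE TABLE events PARTITION (dt = '2024-01-01')
SELECT id,
       payload,
       current_timestamp AS created_at
FROM staging_events
WHERE event_date = '2024-01-01';  -- event_date is a hypothetical source column
```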
