计算蜂巢中字符串类型的两列之间的时差,而无需更改数据类型字符串 [英] Calculate time difference between two columns of string type in hive without changing the data type string

查看:93
本文介绍了计算蜂巢中字符串类型的两列之间的时差,而无需更改数据类型字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试计算具有字符串数据类型的行的两列之间的时间差.如果它们之间的时间差小于2小时,则选择该行的第一列,否则,如果时间差大于2小时,则选择该行的第二列.可以通过将列转换为日期时间格式来完成,但是我希望结果仅在字符串中.我怎样才能做到这一点?数据如下:

I am trying to calculate the time difference between two columns of a row which are of string data type. If the time difference between them is less than 2 hours then select the first column of that row else if the time difference is greater than 2 hours then select the second column of that row. It can be done by converting the columns to datetime format, but I want the result to be in string only. How can I do that? The data looks like this:

col1(字符串类型)
2018-07-16 02:23:00
2018-07-26 12:26:00
2018-07-26 15:32:00

col1(string type)
2018-07-16 02:23:00
2018-07-26 12:26:00
2018-07-26 15:32:00

col2(字符串类型)
2018-07-16 02:36:00
2018-07-26 14:29:00
2018-07-27 15:38:00

col2(string type)
2018-07-16 02:36:00
2018-07-26 14:29:00
2018-07-27 15:38:00

推荐答案

我认为您无需将列转换为日期时间格式,因为您的案例中的数据已经排序(yyyy-MM-dd hh:mm :ss).您只需要将所有数字都放入一个字符串(yyyyMMddhhmmss)中,然后可以应用大于或小于2小时(此处为20000,因为小时后面是mmss)的选择.通过查看您的示例(假设col2> col1),此查询将起作用:

I think you don't need to convert the columns to datetime format, since the data in your case is already ordered (yyyy-MM-dd hh:mm:ss). You just need to take all the digits and take it into one string (yyyyMMddhhmmss) then you can apply your selection which is bigger or smaller than 2 hours (here 20000 since the hour is followed by mmss). By looking at your example (assuming col2 > col1), this query would work:

SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;

这篇关于计算蜂巢中字符串类型的两列之间的时差,而无需更改数据类型字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆