如何使用Presto将时间列分为5秒间隔并计数行? [英] How to group time column into 5 second intervals and count rows using Presto?

查看:465
本文介绍了如何使用Presto将时间列分为5秒间隔并计数行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Presto和Zeppelin.有很多原始数据. 我必须总结这些数据.

I am using Presto and Zeppelin. There are a lot of raw datas. I have to summarize those datas.

我想每5秒分组一次.

serviceType        logType     date
------------------------------------------------------
service1           log1        2017-10-24 23:00:23.206
service1           log1        2017-10-24 23:00:23.207
service1           log1        2017-10-24 23:00:25.206
service2           log1        2017-10-24 23:00:24.206
service1           log2        2017-10-24 23:00:27.206
service1           log2        2017-10-24 23:00:29.302

然后结果

serviceType        logType     date                       cnt
--------------------------------------------------------------
service1           log1        2017-10-24 23:00:20          2
service2           log1        2017-10-24 23:00:20          1
service1           log1        2017-10-24 23:00:25          1
service1           log2        2017-10-24 23:00:25          2

首先,我必须将存储的数据迁移到新表中.

first, I have to migrate stored datas to new tables.

第二,我必须对数据进行分组并实时保存到新表中.

second, I have to group datas and save to the new table realtime.

很难编写sql脚本.

请帮助我.

我必须使用python解释器吗?

Do I have to use python interpreter?

推荐答案

您可以

  1. 使用date_trunc丢弃timestamp的毫秒部分
  2. 您可以使用ts - interval '1' second * (second(ts) % 5)将不带毫秒部分的timestamp舍入为5秒
  1. discard millisecond part of a timestamp with date_trunc
  2. you can round a timestamp without millisecond part to 5 seconds with ts - interval '1' second * (second(ts) % 5)

将它们放在一起的示例:

Example putting this together:

presto> SELECT ts_rounded, count(*)
     -> FROM (
     ->     SELECT date_trunc('second', ts) - interval '1' second * (second(ts) % 5) AS ts_rounded
     ->     FROM (VALUES timestamp '2017-10-24 23:01:20.206',
     ->         timestamp '2017-10-24 23:01:23.206',
     ->         timestamp '2017-10-24 23:01:23.207',
     ->         timestamp '2017-10-24 23:01:26.206') AS t(ts)
     -> )
     -> GROUP BY ts_rounded ORDER BY ts_rounded;
       ts_rounded        | _col1
-------------------------+-------
 2017-10-24 23:01:20.000 |     3
 2017-10-24 23:01:25.000 |     1
(2 rows)

这篇关于如何使用Presto将时间列分为5秒间隔并计数行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆