Hive time window function's bug

Problem description

I have a table named gmv_active_mem_monthly. All of its rows are shown here:

month   gmv_monthly   active_member_monthly
201612  231657626042  2602064
201611  373576915733  3498039
201610  367824193757  3648708
201609  356167649082  3686007
201608  383362147243  3998595
201607  383828659139  3917252
201606  332929299345  3627298
201605  323084120955  3579938
201604  280834688208  3293682
201603  282180201106  3316420
201602  246386923468  3097107
201601  261355415707  3186347
201512  273860930491  3071105
201511  246606316046  2981534
201510  237766306308  2873558
201509  160390583711  2267418
201508  124370765573  2002018
201507  110236706032  1855539
201506  84844225170   1467889
201505  60651906632   1180800
201504  46808796126   917681
201503  12498656329   427529
201502  4918371362    190932
201501  2824293727    129203

I run a simple query in Hive:

select  month,
        sum(gmv_monthly) over
        (
            order by  "month"
            rows      between 12 preceding and 1 preceding
        ) as total_gmv,
        sum(active_member_monthly) over
        (
            order by  "month"
            rows      between 12 preceding and 1 preceding
        ) as total_active_mem

from    novaya.gmv_active_mem_monthly 
;

But the result is completely wrong, whereas the same code gives the right result on another dataset. The result on the dataset above is:

month   total_gmv   total_active_mem
201501  NULL    NULL
201502  2824293727  129203
201503  7742665089  320135
201504  20241321418 747664
201505  67050117544 1665345
201506  127702024176    2846145
201507  212546249346    4314034
201508  322782955378    6169573
201509  447153720951    8171591
201510  607544304662    10439009
201511  845310610970    13312567
201512  1091916927016   16294101
201601  1365777857507   19365206
201602  1624308979487   22422350
201603  1865777531593   25328525
201604  2135459076370   28217416
201605  2369484968452   30593417
201606  2631917182775   32992555
201607  2880002256950   35151964
201608  3153594210057   37213677
201609  3412585591727   39210254
201610  3608362657098   40628843
201611  3738420544547   41403993
201612  3865391144234   41920498

We can check that 1624308979487 (the total for 201602) minus 1365777857507 (the total for 201601) is not equal to the 201601 value in gmv_active_mem_monthly. So what's wrong with the code? The same code runs perfectly on another dataset, with no error like this.

Recommended answer

There is no problem. The results are correct.
The difference between two consecutive totals involves not one month but two, one at each edge of the range: when the window slides forward by one row, the oldest month drops out and a new month comes in.

201501  2,824,293,727          <-- This value goes only with 201601
201502  4,918,371,362
201503  12,498,656,329
201504  46,808,796,126
201505  60,651,906,632
201506  84,844,225,170
201507  110,236,706,032
201508  124,370,765,573
201509  160,390,583,711
201510  237,766,306,308
201511  246,606,316,046
201512  273,860,930,491
201601  261,355,415,707        <-- This value goes only with 201602
201602

So 1,624,308,979,487 - 1,365,777,857,507 = 258,531,121,980 = 261,355,415,707 - 2,824,293,727.
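
To make the sliding-window arithmetic concrete, here is a minimal Python sketch that emulates the frame "rows between 12 preceding and 1 preceding" over the first 14 months of the question's data. The helper name trailing_sum and the dictionary layout are assumptions made for illustration only; this is not how Hive evaluates the query internally.

```python
# Monthly GMV values copied from the question's table (201501 through 201602).
gmv = {
    201501: 2824293727,
    201502: 4918371362,
    201503: 12498656329,
    201504: 46808796126,
    201505: 60651906632,
    201506: 84844225170,
    201507: 110236706032,
    201508: 124370765573,
    201509: 160390583711,
    201510: 237766306308,
    201511: 246606316046,
    201512: 273860930491,
    201601: 261355415707,
    201602: 246386923468,
}


def trailing_sum(values, month, start=12, end=1):
    """Hypothetical helper: sum the rows from `start` preceding to `end`
    preceding the given month, with rows ordered by month ascending.
    Returns None when the frame is empty, like SUM over an empty frame."""
    months = sorted(values)
    i = months.index(month)
    window = months[max(0, i - start):max(0, i - end + 1)]
    return sum(values[m] for m in window) if window else None


total_201601 = trailing_sum(gmv, 201601)  # covers 201501..201512
total_201602 = trailing_sum(gmv, 201602)  # covers 201502..201601
diff = total_201602 - total_201601

print(total_201601)                 # 1365777857507
print(total_201602)                 # 1624308979487
print(diff)                         # 258531121980
print(gmv[201601] - gmv[201501])    # 258531121980
```

The two totals match the query output, and their difference equals the month that entered the window (201601) minus the month that left it (201501), not the value of a single month.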
