Query to divide data


Problem description


We have two columns, id and monthid.

The output I'm looking for splits each year into quarters based on monthid, with one output row per quarter. If the id is active in a quarter, the output should be 1, otherwise 0. If the id appears in only one month of a quarter (e.g. only month 1 of the first quarter), the output is still 1.

Like this:

id           month
-----------------------------------
100   2012-03-01 00:00:00.0
100   2015-09-01 00:00:00.0
100   2016-10-01 00:00:00.0
100   2015-11-01 00:00:00.0
100   2014-01-01 00:00:00.0
100   2013-04-01 00:00:00.0
100   2014-12-01 00:00:00.0
100   2015-02-01 00:00:00.0
100   2014-06-01 00:00:00.0
100   2013-01-01 00:00:00.0
100   2014-05-01 00:00:00.0
100   2016-05-01 00:00:00.0
100   2013-07-01 00:00:00.0

The result should look something like this:

ID    YEAR     QTR      output (1 or 0)
--------------------------------------------------
100   2012      1          1
100   2012      2          0
100   2012      3          0
100   2012      4          0
100   2013      1          1
100   2013      2          1
100   2013      3          1
100   2013      4          0

Below is what I tried, but it doesn't return the expected results. Please help me achieve this. I want the rows where the output is 0 as well.

select a.id,a.year,a.month,
CASE WHEN a.month BETWEEN 1 AND 4 THEN 1 
 ELSE 0 END as output
from
(select id,trim(substring(claim_month_id,1,4)) as year,(INT((MONTH(monthid)-1)/3)+1) as month from test) a
group by a.id,a.year,a.month

Any help would be appreciated.

Solution

@Ani: Hive has no hierarchical query that could generate the four quarters (1, 2, 3, 4), so I created a small table for them. I then take every id and year present in the ims_patient_activity_diagnosis table and join them with the quarters table, which gives every possible combination of patient id, year and quarter. Finally, I right join the activity rows to those combinations. If a combination has no matching row in the activity table, there was no activity for that id, year and quarter, and I assign activity=0 to it. I also inserted patient id=200 to check that the query works with more than one patient id in the table. Hope this helps. Thanks.

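-- Helper table holding the four quarter numbers
create table dbo.qtrs(month int);
insert into dbo.qtrs values (1),(2),(3),(4);

-- qtr: every (id, year, quarter) combination for the years in which an id appears
-- ims: the actual activity rows; combinations without a match get activity = 0
select DISTINCT NVL(ims.id, qtr.id) as patient_id,
qtr.year as year,
qtr.month as month,  -- holds the quarter number (1-4), despite the column name
CASE WHEN ims.id > 0 THEN 1 ELSE 0 END as activity
from sandbox_grwi.ims_patient_activity_diagnosis ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.month from sandbox_grwi.ims_patient_activity_diagnosis ims join dbo.qtrs qtrs) qtr
on (ims.id=qtr.id and YEAR(ims.month_dt)=qtr.year and INT((MONTH(month_dt)-1)/3)+1=qtr.month)
sort by patient_id, year, month;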
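
The join condition depends on mapping a month number to its quarter. As a minimal sketch of that arithmetic outside Hive (written in Python; quarter_of is a hypothetical helper name, not part of the answer), integer division does the same job as INT((MONTH(month_dt)-1)/3)+1:

```python
def quarter_of(month: int) -> int:
    """Map a calendar month (1-12) to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Months 1-3 give 0, 4-6 give 1, 7-9 give 2, 10-12 give 3; add 1 for the quarter.
    return (month - 1) // 3 + 1


# Quarter for every month of the year, keyed by month number.
quarters_by_month = {m: quarter_of(m) for m in range(1, 13)}
```

Subtracting 1 before dividing is what keeps March in quarter 1 and December in quarter 4.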

Sample Result:
p_id    year    month   activity
100     2012    1       1
100     2012    2       0
100     2012    3       0
100     2012    4       0
100     2013    1       1
100     2013    2       1
100     2013    3       1
100     2013    4       0
100     2014    1       1
100     2014    2       1
100     2014    3       0
100     2014    4       1
100     2015    1       1
100     2015    2       0
100     2015    3       1
100     2015    4       1
100     2016    1       0
100     2016    2       1
100     2016    3       0
100     2016    4       1
200     2012    1       1
200     2012    2       0
200     2012    3       0
200     2012    4       0
200     2013    1       0
200     2013    2       1
200     2013    3       0
200     2013    4       0


Additional sample data:
insert into sandbox_grwi.ims_patient_activity_diagnosis values
(200, '2012-03-01'), 
(200, '2013-04-01'); 
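
To try the approach without a Hive cluster, the sketch below replays it on the sample data using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: the table name activity_rows, the CTE names, and the function quarterly_activity are made up here, the quarters come from an inline VALUES list instead of a helper table, and the SQL is SQLite dialect (strftime instead of YEAR/MONTH, and a LEFT JOIN from the grid instead of a RIGHT JOIN).

```python
import sqlite3

# Sample rows from the question (id 100) plus the two extra rows for id 200.
ROWS = [
    (100, "2012-03-01"), (100, "2015-09-01"), (100, "2016-10-01"),
    (100, "2015-11-01"), (100, "2014-01-01"), (100, "2013-04-01"),
    (100, "2014-12-01"), (100, "2015-02-01"), (100, "2014-06-01"),
    (100, "2013-01-01"), (100, "2014-05-01"), (100, "2016-05-01"),
    (100, "2013-07-01"),
    (200, "2012-03-01"), (200, "2013-04-01"),
]

QUERY = """
WITH qtrs(qtr) AS (VALUES (1), (2), (3), (4)),
activity AS (
    -- One row per (id, year, quarter) that actually has data.
    SELECT DISTINCT id,
           CAST(strftime('%Y', month_dt) AS INTEGER) AS year,
           (CAST(strftime('%m', month_dt) AS INTEGER) - 1) / 3 + 1 AS qtr
    FROM activity_rows
),
grid AS (
    -- Every quarter of every year in which an id appears.
    SELECT DISTINCT a.id, a.year, q.qtr
    FROM activity a CROSS JOIN qtrs q
)
SELECT g.id, g.year, g.qtr,
       CASE WHEN a.id IS NULL THEN 0 ELSE 1 END AS active
FROM grid g
LEFT JOIN activity a
  ON a.id = g.id AND a.year = g.year AND a.qtr = g.qtr
ORDER BY g.id, g.year, g.qtr
"""


def quarterly_activity(rows):
    """Return (id, year, quarter, active) tuples for the given (id, date) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE activity_rows (id INTEGER, month_dt TEXT)")
        conn.executemany("INSERT INTO activity_rows VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = quarterly_activity(ROWS)
for row in result:
    print(*row)
```

Testing for a NULL on the activity side, rather than checking id > 0, marks a quarter as active regardless of the sign of the id.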
