Hive query with certain specific exclude conditions


Question

I am trying to build a Hive query that counts only the customers whose activity is limited to the features below, or to a combination of them. The features are:

name = "summary"

name = "details"

name1 = "vehicle stats"

name1 = "accelerometer"

I have to count the number of customers who strictly follow the above conditions. For example, in the table below, customer "Joy" should not be counted: although he has both "summary" and "details" in name and both "vehicle stats" and "accelerometer" in name1, he has additionally done "expenses" in name.

Similarly, customer "Lan" should not be counted, as he has additionally done "speeding" in name1, which is not in the above conditions.

    customername    name        name1
    Joy             summary     vehicle stats
    Joy             details     accelerometer
    Joy             expenses    speeding
    Lan             summary     vehicle stats
    Lan             details     accelerometer   
    Lan             details     speeding
    Hana            details     accelerometer
    Hana            summary     vehicle stats

The count for the table above has to be 1, as there is only one customer (Hana) who has done only "summary" and "details" in name and only "vehicle stats" and "accelerometer" in name1.

This is the query that I currently have:

    select name, name1, count(distinct(customername))
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by name, name1
    having name in ('summary', 'details') 
    or name1 in ('vehicle stats', 'accelerometer')

Any suggestions would be great!!

Solution

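You can also use collect_set to check that those columns contain only the specified entries.

    select customername
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by customername
    -- sort_array makes the element order deterministic, so the literals are listed alphabetically
    having concat_ws(',', sort_array(collect_set(name))) = 'details,summary'
    and concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'

collect_set does not guarantee the order of its elements, so you have to sort its output (here with sort_array) before concatenating it for comparison, and the comparison strings must list the values in that same sorted order.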
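If string comparison feels fragile, conditional aggregation is another option. The query below is a minimal sketch, not a tested answer: it assumes the table and column names from the question, and the aliases `t` and `customer_count` are only illustrative. It keeps a customer only when none of their rows falls outside the allowed values, then counts the customers that remain.

```sql
-- Sketch only: table and column names are taken from the question;
-- the aliases t and customer_count are illustrative.
select count(*) as customer_count
from (
    select customername
    from table1
    where date_time between '2017-01-01 00:00:00' and '2017-01-10 00:00:00'
    group by customername
    -- Each row scores 0 when both columns hold an allowed value, 1 otherwise
    -- (a NULL in either column also scores 1). A customer passes only if
    -- the total is 0, i.e. they have no disallowed row at all.
    having sum(case
                   when name in ('summary', 'details')
                    and name1 in ('vehicle stats', 'accelerometer')
                   then 0
                   else 1
               end) = 0
) t;
```

On the sample table, Joy and Lan each have one row containing "speeding", so only Hana passes and the count is 1. Unlike the collect_set comparison, this version also accepts a customer who has done only some of the allowed values, which is one reading of "or a combination of these features"; if every allowed value must be present, the exact-match comparison is the better fit.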
