Hive: How to do a SELECT query to output a unique primary key using HiveQL?

Views: 1473 | Published: 2018/5/31 20:25:43 | Tags: select, hadoop, distinct, hive


Problem description


I have a dataset with the following schema that I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:

  call_id,stat1,stat2,stat3
  1,a,b,c,
  2,x,y,z,
  3,d,e,f,
  1,j,k,l,

The output table needs to have call_id as its primary key, so call_id must be unique. The output schema should be:

  call_id,stat2,stat3,
  1,b,c, or (1,k,l)
  2,y,z,
  3,e,f,

The problem is that when I use the keyword DISTINCT in a Hive query, DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to call_id, something along the lines of

  SELECT DISTINCT(call_id), stat2,stat3 from intable;

However, this is not valid in Hive (I am not well-versed in SQL either).

The only legal query seems to be

  SELECT DISTINCT call_id, stat2,stat3 from intable;

But this returns multiple rows with the same call_id, because the other columns differ and each row as a whole is distinct.

NOTE: There is no arithmetic relation between a, b, c, x, y, z, etc., so any trick of averaging or summing is not viable.

Any ideas how I can do this?

Solution
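One quick idea, not the best one, but it will do the job: concatenate the stat columns into one string, keep a single string per call_id with max(), then split that string back into columns.

hive> create table temp1(a int, b string);

hive> insert overwrite table temp1
    > select call_id, max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;

hive> insert overwrite table intable
    > select a, split(b,'\\|')[0], split(b,'\\|')[1], split(b,'\\|')[2] from temp1;

Note that split() takes a regular expression, so the pipe has to be escaped as '\\|'; an unescaped '|' is treated as regex alternation and does not split on the literal pipe character. This trick also assumes that none of the stat values contains a '|' itself.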
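
As an alternative to the temporary-table approach above, the same "one row per call_id" result can be sketched with a windowing function. This is only a sketch under a few assumptions: a Hive version that supports windowing functions (0.11 or later), the intable schema from the question, and a hypothetical output table name, deduped, that does not appear in the original post. Ordering by stat1 is an arbitrary tie-breaker chosen for illustration; any column that gives the ordering you want would do.

```sql
-- Assumption: Hive 0.11+ (windowing functions available).
-- Assumption: "deduped" is a hypothetical output table name.
CREATE TABLE deduped AS
SELECT call_id, stat2, stat3
FROM (
  SELECT call_id,
         stat2,
         stat3,
         -- Number the rows within each call_id; the ORDER BY column
         -- only decides which of the duplicate rows gets rn = 1.
         row_number() OVER (PARTITION BY call_id ORDER BY stat1) AS rn
  FROM intable
) ranked
WHERE rn = 1;  -- keep exactly one row per call_id
```

With the sample input from the question, ordering by stat1 keeps the row (1,b,c) for call_id 1, because 'a' sorts before 'j'. Unlike the concat/split trick, this does not depend on a separator character that is absent from the data.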
