Hive: Populate other columns based on unique value in a particular column

Question

I have two tables in Hive, as shown below:

Table 1:

id   name value
1    abc  stack
3    abc  overflow
4    abc  foo
6    abc  bar

Table 2:

id   name value       
5    xyz  overflow       
9    xyz  stackoverflow 
3    xyz  foo
23   xyz  bar

I need to count how often each entry in the value column occurs, ignoring the id and name columns, and return only the rows whose value appears exactly once across both tables.

The expected output is:

id name value
1  abc  stack
9  xyz  stackoverflow
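To pin down the intended semantics, here is a minimal Python sketch using the sample rows above. It is not Hive code. It stacks both tables the way UNION ALL does, counts each value with collections.Counter while ignoring id and name, and keeps the rows whose value count is 1. The function name unique_value_rows is hypothetical and used only for illustration.

```python
from collections import Counter

# Sample rows copied from the question: (id, name, value)
table1 = [(1, "abc", "stack"), (3, "abc", "overflow"), (4, "abc", "foo"), (6, "abc", "bar")]
table2 = [(5, "xyz", "overflow"), (9, "xyz", "stackoverflow"), (3, "xyz", "foo"), (23, "xyz", "bar")]

def unique_value_rows(*tables):
    """Hypothetical helper: rows whose value occurs exactly once across all tables."""
    rows = [row for table in tables for row in table]      # like UNION ALL: keep duplicates
    counts = Counter(value for _id, _name, value in rows)  # count by value only
    return [row for row in rows if counts[row[2]] == 1]

print(unique_value_rows(table1, table2))
# [(1, 'abc', 'stack'), (9, 'xyz', 'stackoverflow')]
```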

I tried this, and it works in other databases but not in Hive:

select id,name,value from
 (SELECT id,name,value FROM table1  
   UNION ALL 
  SELECT id,name,value FROM table2) t 
 group by value having count(value) = 1;

Hive expects the GROUP BY clause to list every selected column, like this:

select id,name,value from
  (SELECT id,name,value FROM table1  
    UNION ALL 
  SELECT id,name,value FROM table2) t 
 group by id,name,value having count(value) = 1;

With that change, the query runs and returns:

id   name value
1    abc  stack
3    abc  overflow
4    abc  foo
6    abc  bar
5    xyz  overflow       
9    xyz  stackoverflow 
3    xyz  foo
23   xyz  bar

Hive requires every non-aggregated column in the SELECT clause to appear in the GROUP BY clause. But once I add id and name, Hive groups by all three columns, so each row forms its own group and the result is different from what I expect.
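The sketch below reproduces the problem with Python's built-in sqlite3 module and an in-memory SQLite database, used here as a stand-in for Hive (an assumption: it is not Hive and does not enforce Hive's GROUP BY rules). It confirms that grouping by id, name, value lets all 8 rows through, and it illustrates a technique other than the analytic-function answer: aggregate on value alone, then join the unique values back to the unioned rows to recover id and name. The join-back query is plain SQL, but it has not been checked against a particular Hive version here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table, rows in {
    "table1": [(1, "abc", "stack"), (3, "abc", "overflow"), (4, "abc", "foo"), (6, "abc", "bar")],
    "table2": [(5, "xyz", "overflow"), (9, "xyz", "stackoverflow"), (3, "xyz", "foo"), (23, "xyz", "bar")],
}.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, value TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# Both tables stacked with UNION ALL, as in the question.
union_all = "SELECT id, name, value FROM table1 UNION ALL SELECT id, name, value FROM table2"

# GROUP BY id, name, value: every (id, name, value) triple in the sample is
# distinct, so each group holds one row and all 8 rows pass HAVING count = 1.
by_all_columns = conn.execute(
    f"SELECT id, name, value FROM ({union_all}) t "
    "GROUP BY id, name, value HAVING count(value) = 1"
).fetchall()

# Join-back alternative: group by value alone to find values that occur once,
# then join those values back to the unioned rows to recover id and name.
join_back = sorted(conn.execute(f"""
    SELECT t.id, t.name, t.value
    FROM ({union_all}) t
    JOIN (SELECT value FROM ({union_all}) u
          GROUP BY value HAVING count(*) = 1) v
      ON t.value = v.value
""").fetchall())

print(len(by_all_columns))  # 8
print(join_back)            # [(1, 'abc', 'stack'), (9, 'xyz', 'stackoverflow')]
```

The join-back form needs two passes over the unioned data, which is one reason a single-pass window function is often preferred for this kind of filter.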

Answer

Calculate the analytic (window) function count(*) over(partition by value), then filter on that count. Testing with your example data:

with 

table1 as (
select stack (4,
              1,'abc','stack',
              3,'abc','overflow',
              4,'abc','foo',
              6,'abc','bar'
             ) as (id, name, value)
),

table2 as (
select stack (4,
              5,  'xyz','overflow',      
              9,  'xyz','stackoverflow',
              3,  'xyz','foo',
              23, 'xyz','bar'
             ) as (id, name, value)
)

select id, name, value
from(
select id, name, value, count(*) over(partition by value) value_cnt
 from
(SELECT id,name,value FROM table1  
  UNION ALL 
 SELECT id,name,value FROM table2) s
)s where value_cnt=1;

Result:

OK
id      name    value
1       abc     stack
9       xyz     stackoverflow
Time taken: 55.423 seconds, Fetched: 2 row(s)
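To try the same query shape without a Hive cluster, here is a sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or later, the first release with window functions. SQLite is only a stand-in for Hive here, and ordinary CREATE TABLE/INSERT statements replace the answer's stack() CTEs.

```python
import sqlite3

# Window functions such as count(*) OVER (...) need SQLite 3.25 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite >= 3.25 is required for window functions")

conn = sqlite3.connect(":memory:")
for table, rows in {
    "table1": [(1, "abc", "stack"), (3, "abc", "overflow"), (4, "abc", "foo"), (6, "abc", "bar")],
    "table2": [(5, "xyz", "overflow"), (9, "xyz", "stackoverflow"), (3, "xyz", "foo"), (23, "xyz", "bar")],
}.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, value TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# count(*) OVER (PARTITION BY value) attaches the per-value count to every row
# without collapsing rows, so id and name are still available to the outer SELECT.
result = sorted(conn.execute("""
    SELECT id, name, value FROM (
        SELECT id, name, value,
               count(*) OVER (PARTITION BY value) AS value_cnt
        FROM (SELECT id, name, value FROM table1
              UNION ALL
              SELECT id, name, value FROM table2) u
    ) s
    WHERE value_cnt = 1
""").fetchall())

print(result)  # [(1, 'abc', 'stack'), (9, 'xyz', 'stackoverflow')]
```

Unlike GROUP BY, the window function keeps one output row per input row, which is why id and name can be selected without being grouped.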
