Column to comma separated value in Hive

Problem description

This has been asked and answered for SQL (Convert multiple rows into one with comma as separator). Would any of the approaches mentioned there work in Hive, e.g. to go from this:

+------+------+
| Col1 | Col2 |
+------+------+
| a    | 1    |
| a    | 5    |
| a    | 6    |
| b    | 2    |
| b    | 6    |
+------+------+

to this:

+------+-------+
| Col1 | Col2  |
+------+-------+
| a    | 1,5,6 |
| b    | 2,6   |
+------+-------+

Solution

The aggregate function collect_set can achieve what you are after (see the Hive documentation). It returns an array of the distinct Col2 values in each group. So you can write a query like:

SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;
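
Note that collect_set returns an array rather than a comma-separated string. One way to get the string form shown in the question is to wrap the array in concat_ws. This is a minimal sketch, assuming Col2 is numeric (hence the cast, since concat_ws expects strings or an array of strings); the alias Col2_csv is only illustrative:

```sql
-- Sketch: join the distinct Col2 values of each group with commas.
-- Assumes Col2 is numeric, so it is cast to STRING for concat_ws.
-- your_table is the placeholder table name used in the answer.
SELECT Col1,
       concat_ws(',', collect_set(CAST(Col2 AS STRING))) AS Col2_csv
FROM your_table
GROUP BY Col1;
```

Hive does not guarantee the order of the elements in the collected array, so the values in each string may not appear in ascending order without additional sorting.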

However, there is one notable difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT retains duplicates in its result, whereas collect_set removes any duplicates from the resulting array. In the example you have shown, there are no repeated Col2 values within a group, so you can go ahead and use it.
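
If duplicates do need to be kept, which is closer to how GROUP_CONCAT behaves, collect_list (available in Hive 0.13.0 and later) may be an option. The following is a sketch rather than a definitive query; it assumes Col2 is numeric and therefore casts it to STRING, and the alias Col2_all is only illustrative:

```sql
-- Sketch: collect_list keeps repeated values, unlike collect_set.
-- Assumes Col2 is numeric, so it is cast to STRING before joining.
-- your_table is the placeholder table name used in the answer.
SELECT Col1,
       concat_ws(',', collect_list(CAST(Col2 AS STRING))) AS Col2_all
FROM your_table
GROUP BY Col1;
```

With the sample data from the question both functions collect the same values, because no Col2 value repeats within a group.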
