Column to comma separated value in Hive
Problem description
This has been asked and answered for SQL (Convert multiple rows into one with comma as separator). Would any of the approaches mentioned there work in Hive, e.g. to go from this:
+------+------+
| Col1 | Col2 |
+------+------+
| a | 1 |
| a | 5 |
| a | 6 |
| b | 2 |
| b | 6 |
+------+------+
to this:
+------+-------+
| Col1 | Col2 |
+------+-------+
| a | 1,5,6 |
| b | 2,6 |
+------+-------+
The aggregate function collect_set can achieve what you are trying to get (see the documentation). So you can write a query like:
SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;
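Note that collect_set returns an array (e.g. [1,5,6]) rather than the comma separated string shown in the question. A minimal sketch of one way to get the string form, assuming Col2 is numeric and therefore needs a cast before it can be passed to concat_ws (the alias Col2_csv is only illustrative):

```sql
-- Sketch: join the de-duplicated values of each group with commas.
-- Assumes Col2 is not already a string, hence the CAST;
-- concat_ws expects a string or an array of strings.
SELECT Col1,
       concat_ws(',', collect_set(CAST(Col2 AS STRING))) AS Col2_csv
FROM your_table
GROUP BY Col1;
```

The order of the elements inside each group is not guaranteed, so a group may come back as 5,1,6 rather than 1,5,6.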
However, there is one notable difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT retains duplicates in its result, whereas collect_set removes any duplicates occurring in the resulting array. In the example you showed there are no repeating Col2 values within a group, so you can go ahead and use it.
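If duplicates do need to be kept, as GROUP_CONCAT would keep them, collect_list can be swapped in for collect_set. This is a sketch under the assumption that your Hive version provides collect_list (it was added in Hive 0.13, as far as I know) and that Col2 needs a cast to string; the alias Col2_csv is only illustrative:

```sql
-- Sketch: same query as above, but collect_list keeps repeated values.
-- Assumes a Hive version that ships collect_list.
SELECT Col1,
       concat_ws(',', collect_list(CAST(Col2 AS STRING))) AS Col2_csv
FROM your_table
GROUP BY Col1;
```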