Pick a random attribute from group in Redshift
Question
I have a data set in the following form:
id | attribute
-----------------
1 | a
2 | b
2 | a
2 | a
3 | c
Desired output:
attribute| num
-------------------
a | 1
b,a | 1
c | 1
In MySQL, I would use:
select attribute, count(*) num
from
(select id, group_concat(distinct attribute) attribute from dataset group by id) as subquery
group by attribute;
I am not sure whether this can be done in Redshift, because it does not support group_concat or any of the PostgreSQL aggregate functions such as array_agg() or string_agg(). See this question.
An alternative that would work for me is to pick a random attribute from each group instead of using group_concat. How can this be done in Redshift?
Recommended answer
This solution, inspired by Masashi, is simpler and selects a random element from each group in Redshift.
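SELECT id, attribute
FROM (SELECT id,
             -- Order each id's rows randomly and take the first attribute.
             FIRST_VALUE(attribute)
               OVER (PARTITION BY id ORDER BY random()
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS attribute
      FROM dataset) AS picked
GROUP BY id, attribute  -- collapse the repeated rows to one per id
ORDER BY id;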
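To connect the random pick back to the counts asked for in the question, it can be wrapped in one more aggregation. The following is a minimal sketch and not part of the original answer. It assumes the same dataset table and swaps in ROW_NUMBER() for FIRST_VALUE as another way to keep one random row per id. The names ranked, rn and num are illustrative.

```sql
-- Sketch: count how many ids ended up with each randomly picked attribute.
SELECT attribute, COUNT(*) AS num
FROM (
    SELECT id,
           attribute,
           -- Number the rows of each id in random order.
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rn
    FROM dataset
) AS ranked
WHERE rn = 1            -- keep exactly one row per id
GROUP BY attribute
ORDER BY attribute;
```

Because the pick is random, id 2 is counted under either a or b on a given run, so the counts can differ between runs and will not reproduce the combined b,a group from the desired output.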