Fewest buckets to fit in the elements
Problem description
I have a table with buckets and elements like the one below. If an element can fit in a bucket, the eligibility column is 1. For example, in the data below, element x fits in buckets a, b, and c, but not in d or e.
I want to find the fewest buckets needed to group my elements. In this case, buckets c and d together hold all the elements, using just two buckets.
Any idea whether I can do this in BigQuery dynamically and efficiently? The original data is not as simple as this.
with matrix as (
---element x
select "element-x" as element, "bucketa" bucket , 1 eligibilty
union all
select "element-x" as element, "bucketb" bucket , 1 eligibilty
union all
select "element-x" as element, "bucketc" bucket , 1 eligibilty
union all
select "element-x" as element, "bucketd" bucket , 0 eligibilty
union all
select "element-x" as element, "buckete" bucket , 0 eligibilty
union all
---element y
select "element-y" as element, "bucketa" bucket , 0 eligibilty
union all
select "element-y" as element, "bucketb" bucket , 0 eligibilty
union all
select "element-y" as element, "bucketc" bucket , 1 eligibilty
union all
select "element-y" as element, "bucketd" bucket , 0 eligibilty
union all
select "element-y" as element, "buckete" bucket , 0 eligibilty
union all
---element z
select "element-z" as element, "bucketa" bucket , 1 eligibilty
union all
select "element-z" as element, "bucketb" bucket , 0 eligibilty
union all
select "element-z" as element, "bucketc" bucket , 1 eligibilty
union all
select "element-z" as element, "bucketd" bucket , 0 eligibilty
union all
select "element-z" as element, "buckete" bucket , 0 eligibilty
union all
---element p
select "element-p" as element, "bucketa" bucket , 0 eligibilty
union all
select "element-p" as element, "bucketb" bucket , 0 eligibilty
union all
select "element-p" as element, "bucketc" bucket , 1 eligibilty
union all
select "element-p" as element, "bucketd" bucket , 0 eligibilty
union all
select "element-p" as element, "buckete" bucket , 0 eligibilty
union all
---element q
select "element-q" as element, "bucketa" bucket , 1 eligibilty
union all
select "element-q" as element, "bucketb" bucket , 0 eligibilty
union all
select "element-q" as element, "bucketc" bucket , 0 eligibilty
union all
select "element-q" as element, "bucketd" bucket , 1 eligibilty
union all
select "element-q" as element, "buckete" bucket , 0 eligibilty
union all
---element r
select "element-r" as element, "bucketa" bucket , 0 eligibilty
union all
select "element-r" as element, "bucketb" bucket , 1 eligibilty
union all
select "element-r" as element, "bucketc" bucket , 0 eligibilty
union all
select "element-r" as element, "bucketd" bucket , 1 eligibilty
union all
select "element-r" as element, "buckete" bucket , 1 eligibilty
)
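The question amounts to a set cover problem: choose the fewest buckets whose eligible elements together include every element. As a minimal sketch in Python rather than BigQuery (the names `ROWS`, `eligible_by_bucket`, and `covers_all` are hypothetical, and only the rows with eligibilty = 1 are listed), the sample data can be grouped per bucket and a candidate answer checked:

```python
from collections import defaultdict

# (element, bucket) pairs that have eligibilty = 1 in the sample data above.
ROWS = [
    ("element-x", "bucketa"), ("element-x", "bucketb"), ("element-x", "bucketc"),
    ("element-y", "bucketc"),
    ("element-z", "bucketa"), ("element-z", "bucketc"),
    ("element-p", "bucketc"),
    ("element-q", "bucketa"), ("element-q", "bucketd"),
    ("element-r", "bucketb"), ("element-r", "bucketd"), ("element-r", "buckete"),
]


def eligible_by_bucket(rows):
    """Map each bucket to the set of elements that fit in it."""
    grouped = defaultdict(set)
    for element, bucket in rows:
        grouped[bucket].add(element)
    return dict(grouped)


def covers_all(buckets, grouped, all_elements):
    """Return True if the chosen buckets together hold every element."""
    covered = set()
    for bucket in buckets:
        covered |= grouped.get(bucket, set())
    return covered == set(all_elements)


grouped = eligible_by_bucket(ROWS)
# Assumption: every element fits in at least one bucket, so ROWS names them all.
all_elements = {element for element, _ in ROWS}

print(covers_all(["bucketc", "bucketd"], grouped, all_elements))  # True
print(covers_all(["bucketa", "bucketb"], grouped, all_elements))  # False
```

Buckets a and b fail because element-y and element-p fit only in bucket c, which is why any full cover of this sample has to include bucket c.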
Recommended answer
The query below should work
-- Assumes matrix (element, bucket, eligibilty) exists as a table or a preceding CTE.
with buckets_elements as (
  -- one array of structs, one per bucket (a..e), each holding that bucket's eligible elements
  select array[struct(a), struct(b), struct(c), struct(d), struct(e)] buckets
  from (
    select
      array_agg(if(bucket = 'bucketa' and eligibilty = 1, element, null) ignore nulls) a,
      array_agg(if(bucket = 'bucketb' and eligibilty = 1, element, null) ignore nulls) b,
      array_agg(if(bucket = 'bucketc' and eligibilty = 1, element, null) ignore nulls) c,
      array_agg(if(bucket = 'bucketd' and eligibilty = 1, element, null) ignore nulls) d,
      array_agg(if(bucket = 'buckete' and eligibilty = 1, element, null) ignore nulls) e
    from matrix
  )
), columns_names as (
  -- bucket names sorted so that positions line up with the buckets array
  select array_agg(bucket order by bucket) cols
  from (select distinct bucket from matrix)
), columns_index as (
  select generate_array(0, array_length(cols) - 1) as arr
  from columns_names
), buckets_combinations as (
  -- every non-empty subset of bucket indexes, one subset per bitmask n
  select
    (select array_agg(
        case when n & (1<<pos) <> 0 then arr[offset(pos)] end
        ignore nulls)
      from unnest(generate_array(0, array_length(arr) - 1)) pos
    ) as combo
  from columns_index cross join
    unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
)
select
  array(select cols[offset(i)] from columns_names, unnest(combo) i) winners
from (
  select combo,
    -- rank by most distinct elements covered, then by fewest buckets
    rank() over(order by (select count(distinct el) from unnest(val) v, unnest(v.a) el) desc, array_length(combo)) as rnk
  from (
    select any_value(c).combo, array_agg(buckets[offset(i)]) val
    from buckets_combinations c, unnest(combo) i, buckets_elements b
    group by format('%t', c)
  )
)
where rnk = 1
If applied to the sample data in your question, the output is a single row whose winners array contains bucketc and bucketd.
Note: I simply reused the answer to the previous question and just adjusted the buckets_elements and columns_names CTEs to reflect the new schema. All the rest is exactly the same :o)
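For readers who want to follow the logic outside BigQuery, here is a rough Python sketch of the same brute-force idea: enumerate every non-empty subset of buckets with a bitmask, then rank by distinct elements covered (descending) and by subset size (ascending). It is an illustration, not a translation of the query; `GROUPED` and `fewest_buckets` are hypothetical names, and the data is the eligible part of the sample from the question.

```python
# Eligible elements per bucket, taken from the sample data in the question.
GROUPED = {
    "bucketa": {"element-x", "element-z", "element-q"},
    "bucketb": {"element-x", "element-r"},
    "bucketc": {"element-x", "element-y", "element-z", "element-p"},
    "bucketd": {"element-q", "element-r"},
    "buckete": {"element-r"},
}


def fewest_buckets(grouped):
    """Return every bucket subset that covers the most elements with the fewest buckets."""
    names = sorted(grouped)  # plays the role of columns_names
    best_key, winners = None, []
    # Each n from 1 to 2^k - 1 is a bitmask selecting one non-empty subset.
    for n in range(1, 2 ** len(names)):
        combo = [names[pos] for pos in range(len(names)) if n & (1 << pos)]
        covered = set().union(*(grouped[b] for b in combo))
        # Smaller key is better: more coverage first, then fewer buckets.
        key = (-len(covered), len(combo))
        if best_key is None or key < best_key:
            best_key, winners = key, [combo]
        elif key == best_key:
            winners.append(combo)  # ties share rank 1, as with rank() in the query
    return winners


print(fewest_buckets(GROUPED))  # [['bucketc', 'bucketd']]
```

The loop visits 2^k - 1 subsets for k buckets, so this approach, like the query, is practical only when the number of buckets is small.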