在 bigquery 中装袋元素的桶数最少 [英] Fewest number of buckets to bag elements in bigquery

查看:15
本文介绍了在 bigquery 中装袋元素的桶数最少的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有桶和元素的矩阵,如下所示.如果一个元素可以放入一个桶中,那么它在相应的单元格中为 1

I have a matrix with buckets and elements like below. If an element can fit in a bucket it is 1 in the corresponding cell

例如:如果您查看图像,元素 x 可以放入桶 a、b、c 而不是 d 和 e

For example: If you look at the image, element x can fit in bucket-a,b,c and not in d and e

我想找到最少的桶来分组我的元素.在这种情况下,桶 c 和 d 可以将所有元素分组到两个桶中.

I want to find the fewest buckets to group my elements. In this case, buckets c and d could group all the elements in just two buckets.

知道我是否可以在 bigquery 中动态有效地执行此操作吗?原始数据没有这么简单.

Any idea if i can do this in bigquery dynamically and efficiently ? original data is not as simple as this.

 select "element-x" as element , 1 as bucketa, 1 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
 union all
 select "element-y" as element , 0 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
 union all
 select "element-z" as element , 1 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
 union all
 select "element-p" as element , 0 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
 union all
 select "element-q" as element , 1 as bucketa, 0 as bucketb, 0 as bucketc, 1 as bucketd, 0 as buckete
 union all
 select "element-r" as element , 0 as bucketa, 1 as bucketb, 0 as bucketc, 1 as bucketd, 1 as buckete

推荐答案

考虑以下解决方案 - 显然您需要确保在 matrix CTE 中提供准确的数据,并且您还需要分别调整 buckets_elements CTE 来反映矩阵中的所有桶.CTE 的其余部分和最终查询将为您提供帮助!

Consider below solution - obviously you need to make sure you provide accurate data in matrix CTE and also you need respectively adjust buckets_elements CTE to reflect all buckets in matrix. The rest of CTE's and final query will make a work for you!

with matrix as (
  select "element-x" as element, 1 as bucketa, 1 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete union all
  select "element-y", 0, 0, 1, 0, 0 union all
  select "element-z", 1, 0, 1, 0, 0 union all
  select "element-p", 0, 0, 1, 0, 0 union all
  select "element-q", 1, 0, 0, 1, 0 union all
  select "element-r", 0, 1, 0, 1, 1    
), buckets_elements as ( 
  select array[struct(a), struct(b), struct(c), struct(d), struct(e)] buckets
  from (
    select 
      array_agg(if(bucketa = 1, element, null) ignore nulls) a,
      array_agg(if(bucketb = 1, element, null) ignore nulls) b,
      array_agg(if(bucketc = 1, element, null) ignore nulls) c,
      array_agg(if(bucketd = 1, element, null) ignore nulls) d,
      array_agg(if(buckete = 1, element, null) ignore nulls) e
    from matrix
  )
), columns_names as (
  select 
    regexp_extract_all(to_json_string((select as struct * except(element) from unnest([t]))), r'"([^"]+)"') cols
  from matrix t limit 1
), columns_index as (
  select generate_array(0, array_length(cols) - 1) as arr  
  from columns_names
), buckets_combinations as (
  select  
    (select array_agg(
      case when n & (1<<pos) <> 0 then arr[offset(pos)] end 
      ignore nulls)
     from unnest(generate_array(0, array_length(arr) - 1)) pos
    ) as combo
  from columns_index cross join 
  unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
)
select 
  array(select cols[offset(i)] from columns_names, unnest(combo) i) winners
from (
  select combo, 
    rank() over(order by (select count(distinct  el) from unnest(val) v, unnest(v.a) el) desc, array_length(combo)) as rnk
  from (
    select any_value(c).combo, array_agg(buckets[offset(i)]) val
    from buckets_combinations c, unnest(combo) i, buckets_elements b
    group by format('%t', c)
  )
)
where rnk = 1

带输出

这篇关于在 bigquery 中装袋元素的桶数最少的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆