大查询SQL-计算字符串中的元素 [英] Big query SQL - Count elements from a string

查看:47
本文介绍了大查询SQL-计算字符串中的元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个示例字符串数据,如下所示.

I have an example string data as below.

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> B >> C >> C >> A      |
| B >> B >> B >> C >> A >> D |
+----------------------------+

选择项目"作为示例,在第一行中,该项目"包含三个元素,即A,B和C.我想按顺序计算每个元素的数量.此外,我将只关注3个元素,因此我将忽略最后一个项"中的元素"D".

Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I will only focus on 3 elements so I will ignore element 'D' in the last "Items".

这就是我想要的:

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| A >> B >> B >> C >> C      |    A    |     1    |    B    |     1    |    C    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> B >> B >> C >> A >> D |    B    |     3    |    C    |     1    |    A    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

如何在SQL(我使用Google Bigquery)中实现它?

How can I implement it in SQL(I use Google Bigquery)?

感谢一百万!

推荐答案

您可以取消嵌套字符串,然后应用一些聚合逻辑:

You can unnest the string and then apply some aggregation logic:

with t as (
      select 'A >> B >> C' as items union all
      select 'A >> A >> B' as items union all
      select 'B >> B >> C >> C >> A' as items union all
      select 'B >> B >> B >> C >> A >> D' as items 
     )
select t.*,
       (select as struct max(case when n = 1 then item end) as item_1,
               max(case when n = 1 then cnt end) as item_1_cnt,
               max(case when n = 2 then item end) as item_2,
               max(case when n = 2 then cnt end) as item_2_cnt,
               max(case when n = 3 then item end) as item_3,
               max(case when n = 3 then cnt end) as item_3_cnt
        from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
              from unnest(split(t.items, ' >> ')) item with offset n
              group by item
             ) x
       ).*
from t;

最里面的子查询将字符串转换为带有计数器的行.然后将其汇总起来,以便在商品首次出现时对其进行排序-以及商品的数量.

The innermost subquery transforms the string into rows, with a counter. This is then aggregated to order the items by the first time they appear -- along with the count of the item.

最后,它们汇总到所需的不同列中.

Finally, these are aggregated into the different columns that you want.

这篇关于大查询SQL-计算字符串中的元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆