大查询SQL-计算字符串中的元素 [英] Big query SQL - Count elements from a string
问题描述
我有一个示例字符串数据,如下所示.
I have an example string data as below.
+----------------------------+
| Items |
|----------------------------|
| A >> B >> C |
| A >> A >> B |
| B >> B >> C >> C >> A |
| B >> B >> B >> C >> A >> D |
+----------------------------+
选择项目"作为示例,在第一行中,该项目"包含三个元素,即A,B和C.我想按顺序计算每个元素的数量.此外,我将只关注3个元素,因此我将忽略最后一个项"中的元素"D".
Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I will only focus on 3 elements so I will ignore element 'D' in the last "Items".
这就是我想要的:
+----------------------------+---------+----------+---------+----------+---------+----------+
| Items | Item1 | I1_Count | Item2 | I2_Count | Item3 | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C | A | 1 | B | 1 | C | 1 |
| A >> A >> B | A | 2 | B | 1 | NULL | NULL |
| A >> B >> B >> C >> C | A | 1 | B | 1 | C | 2 |
| B >> B >> C >> C >> A | B | 2 | C | 2 | A | 1 |
| B >> B >> B >> C >> A >> D | B | 3 | C | 1 | A | 1 |
+----------------------------+---------+----------+---------+----------+---------+----------+
如何在SQL(我使用Google Bigquery)中实现它?
How can I implement it in SQL(I use Google Bigquery)?
感谢一百万!
推荐答案
您可以取消嵌套字符串,然后应用一些聚合逻辑:
You can unnest the string and then apply some aggregation logic:
with t as (
select 'A >> B >> C' as items union all
select 'A >> A >> B' as items union all
select 'B >> B >> C >> C >> A' as items union all
select 'B >> B >> B >> C >> A >> D' as items
)
select t.*,
(select as struct max(case when n = 1 then item end) as item_1,
max(case when n = 1 then cnt end) as item_1_cnt,
max(case when n = 2 then item end) as item_2,
max(case when n = 2 then cnt end) as item_2_cnt,
max(case when n = 3 then item end) as item_3,
max(case when n = 3 then cnt end) as item_3_cnt
from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
from unnest(split(t.items, ' >> ')) item with offset n
group by item
) x
).*
from t;
最里面的子查询将字符串转换为带有计数器的行.然后将其汇总起来,以便在商品首次出现时对其进行排序-以及商品的数量.
The innermost subquery transforms the string into rows, with a counter. This is then aggregated to order the items by the first time they appear -- along with the count of the item.
最后,它们汇总到所需的不同列中.
Finally, these are aggregated into the different columns that you want.
这篇关于大查询SQL-计算字符串中的元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!