How to partially filter subset string with count?

Question

I am trying to filter substrings out of a set of strings. So far I have this:

WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' UNION ALL
  SELECT 'understand it yes it'
)

and

#standardSQL
SELECT str FROM (
  SELECT str, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM `project.dataset.table`
  )
)
WHERE NOT IFNULL(flag, FALSE) 

which returns only

Row str  
1   understand it yes it     
2   understand it yes    
3   understand it    
4   understand   
5   anderstand  

The expected result is

Row str                   count
1   understand it yes it   2
2   anderstand             1
3   understand it yes      1
4   understand             1
5   understand it          2

Answer

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' UNION ALL
  SELECT 'understand it yes it'
), temp AS (
  SELECT str, COUNT(1) `count`
  FROM `project.dataset.table`
  GROUP BY str
)
SELECT str , `count` FROM (
  SELECT str, `count`, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, `count`, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM temp
  )
)
WHERE NOT IFNULL(flag, FALSE) 

with output

Row str                     count    
1   understand it yes it    2    
2   understand it yes       1    
3   understand it           2    
4   understand              1    
5   anderstand              1    
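
To see why each row is kept or dropped, it can help to look at the intermediate values that the filter compares. The following is only a diagnostic sketch built from the same pieces as the query above. The column names is_prefix, spaces and prev_spaces are introduced here for illustration and are not part of the original answer, and project.dataset.table is the same placeholder as before.

```sql
#standardSQL
-- Diagnostic sketch: expose the values the flag is computed from.
WITH temp AS (
  -- Collapse duplicates and remember how often each string occurs.
  SELECT str, COUNT(1) AS `count`
  FROM `project.dataset.table`
  GROUP BY str
)
SELECT
  str,
  `count`,
  prev_str,
  -- TRUE when str is a prefix of its neighbour in descending order.
  STARTS_WITH(prev_str, str) AS is_prefix,
  -- Number of spaces in each string (word count minus one).
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) AS spaces,
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS prev_spaces
FROM (
  -- In descending order a longer string sorts directly before its prefixes.
  SELECT str, `count`, LAG(str) OVER(ORDER BY str DESC) AS prev_str
  FROM temp
)
ORDER BY str DESC
```

A row is dropped only when is_prefix is TRUE and spaces equals prev_spaces, that is, when it is a partially typed version of the last word of its neighbour. For example, 'understand it ye' has prev_str 'understand it yes' with 2 spaces on both sides, so it is dropped, while 'understand it yes' has prev_str 'understand it yes it' with 2 versus 3 spaces, so it is kept. The first row has a NULL prev_str, which makes the flag NULL, and IFNULL(flag, FALSE) keeps it.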

To use this approach on your own data, run the query below with project.dataset.table replaced by a reference to your table, for example yourproject.yourdataset.yourtable

#standardSQL
WITH temp AS (
  SELECT str, COUNT(1) `count`
  FROM `project.dataset.table`
  GROUP BY str
)
SELECT str , `count` FROM (
  SELECT str, `count`, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, `count`, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM temp
  )
)
WHERE NOT IFNULL(flag, FALSE) 
