BigQuery to find the Subsequence


Question

Assume my table is:

WITH `sample_project.sample_dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8' 
)

Can I check whether a subsequence of integers is present in the sequence column without using STRING_AGG and a regular expression, or JOIN operations? The goal is to make the query more efficient.

A subsequence is obtained from a string by picking characters at strictly increasing positions. For example, "anna" is a subsequence of "banana", because its characters appear in "banana" in the same order (positions 2, 3, 5 and 6). The characters of a subsequence need not be contiguous.

For the table above, ordering by timestamp (ascending) and applying STRING_AGG to the sequence column gives 22113233. In the string 22113233 the subsequence 1 2 3 is present, whereas the subsequence 3 2 1 is not. Given a subsequence such as 213, how can I tell whether it is present in 22113233 (the values sorted by timestamp)?

Solution

> Given a subsequence 213, how can I tell whether it is present in 22113233 ...

The example below is for BigQuery Standard SQL:

#standardSQL
WITH `sequences` AS (
  SELECT '22113233' sequence_list
), `subsequences` AS (
  SELECT '123' subsequence UNION ALL
  SELECT '321' UNION ALL
  SELECT '213'
)
SELECT sequence_list, subsequence,
  REGEXP_CONTAINS(sequence_list, REGEXP_REPLACE(subsequence, '', '.*')) available
FROM `sequences`
CROSS JOIN `subsequences`

This gives the following result:

sequence_list   subsequence     available    
22113233        321             false    
22113233        123             true     
22113233        213             true     
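
To see why this works, it can help to look at the pattern that REGEXP_REPLACE builds. Replacing the empty string with `.*` puts `.*` around every character, so '213' should turn into a pattern along the lines of `.*2.*1.*3.*`, which matches any string containing 2, then 1, then 3, in that order. The query below is only a sketch for inspecting this; the column name `generated_pattern` is illustrative and not part of the original answer. It assumes each element is a single character with no special meaning in a regular expression, which holds for the digits used here.

```sql
#standardSQL
WITH subsequences AS (
  SELECT '123' AS subsequence UNION ALL
  SELECT '321' UNION ALL
  SELECT '213'
)
SELECT
  subsequence,
  -- The regular expression derived from the subsequence (illustrative column)
  REGEXP_REPLACE(subsequence, '', '.*') AS generated_pattern,
  -- The same check as above, against the literal sequence from the question
  REGEXP_CONTAINS('22113233', REGEXP_REPLACE(subsequence, '', '.*')) AS available
FROM subsequences
```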

If you are looking for one specific subsequence, this can be further simplified to:

#standardSQL
WITH `sequences` AS (
  SELECT '22113233' sequence_list UNION ALL
  SELECT '11223322'
)
SELECT sequence_list,  
  REGEXP_CONTAINS(sequence_list, REGEXP_REPLACE('213', '', '.*')) available
FROM `sequences`

This gives the following result:

sequence_list   available    
22113233        true     
11223322        false    
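
The examples above start from a ready-made string, while the question starts from a table. As a rough sketch of how the two could be connected, the query below first builds `sequence_list` per user with STRING_AGG ordered by `ts`, then applies the same check. Note that this still relies on STRING_AGG and a regular expression, so it does not avoid them as the question asked. The CTE name `sequences` and the `CAST` to STRING are assumptions made for this illustration, and the approach assumes every sequence value is a single digit.

```sql
#standardSQL
WITH `sample_project.sample_dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8'
), sequences AS (
  -- One row per user: sequence values concatenated in timestamp order
  SELECT
    user,
    STRING_AGG(CAST(sequence AS STRING), '' ORDER BY ts) AS sequence_list
  FROM `sample_project.sample_dataset.table`
  GROUP BY user
)
SELECT
  user,
  sequence_list,
  -- True when 2, 1 and 3 occur in this order somewhere in sequence_list
  REGEXP_CONTAINS(sequence_list, REGEXP_REPLACE('213', '', '.*')) AS available
FROM sequences
```

For the sample data, `sequence_list` should come out as 22113233, matching the string used in the answer.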
