BigQuery - 正则表达式匹配已知字符串后的 8 位数字 [英] BigQuery - Regex to match number of 8 digits after a known string

查看:119
本文介绍了BigQuery - 正则表达式匹配已知字符串后的 8 位数字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在已知字符串后提取 8 位数字:

I need to extract 8 digits after a known string:

| MyString                     | Extract: | 
| ---------------------------- | -------- | 
| mypasswordis 12345678        | 12345678 | 
| # mypasswordis 12345678      | 12345678 | 
| foobar mypasswordis 12345678 | 12345678 |

我可以用正则表达式来做到这一点:

I can do this with regex like:

(?<=mypasswordis.*)[0-9]{8})

但是,当我想使用 REGEXP_EXTRACT 命令,我收到错误消息,无法解析正则表达式:perl 运算符无效:(?<".

However, when I want to do this in BigQuery using the REGEXP_EXTRACT command, I get the error message, "Cannot parse regular expression: invalid perl operator: (?<".

我搜索了 re2 库,发现似乎没有相当于正向后视.

I searched through the re2 library and saw there doesn't seem to be an equivalent for positive lookbehind.

有什么办法可以使用其他方法来做到这一点吗?类似的东西

Is there any way I can do this using other methods? Something like

SELECT REGEXP_EXTRACT(MyString, r"(?<=mypasswordis.*)[0-9]{8}"))

推荐答案

这里需要一个捕获组来提取模式的一部分,请参阅 REGEXP_EXTRACT 您链接到的文档:

You need a capturing group here to extract a part of a pattern, see the REGEXP_EXTRACT docs you linked to:

如果正则表达式包含捕获组,函数返回与该捕获组匹配的子字符串.如果表达式不包含捕获组,函数返回整个匹配的子字符串.

If the regular expression contains a capturing group, the function returns the substring that is matched by that capturing group. If the expression does not contain a capturing group, the function returns the entire matching substring.

另外,.* 模式成本太高,你只需要匹配单词和数字之间的空格.

Also, the .* pattern is too costly, you only need to match whitespace between the word and the digits.

使用

SELECT REGEXP_EXTRACT(MyString, r"mypasswordis\s*([0-9]{8})"))

或者只是

SELECT REGEXP_EXTRACT(MyString, r"mypasswordis\s*([0-9]+)"))

请参阅 re2 正则表达式在线测试.

这篇关于BigQuery - 正则表达式匹配已知字符串后的 8 位数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆