BigQuery - 正则表达式匹配已知字符串后的 8 位数字 [英] BigQuery - Regex to match number of 8 digits after a known string
问题描述
我需要在已知字符串后提取 8 位数字:
I need to extract 8 digits after a known string:
| MyString | Extract: |
| ---------------------------- | -------- |
| mypasswordis 12345678 | 12345678 |
| # mypasswordis 12345678 | 12345678 |
| foobar mypasswordis 12345678 | 12345678 |
我可以用正则表达式来做到这一点:
I can do this with regex like:
(?<=mypasswordis.*)[0-9]{8})
但是,当我想使用 REGEXP_EXTRACT 命令,我收到错误消息,无法解析正则表达式:perl 运算符无效:(?<".
However, when I want to do this in BigQuery using the REGEXP_EXTRACT command, I get the error message, "Cannot parse regular expression: invalid perl operator: (?<".
我搜索了 re2 库,发现似乎没有相当于正向后视.
I searched through the re2 library and saw there doesn't seem to be an equivalent for positive lookbehind.
有什么办法可以使用其他方法来做到这一点吗?类似的东西
Is there any way I can do this using other methods? Something like
SELECT REGEXP_EXTRACT(MyString, r"(?<=mypasswordis.*)[0-9]{8}"))
推荐答案
这里需要一个捕获组来提取模式的一部分,请参阅 REGEXP_EXTRACT
您链接到的文档:
You need a capturing group here to extract a part of a pattern, see the REGEXP_EXTRACT
docs you linked to:
如果正则表达式包含捕获组,函数返回与该捕获组匹配的子字符串.如果表达式不包含捕获组,函数返回整个匹配的子字符串.
If the regular expression contains a capturing group, the function returns the substring that is matched by that capturing group. If the expression does not contain a capturing group, the function returns the entire matching substring.
另外,.*
模式成本太高,你只需要匹配单词和数字之间的空格.
Also, the .*
pattern is too costly, you only need to match whitespace between the word and the digits.
使用
SELECT REGEXP_EXTRACT(MyString, r"mypasswordis\s*([0-9]{8})"))
或者只是
SELECT REGEXP_EXTRACT(MyString, r"mypasswordis\s*([0-9]+)"))
请参阅 re2 正则表达式在线测试.
这篇关于BigQuery - 正则表达式匹配已知字符串后的 8 位数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!