Error: Failed to parse regular expression "": pattern too large - compile failed


Problem description

I've run into the following behavior:

I have a BQ query with hundreds of fields extracted using the REGEXP_EXTRACT function.

I added a new expression and got the following error: Failed to parse regular expression "": pattern too large - compile failed.

When I query this expression alone, everything runs fine; in the larger query, I get the error.

This is a reproduction of the problem based on the GitHub sample data and a simple regex:

    SELECT repository.description,
    REGEXP_EXTRACT(repository.description,r'(?:\w){0}(\w)') as Pos1,
    REGEXP_EXTRACT(repository.description,r'(?:\w){1}(\w)') as Pos2,
    REGEXP_EXTRACT(repository.description,r'(?:\w){2}(\w)') as Pos3,
    -- ... the same pattern continues for every position in between ...
    REGEXP_EXTRACT(repository.description,r'(?:\w){198}(\w)') as Pos199,
    REGEXP_EXTRACT(repository.description,r'(?:\w){199}(\w)') as Pos200,
    REGEXP_EXTRACT(repository.description,r'(?:\w){200}(\w)') as Pos201,
    FROM [publicdata:samples.github_nested] LIMIT 1000

It returns:

    Failed to parse regular expression "(?:\w){162}(\w)": pattern too large - compile failed

But when I run that expression on its own:

    SELECT repository.description,
    REGEXP_EXTRACT(repository.description,r'(?:\w){162}(\w)') as Pos163,
    FROM [publicdata:samples.github_nested] LIMIT 1000

everything runs fine.

Is there a limit on the number of REGEXP_EXTRACT calls, or on their combined complexity, that can be used in a single query?

Solution

I'll look into the issue. As a workaround: it looks like you're trying to split the field into one field per character position, turning "abc" into {pos1: "a", pos2: "b", pos3: "c"}. Is that correct? If so, you might want to try the LEFT() and RIGHT() functions, which take the string first and the length second, as in:

    LEFT(repository.description, 1) as pos1,
    RIGHT(LEFT(repository.description, 2), 1) as pos2,
    RIGHT(LEFT(repository.description, 3), 1) as pos3

This should use fewer resources than compiling 200 regular expressions (although it is still not likely to be fast).
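Writing 200 of these columns by hand is error-prone, so here is a minimal Python sketch that generates the select list for the LEFT()/RIGHT() workaround. The helper names `position_columns` and `build_query` are hypothetical, and the sketch assumes the (string, length) argument order of the legacy SQL LEFT() and RIGHT() functions. It only builds the query text and does not run anything against BigQuery.

```python
def position_columns(field, count):
    """Return one select-list entry per character position (1-based)."""
    columns = []
    for pos in range(1, count + 1):
        if pos == 1:
            # The first character is just the leftmost one.
            expr = f"LEFT({field}, 1)"
        else:
            # Take the first `pos` characters, then keep the last of them.
            expr = f"RIGHT(LEFT({field}, {pos}), 1)"
        columns.append(f"{expr} AS pos{pos}")
    return columns


def build_query(field, table, count, limit=1000):
    """Assemble the full query text (hypothetical helper, legacy SQL syntax assumed)."""
    select_list = ",\n  ".join([field] + position_columns(field, count))
    return f"SELECT\n  {select_list}\nFROM {table}\nLIMIT {limit}"


if __name__ == "__main__":
    print(build_query("repository.description",
                      "[publicdata:samples.github_nested]", 201))
```

Note that these expressions pick characters purely by position, whereas the original `\w` patterns only match runs of word characters, so the two approaches can give different results for descriptions that contain spaces or punctuation.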
