Optimizing a regex to find key=value pairs, space delimited
Question
Shortened URL with my current regex in RegexPal: http://bit.ly/1jbOFGd
I have a line of key=value pairs, space delimited. Some values contain spaces and punctuation, so I use a positive lookahead to check for the existence of another key.
I want to tokenize the keys and values, which I later convert to a dict in Python.
My guess is that I can speed this up by getting rid of the .*?, but how? In Python I convert 10,000 of these lines in 4.3 seconds. I'd like to double or triple that speed by making this regex match more efficient.
Recommended answer
Update:
(?<=\s|\A)([^\s=]+)=(.*?)(?=(?:\s[^\s=]+=|$))
I would think this one is more efficient than yours (even though it still uses .*? for the value, its lookahead is nowhere near as complex and doesn't use a lazy modifier), but I'll need you to test. This does the same as my original expression, but handles values differently. It uses a lazy .*? match followed by a lookahead that is either a space, followed by a key, followed by an =, or the end of the string. Notice I always define a key as [^\s=]+, so keys cannot contain an equals sign or whitespace (being this specific helps us avoid lazy matches).
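As a rough sketch of how the updated expression could be used and timed in Python: the standard-library re module only accepts fixed-width lookbehinds, so (?<=\s|\A) is written here as the equivalent (?<!\S). The names PAIR_RE and parse_pairs and the sample line are made up for illustration and are not from the question.

```python
import re
import timeit

# Python's re rejects variable-width lookbehinds, so (?<=\s|\A) becomes
# (?<!\S): "not preceded by a non-whitespace character".
PAIR_RE = re.compile(r'(?<!\S)([^\s=]+)=(.*?)(?=\s[^\s=]+=|$)')

def parse_pairs(line):
    """Return a dict mapping each key to its value for one line."""
    return dict(PAIR_RE.findall(line))

# Hypothetical sample line; real input may look different.
sample = 'user=alice msg=hello, world! code=42'
print(parse_pairs(sample))

# Time 10,000 lines in a single pass to compare against another pattern.
lines = [sample] * 10_000
seconds = timeit.timeit(lambda: [parse_pairs(l) for l in lines], number=1)
print(f'{seconds:.3f}s for {len(lines)} lines')
```

Compiling the pattern once outside the loop avoids repeated pattern-cache lookups, which can matter when parsing many lines.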
Original:
If I do something as simple as this, are there some rules I am missing that you need?
(?<=\s|\A)([^=]+)=([\S]+)
This starts with a lookbehind of either a whitespace character (\s) or the beginning of the string (\A). Then we match everything except =, followed by an =, and then match everything except whitespace (\s).
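To illustrate what this simpler expression does, here is a small Python sketch. The lookbehind is again written as (?<!\S), because Python's re module does not accept (?<=\s|\A). The sample lines and the name ORIGINAL_RE are hypothetical. The second call shows why values that contain spaces need the lookahead-based version.

```python
import re

# Same idea as (?<=\s|\A)([^=]+)=([\S]+), with a Python-compatible lookbehind.
ORIGINAL_RE = re.compile(r'(?<!\S)([^=]+)=(\S+)')

# Works when no value contains whitespace.
simple = ORIGINAL_RE.findall('user=alice code=42')
print(simple)

# A value with a space is cut at the first space, and the leftover text
# is absorbed into the next key because [^=]+ allows whitespace.
spaced = ORIGINAL_RE.findall('user=alice msg=hello, world! code=42')
print(spaced)
```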