Optimizing a regex to find key=value pairs, space delimited


Problem description

Shortened URL with my current regex in RegexPal: http://bit.ly/1jbOFGd

I have a line of key=value pairs, space delimited. Some values contain spaces and punctuation, so I use a positive lookahead to check for the existence of another key.

I want to tokenize each key and value, which I later convert to a dict in Python.

My guess is that I can speed this up by getting rid of .*?, but how? In Python I convert 10,000 of these lines in 4.3 seconds. I'd like to double or triple that speed by making the regex match more efficient.

Recommended answer

Update:

(?<=\s|\A)([^\s=]+)=(.*?)(?=(?:\s[^\s=]+=|$))

I would expect this one to be more efficient than yours: it still uses a lazy .*? for the value, but its lookahead is nowhere near as complex and contains no lazy quantifier. You will need to test it, though. It does the same job as my original expression but handles values differently: a lazy .*? match is followed by a lookahead for either a whitespace character, then a key, then an = sign, or the end of the string. Notice that I always define a key as [^\s=]+, so keys cannot contain an equals sign or whitespace (being this specific means the key itself never needs a lazy match).
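A minimal Python sketch of how the updated expression could be used to build the dict mentioned in the question. One assumption to flag: Python's built-in re module requires fixed-width lookbehinds, so (?<=\s|\A) is swapped here for (?<!\S), which accepts the same positions (start of string, or right after whitespace). The names PAIR_RE, parse_pairs, and the sample line are made up for illustration.

```python
import re

# Assumption: (?<!\S) stands in for (?<=\s|\A), because Python's re module
# only accepts fixed-width lookbehinds. Both allow a match at the start of
# the string or directly after a whitespace character.
PAIR_RE = re.compile(r"(?<!\S)([^\s=]+)=(.*?)(?=\s[^\s=]+=|$)")


def parse_pairs(line):
    """Return a dict of key/value pairs found in one space-delimited line."""
    # findall yields (key, value) tuples because the pattern has two groups.
    return dict(PAIR_RE.findall(line))


# Hypothetical sample line; the value of "msg" contains a space and punctuation.
sample = "user=alice msg=hello, world! code=42"
pairs = parse_pairs(sample)
print(pairs)
```

Compiling the pattern once and reusing it across all lines keeps the per-line cost down to the match itself; whether this actually beats the original pattern on real data would still need to be measured, for example with timeit.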


Original:

Am I missing some rules that you need if I do something this simple?

(?<=\s|\A)([^=]+)=([\S]+)

This starts with a lookbehind for either a whitespace character (\s) or the beginning of the string (\A). It then matches one or more characters other than = as the key, followed by a literal =, and then one or more non-whitespace characters (\S) as the value.
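As a rough illustration of where this simpler expression works and where it falls short, here is a small Python sketch. It assumes the same (?<!\S) substitution for (?<=\s|\A), since Python's re module needs a fixed-width lookbehind; SIMPLE_RE and the sample strings are hypothetical.

```python
import re

# Assumption: (?<!\S) replaces (?<=\s|\A) so the pattern compiles in Python.
SIMPLE_RE = re.compile(r"(?<!\S)([^=]+)=([\S]+)")

# Values without spaces are split cleanly.
clean = SIMPLE_RE.findall("a=1 b=2")
print(clean)

# A value containing a space is cut at the first whitespace, and the rest of
# it is absorbed into the next key because [^=]+ also matches spaces.
messy = SIMPLE_RE.findall("user=alice msg=hello, world! code=42")
print(messy)
```

The second call shows why the updated expression restricts keys to [^\s=]+ and lets the value run up to the next key instead of stopping at the first whitespace.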
