Regular Expression to split on specific character ONLY if that character is not in a pair


Problem Description


After finding the fastest string replace algorithm in this thread, I've been trying to modify one of them to suit my needs, particularly this one by gnibbler.

I will explain the problem again here, and what issue I am having.

Say I have a string that looks like this:

str = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

You'll notice a lot of locations in the string where there is an ampersand, followed by a character (such as "&y" and "&c"). I need to replace these characters with an appropriate value that I have in a dictionary, like so:

dict = {"y":"\033[0;30m",
        "c":"\033[0;31m",
        "b":"\033[0;32m",
        "Y":"\033[0;33m",
        "u":"\033[0;34m"}

Using gnibbler's solution provided in my previous thread, I have this as my current solution:

myparts = tmp.split('&')
myparts[1:]=[dict.get(x[0],"&"+x[0])+x[1:] for x in myparts[1:]]
result = "".join(myparts)
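To make the limitation concrete, here is a minimal sketch that wraps the current solution in a function. The names `codes` and `replace_codes` are hypothetical (`codes` stands in for `dict` to avoid shadowing the built-in). It also shows that the plain `split('&')` approach does more than drop ampersands: on `&&` the split leaves an empty part, so `x[0]` raises `IndexError`.

```python
# The current split('&') approach, wrapped in a function for testing.
# `codes` stands in for the question's `dict` (renamed to avoid shadowing the built-in).
codes = {"y": "\033[0;30m", "c": "\033[0;31m", "b": "\033[0;32m",
         "Y": "\033[0;33m", "u": "\033[0;34m"}

def replace_codes(tmp):
    myparts = tmp.split('&')
    # Every part after the first began right after an '&'; map its first character,
    # keeping unknown codes as "&" + char.
    myparts[1:] = [codes.get(x[0], "&" + x[0]) + x[1:] for x in myparts[1:]]
    return "".join(myparts)

# Known code 'y' is replaced; unknown code 'z' is left as "&z".
print(repr(replace_codes("The &yquick &zbrown fox")))

try:
    replace_codes("A && W")
    failed_on_pair = False
except IndexError:
    # "&&" produces an empty part between the two ampersands, so x[0] is out of range.
    failed_on_pair = True
print("fails on '&&':", failed_on_pair)
```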

This works for replacing the characters properly, and does not fail on characters that are not found. The only problem with this is that there is no simple way to actually keep an ampersand in the output. The easiest way I could think of would be to change my dictionary to contain:

dict = {"y":"\033[0;30m",
        "c":"\033[0;31m",
        "b":"\033[0;32m",
        "Y":"\033[0;33m",
        "u":"\033[0;34m",
        "&":"&"}

And change my "split" call to do a regex split on ampersands that are NOT followed by other ampersands.

>>> import re
>>> tmp = "&yI &creally &blove A && W &uRootbeer."
>>> tmp.split('&')
['', 'yI ', 'creally ', 'blove A ', '', ' W ', 'uRootbeer.']
>>> re.split('MyRegex', tmp)
['', 'yI ', 'creally ', 'blove A ', '& W ', 'uRootbeer.']

Basically, I need a Regex that will split on the first ampersand of a pair, and every single ampersand, to allow me to escape it via my dictionary.
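Read literally, "ampersands that are NOT followed by other ampersands" describes a negative lookahead, `&(?!&)`. A quick check (an illustrative sketch, not part of the original question) shows that this splits on the *second* ampersand of a pair, which is not the split described here:

```python
import re

tmp = "&yI &creally &blove A && W &uRootbeer."

# Negative lookahead: match an '&' only if the next character is not '&'.
# In "A && W" the first '&' is followed by '&' (no split); the second is followed
# by ' ' (split), so the first '&' stays attached to 'blove A '.
lookahead = re.split(r'&(?!&)', tmp)
print(lookahead)  # ['', 'yI ', 'creally ', 'blove A &', ' W ', 'uRootbeer.']
```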

If anyone has any better solutions please feel free to let me know.

Solution

You could use a negative lookbehind (assuming the regex engine in question supports it) to only match ampersands that do not follow another ampersand.

/(?<!&)&/
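Putting this regex together with the dictionary from the question gives something like the following. This is a minimal sketch, not code from the thread: `codes` and `replace_codes` are hypothetical names (`codes` replaces `dict` to avoid shadowing the built-in), and the `if x else "&"` guard is an added assumption so that a trailing lone `&` is kept instead of raising `IndexError`. In Python the pattern is written without the `/.../` delimiters; the `re` module supports fixed-width lookbehinds like this one.

```python
import re

# Color codes from the question, plus "&": "&" so that "&&" emits a literal "&".
codes = {"y": "\033[0;30m", "c": "\033[0;31m", "b": "\033[0;32m",
         "Y": "\033[0;33m", "u": "\033[0;34m", "&": "&"}

# Split on every '&' that is not immediately preceded by another '&'.
SPLIT_RE = re.compile(r'(?<!&)&')

def replace_codes(tmp):
    myparts = SPLIT_RE.split(tmp)
    # Each part after the first started right after a split '&'. Unknown codes are
    # kept as "&" + char; an empty part (only possible for a trailing "&") becomes "&".
    myparts[1:] = [codes.get(x[0], "&" + x[0]) + x[1:] if x else "&"
                   for x in myparts[1:]]
    return "".join(myparts)

tmp = "&yI &creally &blove A && W &uRootbeer."
print(SPLIT_RE.split(tmp))  # ['', 'yI ', 'creally ', 'blove A ', '& W ', 'uRootbeer.']
print(repr(replace_codes(tmp)))
```

One limitation of splitting this way: in a run of three ampersands such as `&&&y`, only the first `&` is a split point, so the output is `&&y` rather than a literal `&` followed by the `y` color code. If that case matters, a single left-to-right pass with `re.sub(r'&(.)', lambda m: codes.get(m.group(1), m.group(0)), text)`, which consumes each `&` together with the character after it, handles it; that is an alternative technique, not the one proposed in this answer.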

