python regex error: unbalanced parenthesis


Problem description

I'm pretty new to Python. I have a dictionary with some keys in it, and a string. I have to replace parts of the string wherever a pattern from the dictionary occurs in it. Both the dictionary and the string are very large. I'm using a regex to find the patterns.

It all works fine until a key like '-(' or '(-)' pops up, in which case Python raises an error about an unbalanced parenthesis.

Here's what the code I've written looks like:

import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'
for key in somedict:
    # key is passed as a regex pattern, so '-(' fails to compile
    somedata = re.sub(key, 'newvalue', somedata)

Here's the error I get in the console:

Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis

I've also tried it many ways using the regex compiler and searched a lot, but didn't find anything addressing the problem. Any help is appreciated.

Recommended answer

You need to escape the key using re.escape():

somedata = re.sub(re.escape(key), 'newvalue', somedata)

Otherwise the contents of the key will be interpreted as a regular expression, in which characters such as ( and ) have a special meaning.
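
Applied to the loop from the question, the fix could look like the sketch below. It reuses the question's sample dictionary and string, and iterates over the dictionary directly instead of calling iterkeys() so that it runs on both Python 2 and Python 3.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

for key in somedict:
    # re.escape() backslash-escapes regex metacharacters such as '(' and ')',
    # so the key is matched as literal text instead of being parsed as a pattern.
    somedata = re.sub(re.escape(key), 'newvalue', somedata)

print(somedata)  # this is some data containing newvalue and newvalue
```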

You are not using regular expressions at all here, so you may as well just use:

somedata = somedata.replace(key, 'newvalue')
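
As a minimal sketch of the plain-string approach on the question's sample data: the version below assumes the intent is to substitute each key with its value from the dictionary, which the question does not state explicitly (its own code always inserts 'newvalue').

```python
somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

for key, value in somedict.items():
    # str.replace() treats key as literal text, so no escaping is needed.
    # Using the dictionary value as the replacement is an assumption.
    somedata = somedata.replace(key, value)

print(somedata)  # this is some data containing value1 and value2
```

Note that with overlapping keys the result would depend on the order in which the keys are processed; the two sample keys do not overlap.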

If you wanted to replace only whole words (so with whitespace or punctuation marks around them, or at the start or end of the input string), you need some kind of boundary anchors, at which point it makes sense to use regular expressions. If all you have are alphanumeric words (plus underscores), \b would work:

somedata = re.sub(r'\b{}\b'.format(re.escape(key)), 'newvalue', somedata)

This puts \b before and after the string you want to replace, so that the baz in foo baz bar is changed, but foo bazbaz bar is left alone.
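
A small sketch of that behaviour, using the baz example. The last call uses the question's '-(' key to illustrate why \b does not help with keys made only of punctuation: there is no word boundary between a space and '-'.

```python
import re

def replace_word(key, text):
    # Hypothetical helper: wraps the escaped key in \b word-boundary anchors.
    pattern = r'\b{}\b'.format(re.escape(key))
    return re.sub(pattern, 'newvalue', text)

print(replace_word('baz', 'foo baz bar'))     # foo newvalue bar
print(replace_word('baz', 'foo bazbaz bar'))  # foo bazbaz bar
print(replace_word('-(', 'some -( data'))     # some -( data (no match)
```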

For input that involves non-alphanumeric 'words', you'd need to match whitespace-or-start and whitespace-or-end anchors with look-aheads and look-behinds:

somedata = re.sub(r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key)), 'newvalue', somedata)

Here the pattern (?:^|(?<=\s)) combines two anchors, the start-of-string anchor and a look-behind assertion, to match positions that are either at the start of the string or have a whitespace character immediately to the left. Similarly, (?:$|(?=\s)) does the same for the other end, matching the end of the string or a position followed by a whitespace character.
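
To see how the look-around version treats the keys from the question, here is a short sketch. The helper name replace_whole and the sample inputs are made up for illustration.

```python
import re

def replace_whole(key, replacement, text):
    # Match key only when it is delimited by whitespace or by the
    # start/end of the string; the look-arounds consume no characters.
    pattern = r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key))
    return re.sub(pattern, replacement, text)

print(replace_whole('-(', 'newvalue', 'a -( b'))     # a newvalue b
print(replace_whole('-(', 'newvalue', 'a x-(y b'))   # a x-(y b
print(replace_whole('(-)', 'newvalue', '(-) and (-)'))  # newvalue and newvalue
```

Because the surrounding whitespace is only asserted and not consumed, it is preserved in the result, and adjacent matches separated by a single space are both replaced.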
