Error: nothing to repeat at position


Problem description

I have a text file which contains a list of slang words and their substitutes in real English. I converted this text file into a dictionary using ":" as a split point, and upon printing the dictionary after the conversion everything seems okay.

However, this line: slangs_re = re.compile('|'.join(slang_dict.keys())) raises an error saying nothing to repeat at position 112207.

While trying to debug, I found that the error is somehow linked to the dictionary: when I ran the code below, I didn't get the correct output, but I didn't get an error either. The expected output of this code is "fitness", but the actual output is "fitess".

import re

test = "fitess"

slang_dict = {"fitess":"fitness", "damm":"damn"}

slangs_re = re.compile('|'.join(slang_dict.keys()))

def correct_slang(s, slang_dict=slang_dict):
    def replace(match):
        return slang_dict[match.group(0)]

    return slangs_re.sub(replace, s)

test = correct_slang(test)
print(test)

And this is the code that builds the dictionary from the file (sorry, the text file is too big to include; a sample is available here). The expected output is "fitness", but instead I get an error:

import re

test = "fitess"

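slang_dict = {}

with open("slang_conversion.txt", "r") as file:
    for line in file:
        # split at the first ":" only; rstrip removes the newline
        # without cutting a real character from a final line that has none
        a, b = line.rstrip("\n").split(":", 1)
        slang_dict[a] = b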

slangs_re = re.compile('|'.join(slang_dict.keys())) # <-- error

def correct_slang(s, slang_dict=slang_dict):
    def replace(match):
        return slang_dict[match.group(0)]

    return slangs_re.sub(replace, s)

test = correct_slang(test)

print(test)
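The message comes from the regex parser, not from the dictionary itself. Joining the raw keys with "|" makes every key part of one regular expression, so a key that begins with a quantifier such as *, + or ? leaves that quantifier with nothing to repeat. The reported position is an offset into the joined pattern. The sketch below assumes the real file contains such an entry (the key "*lol" is hypothetical). It reproduces the error and then lists the offending keys by compiling each key on its own:

```python
import re

# Hypothetical sample: "*lol" stands in for whatever entry in the real file
# starts with a regex quantifier.
slang_dict = {"fitess": "fitness", "damm": "damn", "*lol": "laughing"}

pattern = "|".join(slang_dict.keys())  # "fitess|damm|*lol"
try:
    re.compile(pattern)
except re.error as exc:
    # exc.pos is an offset into the joined pattern, like 112207 in the question
    print(f"{exc.msg} at position {exc.pos}: {pattern[exc.pos:exc.pos + 10]!r}")


def find_bad_keys(keys):
    """Return (key, message) for every key that is not a valid regex by itself."""
    bad = []
    for key in keys:
        try:
            re.compile(key)
        except re.error as exc:
            bad.append((key, exc.msg))
    return bad


print(find_bad_keys(slang_dict))
```

A key that compiles on its own can still misbehave. For example, kome* also matches kom. The real fix is therefore to escape every key, as the suggested pattern in the answer does. This check only helps locate the entry behind the reported position.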

From reading other SO threads, I learned that this error is caused by a bug in some cases, but that doesn't seem to be the case here.

Thanks

Recommended answer

I suggest replacing

slangs_re = re.compile('|'.join(slang_dict.keys()))

with

slangs_re = re.compile(r"(?<!\w)(?:{})(?!\w)".format('|'.join([re.escape(x) for x in slang_dict])))

and make sure you pass the keys sorted by length in descending order, so that a longer key is tried before any shorter key that is its prefix.
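Order matters because Python's re tries the alternatives of an alternation from left to right and stops at the first one that matches. It does not look for the longest match. The sketch below uses a hypothetical pair of keys, one a prefix of the other, and omits the word-boundary checks:

```python
import re

# Hypothetical keys: "dam" is a prefix of "damm".
slang_dict = {"dam": "DAM", "damm": "damn"}


def build(keys):
    # Escape each key and join them into one alternation.
    return re.compile("|".join(re.escape(k) for k in keys))


def replace_all(regex, text):
    return regex.sub(lambda m: slang_dict[m.group(0)], text)


unsorted_re = build(slang_dict)                                # "dam|damm"
sorted_re = build(sorted(slang_dict, key=len, reverse=True))   # "damm|dam"

print(replace_all(unsorted_re, "damm"))  # DAMm  ("dam" wins and leaves an "m")
print(replace_all(sorted_re, "damm"))    # damn  (longest key is tried first)
```

With the (?<!\w) and (?!\w) guards, this particular pair would resolve itself through backtracking. However, a shorter key followed by a space, such as ow next to ow wow in the answer's dictionary, would still be matched on its own. Sorting is therefore the safe default.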

from collections import OrderedDict
import re

test = "fitess no kome*"

slang_dict = {"Aha aha":"no", "fitess":"fitness", "damm":"damn", "kome*":"come", "ow wow":"rrf"}
# longest keys first (dict.items() works in Python 3; iteritems() is Python 2 only)
slang_dict = OrderedDict(sorted(slang_dict.items(), key=lambda x: len(x[0]), reverse=True))

slangs_re = re.compile(r"(?<!\w)(?:{})(?!\w)".format('|'.join([re.escape(x) for x in slang_dict])))
def correct_slang(s, slang_dict=slang_dict):
    def replace(match):
        return slang_dict[match.group(0)]

    return slangs_re.sub(replace, s)

test = correct_slang(test)
print(test)

See the Python demo.

This matches the terms as whole words only and escapes the special characters in each search phrase. The phrases are therefore treated literally and cannot break the regular expression when it is compiled.
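The following sketch shows what escaping changes, using the key kome* from the answer's example dictionary:

```python
import re

slang_dict = {"fitess": "fitness", "kome*": "come"}
keys = sorted(slang_dict, key=len, reverse=True)

# Unescaped, "kome*" is a regex meaning "kom" followed by zero or more "e".
print(re.findall("kome*", "kom kome kome*"))             # ['kom', 'kome', 'kome']

# re.escape turns it into "kome\*", which matches only the literal text.
print(re.findall(re.escape("kome*"), "kom kome kome*"))  # ['kome*']

slangs_re = re.compile(r"(?<!\w)(?:{})(?!\w)".format("|".join(re.escape(k) for k in keys)))
print(slangs_re.pattern)                                         # (?<!\w)(?:fitess|kome\*)(?!\w)
print(slangs_re.sub(lambda m: slang_dict[m.group(0)], "fitess kome*"))  # fitness come
```

The lookarounds are used instead of \b because \b needs a word character on one side. For a key that ends in a non-word character, such as kome* at the end of a string, \b would never match after it, whereas (?!\w) does.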

If you are not interested in whole word matching, remove (?<!\w) (checking for the leading word boundary) and (?!\w) (checking for the trailing word boundary).
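For comparison, here is a sketch of the same replacement with and without the two lookarounds, using a hypothetical input. Without them, keys are also replaced inside longer words:

```python
import re

slang_dict = {"damm": "damn"}
alternation = "|".join(re.escape(k) for k in sorted(slang_dict, key=len, reverse=True))

whole_word_re = re.compile(r"(?<!\w)(?:{})(?!\w)".format(alternation))
substring_re = re.compile(alternation)  # no boundary checks

text = "damm, dammit"
print(whole_word_re.sub(lambda m: slang_dict[m.group(0)], text))  # damn, dammit
print(substring_re.sub(lambda m: slang_dict[m.group(0)], text))   # damn, damnit
```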
