Python 3.7.4: 're.error: bad escape \s at position 0'


Problem description

My program looks something like this:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)
# "The\\ quick\\ brown\\ fox\\ jumped"
# Replace escaped space patterns with a generic white space pattern
spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)
# Raises error

The error is this:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/swfarnsworth/projects/medaCy/medacy/tools/converters/con_to_brat.py", line 255, in <module>
    content = convert_con_to_brat(full_file_path)
  File "/home/swfarnsworth/projects/my_file.py", line 191, in convert_con_to_brat
    start_ind = get_absolute_index(text_lines, d["start_ind"], d["data_item"])
  File "/home/swfarnsworth/projects/my_file.py", line 122, in get_absolute_index
    entity_pattern_spaced = re.sub(r"\\\s+", r"\s+", entity_pattern_escaped)
  File "/usr/local/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/usr/local/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/usr/local/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

I get this error even if I remove the two backslashes before the '\s+' or if I make the raw string (r"\\\s+") into a regular string. I checked the Python 3.7 documentation, and it appears that \s is still the escape sequence for white space.

Solution

Try adjusting the backslashes so that the regex engine does not try to interpret \s in the replacement string:

spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)

Now:

>>> spaced_pattern
'The\\s+quick\\s+brown\\s+fox\\s+jumped'
>>> print(spaced_pattern)
The\s+quick\s+brown\s+fox\s+jumped
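Another way to sidestep the problem, sketched here as an alternative rather than as part of the original answer, is to pass a function as the replacement. When the replacement is a callable, re.sub uses its return value as-is and does not parse it as a template, so the backslash in r"\s+" is never interpreted. The sample string below is a made-up input used only to check the result.

```python
import re

my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)

# A callable replacement is used verbatim, so the template parser
# never sees the backslash in r"\s+".
spaced_pattern = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)

print(spaced_pattern)

# Hypothetical test input: the same words separated by mixed whitespace.
sample = "The quick   brown\tfox\njumped"
print(bool(re.fullmatch(spaced_pattern, sample)))
```

This keeps the replacement readable as a raw string and avoids counting backslashes.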

But why?

It seems that the re module tries to interpret \s in the replacement string the same way it interprets r"\n", instead of leaving it alone as Python normally does with unknown escapes. For example:

re.sub(r"\\\s+", r"\n+", escaped_str)

yields:

The
+quick
+brown
+fox
+jumped

even though \n was written in a raw string.

The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.

The code that does the replacement is in sre_parse.py (python 3.7):

        else:
            try:
                this = chr(ESCAPES[this][1])
            except KeyError:
                if c in ASCIILETTERS:
                    raise s.error('bad escape %s' % this, len(this))

This code looks at what follows a literal \ and tries to replace it with the corresponding character (for example, \n becomes a newline). s is not in the ESCAPES dictionary, so the KeyError is raised, and since s is an ASCII letter you get the error message you are seeing.
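The behaviour described above can be checked with a small sketch, assuming Python 3.7 or later. The helper name repl_template_ok is hypothetical and exists only for illustration; it reports whether re.sub accepts a given string as a replacement template.

```python
import re

def repl_template_ok(repl):
    """Return True if re.sub accepts `repl` as a replacement template."""
    try:
        re.sub("x", repl, "x")
        return True
    except re.error:
        return False

# \n, \t and \\ are known template escapes; \s, \d and \w are not.
for candidate in (r"\n", r"\t", r"\\", r"\s", r"\d", r"\w"):
    print(repr(candidate), repl_template_ok(candidate))

# A known escape is converted even when written in a raw string.
converted = re.sub("x", r"\n", "axb")
print(repr(converted))
```

Escapes such as \s, \d and \w are only meaningful in the pattern argument, which is why they are rejected in the replacement.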

On previous versions it just issued a warning:

import warnings
warnings.warn('bad escape %s' % this,
              DeprecationWarning, stacklevel=4)

It looks like we're not the only ones affected by the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227
