Explicitly illegal character sequence in Python code

Published: 2021/6/26 20:01:55 | Tags: python, python-2.7


Question

I have a UTF-8 input file that regularly contains an illegal byte sequence. Since it always appears to be the same sequence, I want to replace it with its proper equivalent in my Python script.

This should be simple enough, I thought:

value = value.replace('\xE2\x80\x3f', u'“'.encode('utf8'))

However, the script doesn't run. Instead, it throws an error:

SyntaxError: Non-ASCII character '\xe2' in file script.py on line 10, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Is there an encoding that lets me put any character into a string literal, essentially telling Python to shut up and let me use whatever invalid character I want?

(Note: I am using Python 2.7)

Recommended answer

# -*- coding: utf-8 -*-
value = "What an amazing string \xE2\x80\x3f !!"
value = value.replace('\xE2\x80\x3f', u'“'.encode('utf8'))
print value

This works because of the encoding declaration on the first line. Without it, the Python 2 interpreter reads the script file as ASCII and rejects any non-ASCII byte in the source. The \x escapes in the first argument are plain ASCII and were never the problem. The problem is that the script contains a literal non-ASCII character (the quotation mark inside the u'...' literal), so you need to tell the interpreter to read the script file as UTF-8 rather than ASCII.
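As a rough sketch of an alternative, the replacement can also be written with ASCII-only escapes, so the source file contains no non-ASCII bytes and needs no encoding declaration. The names BAD, GOOD, repair and is_valid_utf8 are made up for illustration, and U+201C (left double quotation mark) is an assumption about which character the broken sequence was meant to be.

```python
# Sketch only: BAD, GOOD, repair and is_valid_utf8 are illustrative names,
# and U+201C (left double quotation mark) is an assumed replacement.

BAD = b'\xe2\x80\x3f'              # the illegal sequence from the question
GOOD = u'\u201c'.encode('utf-8')   # b'\xe2\x80\x9c', written with ASCII only


def repair(raw):
    """Replace the illegal byte sequence in a byte string."""
    return raw.replace(BAD, GOOD)


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


raw = b'What an amazing string \xe2\x80\x3f !!'
fixed = repair(raw)
```

Here is_valid_utf8(raw) is False and is_valid_utf8(fixed) is True. The replacement is done on bytes before any decoding, which matches how Python 2.7 hands back file contents as a byte string.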

See also PEP 263 on source code encodings.
