Explicitly illegal character sequence in Python code

Question

I have a UTF-8 input file that regularly contains an illegal byte sequence. Since it appears to be only that one specific sequence, I want to replace it with its proper equivalent in my Python script.

This should be simple enough, I thought:

value = value.replace('\xE2\x80\x3f', u'”'.encode('utf8'))

However, the script doesn't run. Instead, it throws an error:

SyntaxError: Non-ASCII character '\xe2' in file script.py on line 10, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Is there an encoding that allows me to encode any character into a string literal, essentially telling Python to shut up and let me use whatever invalid character I want?

(Note: I am using Python 2.7)

Recommended answer

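# -*- coding: utf-8 -*-
# The declaration above lets Python 2 accept the literal ” character below.

value = "What an amazing string \xE2\x80\x3f !!"
# Replace the malformed byte sequence with the UTF-8 bytes of ”.
value = value.replace('\xE2\x80\x3f', u'”'.encode('utf8'))
print value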

This works because the Python 2 interpreter reads the script file as ASCII by default and does not decode UTF-8 characters. The escapes in '\xE2\x80\x3f' are plain ASCII in the source, so they are not what triggers the error. The literal UTF-8 character you typed into the file (i.e. ”) is: its first byte is \xe2, which is the byte named in the SyntaxError. You therefore need to tell the interpreter to read the script file as UTF-8 rather than as ASCII.
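One way to sidestep the source-encoding question entirely is to keep the script pure ASCII and write the replacement character as an escape instead of typing it. The sketch below assumes the intended character is the right double quotation mark U+201D (UTF-8 bytes E2 80 9D); the names BAD, GOOD and repair are illustrative and do not come from the original post. It is written so that it should run on both Python 2.7 and Python 3.

```python
# Pure-ASCII source: no encoding declaration is needed for this file.

# The malformed sequence from the question: a 3-byte lead (E2), one
# continuation byte (80), then '?' (3F) where a continuation byte belongs.
BAD = b'\xe2\x80\x3f'

# Assumption: the intended character is U+201D (right double quotation mark).
GOOD = u'\u201d'.encode('utf-8')  # the bytes E2 80 9D


def repair(raw):
    """Replace the known malformed byte sequence in a byte string."""
    return raw.replace(BAD, GOOD)


raw = b'What an amazing string \xe2\x80\x3f !!'
fixed = repair(raw)
text = fixed.decode('utf-8')  # decodes cleanly once the bytes are repaired
```

Working on byte strings keeps the replacement independent of how the script file itself is encoded, which is why the escape form is often preferred for characters that are awkward to type or display.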

See also PEP 263 on source code encodings.
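For the original task of cleaning up the input file, the same replacement can be applied to the raw bytes before they are decoded. This is only a sketch under assumptions: the helper read_repaired and the temporary demo file are hypothetical stand-ins for the real input, and U+201D is again assumed to be the intended character.

```python
import os
import tempfile

BAD = b'\xe2\x80\x3f'   # the illegal sequence described in the question
GOOD = b'\xe2\x80\x9d'  # UTF-8 bytes of U+201D (assumed intended character)


def read_repaired(path):
    """Read a file as bytes, repair the known bad sequence, decode as UTF-8."""
    with open(path, 'rb') as handle:
        raw = handle.read()
    return raw.replace(BAD, GOOD).decode('utf-8')


# Demonstration with a temporary file standing in for the real input file.
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as handle:
        handle.write(b'What an amazing string \xe2\x80\x3f !!\n')
    repaired = read_repaired(demo_path)
finally:
    os.remove(demo_path)
```

Opening the file in binary mode matters here: decoding first would raise UnicodeDecodeError on the malformed bytes before the replacement had a chance to run.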
