Python Syntax error: non-ASCII

Question

I keep getting an error and I'm not sure on how to fix it.

The line of code:

if not len(lines) or lines[-1] == '' or lines[-1] == '▁':
    lines = list(filter(lambda line: False if line == '' or line == '▁' else True, list(lines)))

Output: SyntaxError: Non-ASCII character '\xe2' in file prepare_data.py on line 512, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Recommended answer

The error message tells you exactly what's wrong. The Python interpreter needs to know the encoding of the bytes in the string which displays as a funky underscore.

If you want to match U+2581 then you can say

.... or lines[-1] == '\u2581':

which represents this character in pure ASCII by way of a Unicode escape sequence. If you want to match a regular ASCII underscore, that's ASCII 95 / U+005F; here are the two characters side by side for easy comparison and possible copy/paste:

U+2581 ▁  _ U+005F
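
As a quick check of the escape sequence above, here is a minimal sketch for Python 3, where every string literal is Unicode. It confirms that the escape refers to U+2581 and that this is a different character from the ASCII underscore. The variable names are illustrative only. Note that in Python 2, which is the version that produces this particular error message, the escape is only interpreted inside a Unicode literal such as u'\u2581'.

```python
import unicodedata

block = '\u2581'   # written in pure ASCII by way of a Unicode escape
underscore = '_'   # the regular ASCII underscore, U+005F

# Print the code point and the official Unicode name of each character.
print(hex(ord(block)), unicodedata.name(block))
print(hex(ord(underscore)), unicodedata.name(underscore))

# The two characters look alike but do not compare equal.
print(block == underscore)
```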

The PEP linked in the error message explains exactly how to tell Python "this file is not pure ASCII; here's the encoding I'm using". If the encoding is UTF-8, that would be

# coding=utf-8

or the Emacs-compatible

# -*- coding: utf-8 -*-
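
To see which encoding Python picks up from such a declaration, one option in Python 3 is the standard library function tokenize.detect_encoding, which inspects the first two lines of a source file. This is only a sketch: the helper name declared_encoding and the two sample sources are made up for illustration. Python 3 falls back to UTF-8 when no declaration is present, whereas Python 2 falls back to ASCII, which is why the error above appears there.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source (hypothetical helper)."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding

# Declaration on the first line.
plain = b"# coding=utf-8\nx = 1\n"
# Declaration on the second line, after a shebang, in the Emacs-style form.
emacs = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nx = 1\n"

print(declared_encoding(plain))
print(declared_encoding(emacs))
```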

If you don't know which encoding your editor uses to save this file, examine it with something like a hex editor and some googling. The Stack Overflow character-encoding tag has a tag info page with more information and some troubleshooting tips.
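
If a hex editor is not at hand, a few lines of Python can do a similar job. The sketch below scans raw bytes and reports the lines that contain bytes outside the ASCII range. The function name find_non_ascii and the sample data are hypothetical; for a real file you would pass in the result of open(path, 'rb').read().

```python
def find_non_ascii(data):
    """Yield (line_number, hex dump of the non-ASCII bytes) for each affected line."""
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        suspects = [byte for byte in line if byte > 0x7F]
        if suspects:
            yield line_number, " ".join(format(byte, "02x") for byte in suspects)

# Sample source whose second line contains the UTF-8 bytes for U+2581.
sample = b"x = 1\nif s == '\xe2\x96\x81':\n    pass\n"

for line_number, hex_bytes in find_non_ascii(sample):
    print(line_number, hex_bytes)
```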

In short, outside of the 7-bit ASCII range (0x00-0x7F), Python can't and mustn't guess what string a sequence of bytes represents. https://tripleee.github.io/8bit#e2 shows 21 possible interpretations for the byte 0xE2, and that's only from the legacy 8-bit encodings; it could also very well be the first byte of a multi-byte encoding. In fact, I would guess you are actually using UTF-8, which represents this character as the three bytes 0xE2 0x96 0x81; but without also seeing the character rendered as something resembling an underscore, a human would have absolutely no way to guess this, either.
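
The ambiguity is easy to reproduce. The following Python 3 sketch decodes the same bytes under a few encodings; the codecs chosen here are an arbitrary illustration, not the list from the linked page.

```python
raw = b"\xe2\x96\x81"

# Interpreted as UTF-8, the three bytes form one single character.
as_utf8 = raw.decode("utf-8")
print(len(as_utf8), hex(ord(as_utf8)))

# Legacy 8-bit encodings each map the first byte alone to a different character.
for codec in ("latin-1", "cp437", "koi8-r"):
    print(codec, hex(ord(raw[:1].decode(codec))))

# ASCII has no meaning for this byte at all, so decoding fails.
try:
    raw.decode("ascii")
except UnicodeDecodeError as error:
    print("ascii:", error.reason)
```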