Python: ascii codec can't encode en-dash

Question

I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed with a thermal printer that supports the CP437 encoding. This means I need to translate some characters; in this case an en-dash to a hyphen. But Python won't even encode the en-dash to begin with. When I try to decode the string and replace the en-dash with a hyphen I get the following error:

Traceback (most recent call last):
  File "pftest.py", line 46, in <module>
    str = str.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 140: ordinal not in range(128)

Here is my code:

#!/usr/bin/python
#-*- coding: utf-8 -*-

# This string is actually a variable entitled d['entries'][1].summary_detail.value
str = """Love brought by night a vision to my bed,
One that still wore the vesture of a child
But eighteen years of age – who sweetly smiled"""

str = str.decode('utf-8')
str = str.replace("\u2013", "-") #en dash
str = str.replace("\u2014", "--") #em dash
print (str)

I can actually print the output using the following code without errors in my terminal window (Mac), but my printer spits out sets of 3 CP437 characters:

str = u''.str.encode('utf-8')

I'm using Sublime Text as my editor, and I've saved the page with UTF-8 encoding, but I'm not sure that will help things. I would greatly appreciate any help with this code. Thank you!

Recommended answer

I don't fully understand what's happening in your code, but I've also been trying to replace en-dashes with hyphens in a string I got from the Web, and here's what's working for me. My code is just this:

import re
txt = re.sub(u"\u2013", "-", txt)

I'm using Python 2.7 and Sublime Text 2, but I don't bother setting -*- coding: utf-8 -*- in my script, as I'm trying not to introduce any new encoding issues. (Even though my variables may contain Unicode I like to keep my code pure ASCII.) Do you need to include Unicode in your .py file, or was that just to help with debugging?

I'll note that my txt variable is already a unicode string, i.e.

print type(txt)

produces

<type 'unicode'>

I'd be curious to know what type(str) would produce in your case.
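If type(str) reports unicode, that would account for the traceback. In Python 2, calling .decode('utf-8') on an object that is already unicode first encodes it to bytes with the default ascii codec, and that hidden step is what fails on the en-dash. Here is a minimal sketch of that step made explicit, written so it runs on both Python 2.7 and Python 3; the variable names and the sample line are illustrative, not taken from the question's script:

```python
# Hypothetical sample text containing an en-dash (U+2013).
text = u"But eighteen years of age \u2013 who sweetly smiled"

try:
    # Python 2 performs this encode implicitly before unicode.decode('utf-8').
    text.encode("ascii")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

If this is what is happening, the fix is simply to drop the decode() call when the value is already unicode.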

One thing I noticed in your code is

str = str.replace("\u2013", "-") #en dash

Are you sure that does anything? My understanding is that \u only means "unicode character" inside a u"" string, and what you've created there is a plain string of 6 characters: a backslash, a "u", a "2", a "0", etc. (Python leaves an escape sequence it doesn't recognize in the string unchanged, backslash included, so the search text never matches a real en-dash.)
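This is easy to check by measuring the strings. The sketch below compares a unicode literal with the text that "\u2013" leaves behind in a plain Python 2 byte string; a raw string stands in for the latter so the snippet behaves the same on Python 2.7 and Python 3. The sample line and variable names are made up for illustration.

```python
dash = u"\u2013"        # one character: the en-dash itself
not_dash = r"\u2013"    # six characters: backslash, 'u', '2', '0', '1', '3'

line = u"age \u2013 who sweetly smiled"

unchanged = line.replace(not_dash, "-")   # nothing matches, text is untouched
fixed = line.replace(dash, u"-")          # the real en-dash is replaced

print(len(dash))
print(len(not_dash))
print(fixed)
```

So the replace() calls in the question would need u"\u2013" and u"\u2014" as their first arguments to have any effect.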

Also, the fact that you get 3 CP437 characters from your printer makes me suspect that you still have an en-dash in your string. The UTF-8 encoding of an en-dash is 3 bytes: 0xe2 0x80 0x93. When you call str.encode('utf-8') on a unicode string that contains an en-dash you get those three bytes in the returned string. I'm guessing that your terminal knows how to interpret that as an en-dash and that's what you're seeing.
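To make the three-character symptom concrete, here is a small sketch that encodes an en-dash as UTF-8 and then reads those bytes back as CP437, which is presumably what the printer is doing. It uses only the standard utf-8 and cp437 codecs; the variable names are illustrative.

```python
en_dash = u"\u2013"

utf8_bytes = en_dash.encode("utf-8")     # three bytes: e2 80 93
as_cp437 = utf8_bytes.decode("cp437")    # the same bytes read as CP437 text

print(len(utf8_bytes))
print(len(as_cp437))
# The three CP437 characters, shown as code points to keep the output ASCII:
print([hex(ord(ch)) for ch in as_cp437])
```

In CP437 those bytes are a Greek capital gamma, a capital C with cedilla, and an o with circumflex, so one dash shows up as three unrelated glyphs.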

If you can't get my first method to work, I'll mention that I also had success with this:

txt = txt.encode('utf-8')
txt = re.sub("\xe2\x80\x93", "-", txt)

Maybe that re.sub() would work for you if you put it after your call to encode(). And in that case you might not even need that call to decode() at all. I'll confess that I really don't understand why it's there.
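Putting the pieces together, one possible shape for the whole conversion is sketched below: swap the dashes while the text is still unicode, then encode straight to CP437 rather than UTF-8. This is only a sketch under assumptions: to_cp437 is a hypothetical helper name, the sample line stands in for the feed entry, and how the resulting bytes reach the printer is not covered. It runs on both Python 2.7 and Python 3.

```python
def to_cp437(text):
    """Replace dashes with ASCII stand-ins, then encode for a CP437 device."""
    text = text.replace(u"\u2013", u"-")    # en-dash -> hyphen
    text = text.replace(u"\u2014", u"--")   # em-dash -> two hyphens
    # Any other character CP437 lacks becomes '?' instead of raising an error.
    return text.encode("cp437", "replace")


poem = u"But eighteen years of age \u2013 who sweetly smiled"
printer_bytes = to_cp437(poem)
print(printer_bytes)
```

Encoding with the "replace" error handler is a design choice for a printer, where a stray '?' is usually preferable to a crash; use "strict" instead if you would rather find out about unsupported characters.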