Python: ascii codec can't encode en-dash

Problem description

I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed on a thermal printer that supports the CP437 encoding. This means I need to translate some characters; in this case, an en dash to a hyphen. But Python won't even encode the en dash to begin with. When I try to decode the string and replace the en dash with a hyphen, I get the following error:

Traceback (most recent call last):
  File "pftest.py", line 46, in <module>
    str = str.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 140: ordinal not in range(128)

Here's my code:

#!/usr/bin/python
#-*- coding: utf-8 -*-

# This string is actually a variable entitled d['entries'][1].summary_detail.value
str = """Love brought by night a vision to my bed,
One that still wore the vesture of a child
But eighteen years of age – who sweetly smiled"""

str = str.decode('utf-8')
str = str.replace("\u2013", "-") #en dash
str = str.replace("\u2014", "--") #em dash
print (str)

I can actually print the output using the following code without errors in my terminal window (Mac), but my printer spits out sets of 3 CP437 characters:

str = str.encode('utf-8')

I'm using Sublime Text as my editor, and I've saved the page with UTF-8 encoding, but I'm not sure that will help things. I would greatly appreciate any help with this code. Thank you!

Recommended answer

I don't fully understand what's happening in your code, but I've also been trying to replace en-dashes with hyphens in a string I got from the Web, and here's what's working for me. My code is just this:

import re
txt = re.sub(u"\u2013", "-", txt)
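For readers on Python 3, where every str is already Unicode, a minimal sketch of the same substitution might look like this. The poem line is only sample data. For a single fixed character, plain str.replace should give the same result without a regex.

```python
import re

line = "But eighteen years of age \u2013 who sweetly smiled"

# Same idea as the answer: the pattern is the en dash character itself.
via_regex = re.sub("\u2013", "-", line)

# A fixed single character doesn't need a regex; str.replace does the same job.
via_replace = line.replace("\u2013", "-")

print(via_regex)                 # But eighteen years of age - who sweetly smiled
print(via_regex == via_replace)  # True
```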

I'm using Python 2.7 and Sublime Text 2, but I don't bother setting -*- coding: utf-8 -*- in my script, as I'm trying not to introduce any new encoding issues. (Even though my variables may contain Unicode I like to keep my code pure ASCII.) Do you need to include Unicode in your .py file, or was that just to help with debugging?
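One way to keep a script pure ASCII, as the answer prefers, is to write the dashes as escape sequences rather than literal characters. A small Python 3 sketch; the constant names are hypothetical:

```python
# Every character in this source is ASCII; the dashes exist only as escapes.
EN_DASH = "\N{EN DASH}"  # the same single character as "\u2013"
EM_DASH = "\N{EM DASH}"  # the same single character as "\u2014"

print(EN_DASH == "\u2013", EM_DASH == "\u2014")  # True True
print(len(EN_DASH))                              # 1
```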

I'll note that my txt variable is already a unicode string, i.e.

print type(txt)

produces

<type 'unicode'>

I'd be curious to know what type(str) would produce in your case.
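A likely explanation for the traceback, offered as a hypothesis because the real variable isn't shown: in Python 2, calling .decode() on an object that is already unicode makes Python first encode it with the default ASCII codec, and that hidden encode step is what raises UnicodeEncodeError on the en dash. Python 3 removed str.decode, but the sketch below reproduces the same ASCII failure explicitly and shows a hypothetical to_text helper that decodes only when the value really is bytes.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: decoding it again is what failed in Python 2

line = "But eighteen years of age \u2013 who sweetly smiled"

# Encoding with ASCII reproduces the message from the traceback.
error_message = ""
try:
    line.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
print(to_text(line) == to_text(line.encode("utf-8")))  # True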

One thing I noticed in your code is

str = str.replace("\u2013", "-") #en dash

Are you sure that does anything? My understanding is that \u only means "unicode character" inside a u"" string, and what you've created there is a plain byte string of six characters: a backslash, a "u", a "2", a "0", and so on. (Python 2 doesn't recognize \u as an escape in a non-unicode literal, and an unrecognized escape is left in the string with its backslash intact; only recognized escapes such as '\n' or '\t' are converted.)
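A quick Python 3 check of that point. A raw string is used here to stand in for the Python 2 byte literal, on the assumption that both keep the backslash the same way:

```python
# In a Python 3 str literal, \u2013 is a real escape: one character, the en dash.
real_dash = "\u2013"

# A raw string keeps the backslash, like a Python 2 byte literal "\u2013"
# (no u prefix): six plain characters.
literal_text = r"\u2013"

print(len(real_dash), len(literal_text))  # 1 6
print(list(literal_text))                 # ['\\', 'u', '2', '0', '1', '3']

# Replacing the six-character text leaves a string with a real en dash unchanged.
line = "age \u2013 who"
print(line.replace(literal_text, "-") == line)  # True
```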

Also, the fact that you get 3 CP437 characters from your printer makes me suspect that you still have an en-dash in your string. The UTF-8 encoding of an en-dash is 3 bytes: 0xe2 0x80 0x93. When you call str.encode('utf-8') on a unicode string that contains an en-dash you get those three bytes in the returned string. I'm guessing that your terminal knows how to interpret that as an en-dash and that's what you're seeing.
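The three-byte explanation can be checked directly. This sketch assumes the printer maps each incoming byte through the standard CP437 table, which is what Python's cp437 codec implements:

```python
utf8_bytes = "\u2013".encode("utf-8")
print(utf8_bytes)       # b'\xe2\x80\x93'
print(len(utf8_bytes))  # 3

# Reading the same three bytes through the CP437 table, one glyph per byte,
# gives three unrelated characters instead of a dash.
print(utf8_bytes.decode("cp437"))  # ΓÇô (Gamma, C-cedilla, o-circumflex)
```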

If you can't get my first method to work, I'll mention that I also had success with this:

import re
txt = txt.encode('utf-8')
txt = re.sub("\xe2\x80\x93", "-", txt)
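In Python 3 the byte-level version needs bytes on both sides, because re refuses to mix a str pattern with a bytes subject. A minimal sketch with sample data. Note that it only works if the bytes really are UTF-8; replacing on text before encoding avoids that dependency.

```python
import re

# bytes, like txt after txt.encode('utf-8') in the answer
data = "age \u2013 who".encode("utf-8")

# Pattern and replacement must both be bytes when the subject is bytes.
fixed = re.sub(b"\xe2\x80\x93", b"-", data)

print(fixed)                                         # b'age - who'
print(fixed == data.replace(b"\xe2\x80\x93", b"-"))  # True
```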

Maybe that re.sub() would work for you if you put it after your call to encode(). And in that case you might not even need that call to decode() at all. I'll confess that I really don't understand why it's there.
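Putting the pieces together, here is a sketch of one way to prepare text for a CP437 printer in Python 3. It first translates the characters CP437 lacks (Python's cp437 codec has no en dash or em dash), then encodes the result to CP437 bytes. The mapping and function names are hypothetical. Using errors="replace", which turns any remaining unsupported character into "?", is a design choice rather than a printer requirement. How the bytes actually reach the printer isn't covered in the question.

```python
# Hypothetical mapping of dashes to CP437-safe stand-ins, as in the question.
PRINTER_SUBSTITUTIONS = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})

def to_printer_bytes(text):
    """Return CP437 bytes for text after swapping characters CP437 lacks."""
    cleaned = text.translate(PRINTER_SUBSTITUTIONS)
    # Anything still outside CP437 becomes "?" instead of raising an error.
    return cleaned.encode("cp437", errors="replace")

print(to_printer_bytes("But eighteen years of age \u2013 who sweetly smiled"))
# b'But eighteen years of age - who sweetly smiled'
```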
