Unable to decode unicode string in Python 2.4

Question

This is in Python 2.4. Here is my situation: I pull a string from a database, and it contains an umlauted 'o' (\xf6). At this point, type(value) returns str. I then attempt to run .decode('utf-8') and get an error ('utf8' codec can't decode bytes in position 1-4).

Really my goal here is just to successfully make type(value) return unicode. I found an earlier question that had some useful information, but the example from the picked answer doesn't seem to run for me. Is there something I am doing wrong here?

Here is some code to reproduce:

Name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %s - %s\n' %(Name, type(Name)))

I never actually get to the write statement, because it fails on the first statement.

Thanks for your help.

Edit:

I verified that the DB's charset is utf8. So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs. Is there a difference between 'utf-8' and 'utf8'?

The tip on using codecs to write to a file is handy (I'll definitely use it), but in this scenario I am only writing to a log file for debugging purposes.

Recommended answer


So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs

Not on the first line, it doesn't:

>>> 'w\xc3\xb6rner'.decode('utf-8')
u'w\xf6rner'

The second line will error out, though:

>>> file.write('Name: %s - %s\n' %(Name, type(Name)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 7: ordinal not in range(128)

Which is entirely what you'd expect, trying to write non-ASCII Unicode characters to a byte stream. If you use Jiri's suggestion of a codecs-wrapped stream you can write Unicode directly, otherwise you will have to re-encode the Unicode string into bytes manually.
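As a minimal sketch of that failure and the two fixes: the snippet below uses an in-memory io.BytesIO as a stand-in for the log file, and the variable names are illustrative. The b'' literals and the "except ... as" syntax need Python 2.6 or later; they are used here so the sketch also runs on current Python, and on 2.4 the plain '...' literal from the question plays the same role.

```python
import codecs
import io

raw = b'w\xc3\xb6rner'             # UTF-8 bytes, as in the question
name = raw.decode('utf-8')         # Unicode string u'w\xf6rner'
line = u'Name: ' + name + u'\n'

# Python 2 implicitly encodes Unicode with the ASCII codec when it is
# written to a byte stream; doing that explicitly shows the same error.
try:
    line.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start          # index of the character that failed

# Fix 1: wrap the byte stream so it encodes Unicode on every write.
stream = io.BytesIO()              # stand-in for a file opened in binary mode
writer = codecs.getwriter('utf-8')(stream)
writer.write(line)
wrapped_bytes = stream.getvalue()

# Fix 2: re-encode the Unicode string into bytes by hand before writing.
manual_bytes = line.encode('utf-8')
```

Both fixes put the same UTF-8 bytes into the stream; the wrapped stream just saves repeating .encode() at every write.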

Better, for logging purposes, would be simply to spit out a repr() of the variable. Then you don't have to worry about Unicode characters being in there, or newlines or other unwanted characters:

name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %r\n' % name)

Name: u'w\xf6rner'
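Two loose ends from the question can be checked directly. This is a hedged sketch rather than part of the original answer: it assumes the value that first failed really held the single byte \xf6, which is how Latin-1 (not UTF-8) stores 'ö', and it uses b'' literals, which need Python 2.6 or later.

```python
# 'utf8' and 'utf-8' are two spellings of the same codec.
utf8_bytes = b'w\xc3\xb6rner'
same_codec = utf8_bytes.decode('utf8') == utf8_bytes.decode('utf-8')

# A lone \xf6 byte is not valid UTF-8, so decoding it as UTF-8 fails.
latin1_bytes = b'w\xf6rner'
try:
    latin1_bytes.decode('utf-8')
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True

# The same bytes decode cleanly when treated as Latin-1 (an assumption
# about the source data, not something the question confirms).
as_latin1 = latin1_bytes.decode('latin-1')
```

Whether 'latin-1' is the right codec depends on how the database connection is configured; that part is a guess from the byte value alone.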
