Unable to decode unicode string in Python 2.4

Views: 97 | Published: 2020/10/19 19:57:58 | Tags: python, unicode, decode

Problem description

This is in Python 2.4. Here is my situation. I pull a string from a database, and it contains an umlauted 'o' (\xf6). At this point, if I run type(value), it returns str. I then attempt to run .decode('utf-8'), and I get an error ('utf8' codec can't decode bytes in position 1-4).

Really, my goal here is just to make type(value) return unicode. I found an earlier question that had some useful information, but the example from the picked answer doesn't seem to run for me. Is there something I am doing wrong here?

Here is some code to reproduce:

Name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %s - %s\n' %(Name, type(Name)))

I never actually get to the write statement, because it fails on the first statement.

Thanks for your help.

Edit:

I verified that the DB's charset is utf8. So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs. Is there a difference between 'utf-8' and 'utf8'?

The tip on using codecs to write to a file is handy (I'll definitely use it), but in this scenario I am only writing to a log file for debugging purposes.

Recommended answer

So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs

Not on the first line, it doesn't:

>>> 'w\xc3\xb6rner'.decode('utf-8')
u'w\xf6rner'

The second line will error out, though:

>>> file.write('Name: %s - %s\n' %(Name, type(Name)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 7: ordinal not in range(128)

That is exactly what you would expect when trying to write non-ASCII Unicode characters to a byte stream. If you use Jiri's suggestion of a codecs-wrapped stream, you can write Unicode directly; otherwise you will have to re-encode the Unicode string into bytes manually.
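As a rough illustration of those two options, here is a minimal sketch. The temporary directory and the file names wrapped.log and manual.log are made up for this example, and the bytes are built with encode() only so that the same code runs on both Python 2 and Python 3; it is not taken from the question's real setup.

```python
import codecs
import os
import tempfile

# A unicode object, as produced by decoding the UTF-8 bytes 'w\xc3\xb6rner'.
name = u'w\xf6rner'.encode('utf-8').decode('utf-8')

# Hypothetical log locations, used only for this sketch.
log_dir = tempfile.mkdtemp()
wrapped_path = os.path.join(log_dir, 'wrapped.log')
manual_path = os.path.join(log_dir, 'manual.log')

# Option 1: a codecs-wrapped stream encodes unicode for you on write.
log = codecs.open(wrapped_path, 'w', encoding='utf-8')
try:
    log.write(u'Name: %s\n' % name)
finally:
    log.close()

# Option 2: encode by hand and write the resulting bytes yourself.
log = open(manual_path, 'wb')
try:
    log.write((u'Name: %s\n' % name).encode('utf-8'))
finally:
    log.close()


def read_bytes(path):
    """Return the raw bytes stored in the file at path."""
    handle = open(path, 'rb')
    try:
        return handle.read()
    finally:
        handle.close()


wrapped_bytes = read_bytes(wrapped_path)
manual_bytes = read_bytes(manual_path)
```

Both files should end up with the same UTF-8 bytes, so the choice is mostly about whether you want the encoding step to be implicit (the wrapped stream) or visible at each write.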

Better, for logging purposes, would be simply to write out the repr() of the variable. Then you don't have to worry about Unicode characters being in there, or newlines or other unwanted characters:

name= 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %r\n' % name)

Name: u'w\xf6rner'
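On the two side questions from the edit, a small sketch may help. It assumes, purely for illustration, that the original database value held the single byte \xf6 (as it would in Latin-1); which encoding the database connection actually delivers is not known from the question. The byte strings are built with encode() so the code runs on both Python 2 and Python 3.

```python
# Illustrative byte strings (plain str values on Python 2).
latin1_bytes = u'w\xf6rner'.encode('latin-1')  # contains the single byte \xf6
utf8_bytes = u'w\xf6rner'.encode('utf-8')      # contains the pair \xc3\xb6

# 'utf8' and 'utf-8' are aliases for the same codec, so both calls agree.
same_result = utf8_bytes.decode('utf8') == utf8_bytes.decode('utf-8')

# A lone \xf6 byte is not valid UTF-8, so decoding it as UTF-8 fails.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the bytes were written in succeeds.
name = latin1_bytes.decode('latin-1')
```

If the real value does contain \xf6 rather than \xc3\xb6, that would point at the connection returning something other than UTF-8, regardless of the charset the database itself reports.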
