Why is Python's .decode('cp037') not working on specific binary array?

Problem description

When printing out DB2 query results I'm getting the following error on column 'F00002' which is a binary array.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 2: ordinal not in range(128)

I'm using the following line:

print result[2].decode('cp037')

...just as I do for the first two columns, where the same code works fine. Why is this not working on the third column, and what is the proper decoding/encoding?

Recommended answer

Notice that the error is about encoding to ASCII, not about decoding from cp037. But you're not asking it to encode anywhere, so why is this happening?

Well, there are actually two possible places this could go wrong, and we can't know which of them it is without some help from you.

First, if your result[2] is already a unicode object, calling decode('cp037') on it will first try to encode it with sys.getdefaultencoding(), which is usually 'ascii', so that it has something to decode. So, instead of getting an error saying "Hey, bozo, I'm already decoded", you get an error about encoding to ASCII failing. (This may seem very silly, but it's useful for a handful of codecs that can decode unicode->unicode or unicode->str, like ROT13 and quoted-printable.)
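To make that hidden step visible, here is a minimal sketch of roughly what Python 2 does when decode is called on a unicode object. It is written for Python 3, where text is str and has no decode method, so the helper name py2_style_decode and the hard-coded 'ascii' default are assumptions for illustration, not the interpreter's actual code.

```python
def py2_style_decode(text, encoding, default_encoding="ascii"):
    """Emulate Python 2's unicode.decode(): implicit encode, then decode."""
    # Step 1 (implicit in Python 2): turn the text into bytes using the
    # default encoding. Non-ASCII characters fail here.
    raw = text.encode(default_encoding)
    # Step 2: the decode the caller actually asked for.
    return raw.decode(encoding)


error_message = None
try:
    # u'\xe3' at index 2, matching the position reported in the question.
    py2_style_decode(u"ab\xe3", "cp037")
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```

The exception comes from step 1, before cp037 is ever consulted, which is why the message mentions 'ascii' and encoding rather than cp037 and decoding.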

If this is your problem, the solution is to not call decode. You've presumably already decoded the data somewhere along the way to this point, so don't try to do it again. (If you've decoded it wrong, you need to figure out where you decoded it and fix that to do it right; re-decoding it after it's already wrong won't help.)

Second, passing a Unicode string to print will automatically try to encode it with (depending on your Python version) either sys.getdefaultencoding() or sys.stdout.encoding. If Python has failed to guess the right encoding for your console (pretty common on Windows), or if you're redirecting your script's stdout to a file instead of printing to the console (which means Python can't possibly guess the right encoding), you can end up with 'ascii' even in sys.stdout.encoding.
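The second failure mode can be reproduced without touching the real console. This is a sketch for Python 3 that assumes an in-memory stream can stand in for a stdout whose encoding was detected as ASCII; fake_stdout and print_failed are illustrative names.

```python
import io

# Stand-in for a stdout whose encoding was (mis)detected as ASCII.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    # print has to encode the text for the stream it writes to.
    print(u"ab\xe3", file=fake_stdout)
    fake_stdout.flush()
    print_failed = False
except UnicodeEncodeError:
    print_failed = True

print(print_failed)
```

Here the decode has nothing to do with the failure: the text is already valid, and only the output stream's encoding is too narrow for it.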

If this is your problem, you have to explicitly specify the right encoding for your console (if you're lucky, it's in sys.stdout.encoding), or the encoding you want for the text file you're redirecting to (probably 'utf-8', but that's up to you), and explicitly encode everything you print.
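One way to make the encoding explicit is a small helper, sketched here for Python 3. The name encode_for_output and the UTF-8 fallback are assumptions; use whatever encoding your console or output file really expects.

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode text explicitly instead of relying on print's guess."""
    # Prefer the caller's choice, then what stdout reports, then UTF-8
    # (the fallback is an assumption, not a universal default).
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target)


data = encode_for_output(u"ab\xe3", "utf-8")
print(repr(data))
```

In Python 2 the resulting byte string can be passed straight to print; in Python 3 it would be written to a binary stream such as sys.stdout.buffer or a file opened in 'wb' mode.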

So, how do you know which one of these it is?

Simple. print type(result[2]) and see whether it's a unicode or a str. Or break it up into two pieces: x = result[2].decode('cp037') and then print x, and see which of the two raises. Or run in a debugger. You have all kinds of options for debugging this, but you have to do something.
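The type check can also be folded into the code itself. The sketch below is for Python 3, where the question's str and unicode correspond to bytes and str; decode_if_needed is a hypothetical helper, and the sample values are made up rather than taken from the DB2 result.

```python
def decode_if_needed(value, encoding="cp037"):
    """Return (text, note) so you can see which case you are in."""
    if isinstance(value, bytes):
        # Still raw bytes: decoding is the right thing to do.
        return value.decode(encoding), "was bytes, decoded now"
    # Already text: decoding again is the first problem described above.
    return value, "was already text, left alone"


print(decode_if_needed(b"\xe3\xc5\xe2\xe3"))
print(decode_if_needed(u"TEST"))
```

If the note says the value was already text, the first explanation applies; if decoding succeeds and the print still fails, it is the second.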

Of course, it's also possible that, once you fix the first one, you'll immediately run into the second one. But now you know how to deal with that too.

Also, note that cp037 is EBCDIC, one of the few encodings that Python knows about that isn't ASCII-compatible. In fact, '\xe3' is EBCDIC for the letter T.
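A quick check of that byte, as a sketch for Python 3 using only the standard codecs; the variable name ebcdic_byte is illustrative.

```python
ebcdic_byte = b"\xe3"

# In cp037 (EBCDIC), byte 0xE3 is the letter T.
print(ebcdic_byte.decode("cp037"))
print(u"T".encode("cp037") == ebcdic_byte)

# The same byte read through an ASCII-compatible encoding such as
# latin-1 becomes the non-ASCII character U+00E3 instead.
print(repr(ebcdic_byte.decode("latin-1")))
```

Because the byte values for ordinary letters fall outside the ASCII range, EBCDIC data that gets treated as anything other than cp037 will look like non-ASCII text, which is the kind of character the error message complains about.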
