UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

Problem description

I'm working on an application that uses UTF-8 encoding. For debugging purposes I need to print the text. If I call print() directly on the variable containing my Unicode string, e.g. print(pred_str), I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

So I tried print(pred_str.encode('utf-8')), and my output looks like this:

b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m'
b'avipar\xc4\xabta-pudgala-dharma-nair\xc4\x81tmya-pratip\xc4\x81dana-artham'
b'tri\xe1\xb9\x83\xc5\x9bik\xc4\x81-vij\xc3\xb1apti-prakara\xe1\xb9\x87a-\xc4\x81rambha\xe1\xb8\xa5'
b'pudgala-dharma-nair\xc4\x81tmya-pratip\xc4\x81danam punar kle\xc5\x9ba-j\xc3\xb1eya-\xc4\x81vara\xe1\xb9\x87a-prah\xc4\x81\xe1\xb9\x87a-artham'

But I want my output to look like this:

pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām
aviparīta-pudgala-dharma-nairātmya-pratipādana-artham
triṃśikā-vijñapti-prakaraṇa-ārambhaḥ
pudgala-dharma-nairātmya-pratipādanam punar kleśa-jñeya-āvaraṇa-prahāṇa-artham

If I save my string to a file using:

import codecs

with codecs.open('out.txt', 'w', 'UTF-8') as f:
    f.write(pred_str)

it saves the string as expected.

Recommended answer

Your data is encoded with the "UTF-8-SIG" codec, which is sometimes used in Microsoft environments.

This variant of UTF-8 prefixes encoded text with a byte order mark, '\xef\xbb\xbf', to make it easier for applications to distinguish UTF-8 encoded text from other encodings. When such bytes are decoded as plain UTF-8, the mark survives as the character '\ufeff' at position 0, which is the character named in the error.
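As a minimal sketch of how the byte order mark ends up triggering the error, the snippet below encodes a short sample string both ways and then tries to encode the result with cp1252. The sample text and the choice of cp1252 are assumptions for illustration; cp1252 is a common Windows console code page and one of the codecs that reports itself as 'charmap', but the asker's actual code page is not stated.

```python
import codecs

text = "nairātmya"  # sample text, chosen only for illustration

plain = text.encode("utf-8")
signed = text.encode("utf-8-sig")

# utf-8-sig is plain UTF-8 with the three-byte signature in front.
print(codecs.BOM_UTF8)
print(signed == codecs.BOM_UTF8 + plain)

# Decoding signed bytes as plain UTF-8 keeps the mark as U+FEFF at position 0.
leaked = signed.decode("utf-8")
print(repr(leaked[0]))

# cp1252 (assumed console code page) has no mapping for U+FEFF,
# so encoding fails at position 0, as in the question.
try:
    leaked.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)
```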

You can decode such bytestrings like this:

>>> bs = b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m'
>>> text = bs.decode('utf-8-sig')
>>> print(text)
pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām

To read such data from a file:

with open('myfile.txt', 'r', encoding='utf-8-sig') as f:
    text = f.read()
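If the text has already been read with plain 'utf-8' and is held as a str, as pred_str appears to be in the question, the leading mark can also be dropped after the fact. The following is a rough sketch under that assumption; strip_bom is a hypothetical helper name, and the temporary file merely stands in for 'myfile.txt'.

```python
import os
import tempfile


def strip_bom(text: str) -> str:
    """Return text without a single leading U+FEFF, if one is present."""
    return text[1:] if text.startswith("\ufeff") else text


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")

    # Write raw bytes that start with the UTF-8 signature.
    with open(path, "wb") as f:
        f.write(b"\xef\xbb\xbfpudgala-dharma")

    # Plain utf-8 keeps the mark as the first character.
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()

    # utf-8-sig removes it while decoding.
    with open(path, "r", encoding="utf-8-sig") as f:
        clean = f.read()

print(repr(raw[0]))
print(strip_bom(raw) == clean)
```

Reading with 'utf-8-sig' is usually the simpler choice, since it also accepts files that have no signature at all.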

Note that even after decoding from UTF-8-SIG, you may still be unable to print your data, because your console's default code page may not be able to encode the other non-ASCII characters in the data (for example ā or ḥ). In that case you will need to adjust your console settings to support UTF-8.
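When changing the console settings is not an option, one possible workaround for debugging output is to escape whatever the console encoding cannot represent instead of letting print() raise. This is only a sketch, not part of the original answer: escape_for and safe_print are hypothetical helper names, and the fallback to "ascii" when stdout reports no encoding is an assumption.

```python
import sys


def escape_for(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with backslash escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def safe_print(text: str) -> None:
    """Print text without raising UnicodeEncodeError on a limited console."""
    # Fall back to ASCII if stdout does not report an encoding (assumption).
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print(escape_for(text, encoding))


safe_print("triṃśikā-vijñapti")
```

On a UTF-8 console the text prints unchanged; on a narrower code page the unsupported characters appear as escapes such as \u1e43, which is less readable but keeps the debugging output flowing.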
