UnicodeEncodeError: 'ascii' codec can't encode character in print function

Problem description

My company uses a database, and I am writing a script that interacts with it. There is already a script that runs a query against the database and returns the results for that query.

I am working in a Unix environment. My script calls that existing script to fetch some data from the database and redirects the query result to a file. When I then try to read this file, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 9741: ordinal not in range(128)

I believe Python is unable to read the file because of the file's encoding: the file is not ASCII, which is why the error occurs. I tried detecting the encoding of the file and reading the file with that encoding.

The code that I am using is:

import codecs
import csv
import os
from collections import Counter

import chardet

# Run the Perl query script and redirect its output to patchlet.txt
os.system("Query.pl \"select title from bug where (ste='KGF-A' AND ( status = 'Not_Approved')) \">patchlet.txt")
# Detect the encoding of the query output
with open("patchlet.txt", "rb") as rawFile:
    encoding_dict3 = chardet.detect(rawFile.read())
print(encoding_dict3)
patchlets_in_latest = []  # assumed to start empty; its definition is not shown in the question
# Open the patchlet.txt file for storing the last part of titles for latest ACF in a list
with codecs.open("patchlet.txt", encoding=encoding_dict3['encoding']) as csvFile:
    readCSV = csv.reader(csvFile, delimiter=":")
    for row in readCSV:
        if len(row) != 0:
            # keep the last field of the row (the only field when the row has one)
            patchlets_in_latest.append(row[-1])
# calling the strip_list_noempty function (a helper not shown here) for removing newline and whitespace characters
patchlets_in_latest_list = strip_list_noempty(patchlets_in_latest)
# converting list of titles to a set to remove any duplicate entry if present
patchlets_in_latest_set = set(patchlets_in_latest_list)
# Finding duplicate entries in list
duplicates_in_latest = [k for k, v in Counter(patchlets_in_latest_list).items() if v > 1]
# Printing imp info for logs
print("list of titles of patchlets in latest list are : ")
for i in patchlets_in_latest_list:
    print(str(i))  # <-- the UnicodeEncodeError is raised here
print("No of patchlets in latest list are : {}".format(str(len(patchlets_in_latest_list))))

Here Query.pl is the Perl script that fetches the query result from the database. The encoding detected for "patchlet.txt" (the file used for storing the result from HSD) is:

{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}

Even when I read the file using this detected encoding, I still get the error.

Please help me resolve this error.

EDIT: I am using Python 3.6.

EDIT2:

I get the error while outputting the result, and there is one line in the file that contains an unknown character. The line looks like:

Some failure because of which vtrace cannot be used along with some trace.

I am using gvim, and in gvim the "vtrace" looks like "~Vvtrace". I then checked the database manually for this character, and it is "–", which according to my keyboard is neither a hyphen nor an underscore. Characters of this kind are causing the problem.

Also, I am working in a Linux environment.

EDIT 3: I have added more code that can help in tracing the error. I have also marked the print statement (print(str(i))) where I am getting the error.

Answer

Problem

Based on the information in the question, the program is processing non-ASCII input data, but is unable to output non-ASCII data.

Specifically, this code:

for i in patchlets_in_latest_list:
   print(str(i))

Results in this exception:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2013'

This behaviour was common in Python2, where calling str on a unicode object would cause Python to try to encode the object as ASCII, resulting in a UnicodeEncodeError if the object contained non-ASCII characters.

In Python3, calling str on a str instance doesn't trigger any encoding. However, calling the print function on a str will encode the str to sys.stdout.encoding. sys.stdout.encoding defaults to the value returned by locale.getpreferredencoding. This is generally determined by your Linux user's locale, typically set through the LANG environment variable (LC_ALL and LC_CTYPE take precedence when they are set).
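To make the distinction between reading and printing concrete, here is a minimal sketch. It assumes the en dash is stored in the file as the Windows-1252 byte 0x96, and it uses an in-memory stream as a stand-in for sys.stdout. The sample title and the helper name print_with_encoding are made up for illustration.

```python
import io

# The en dash as it would be stored in a Windows-1252 file (byte 0x96).
# This sample title is hypothetical, not taken from the real data.
raw_title = b"some \x96 title"

# Decoding (reading) succeeds and yields a str containing U+2013.
title = raw_title.decode("cp1252")


def print_with_encoding(text, encoding):
    """Print text to an in-memory stream that encodes the way sys.stdout would."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return buffer.getvalue()


# A UTF-8 stream can represent the character.
utf8_bytes = print_with_encoding(title, "utf-8")

# An ASCII stream cannot, so print() raises UnicodeEncodeError.
try:
    print_with_encoding(title, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The decode step never fails here, which suggests the file is being read correctly and the error comes only from the encoding used by the output stream.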

Solution

If we assume that your program is not overriding normal encoding behaviour, the problem should be fixed by ensuring that the code is being executed by a Python3 interpreter in a UTF-8 locale.

  • be 100% certain that the code is being executed by a Python3 interpreter - print sys.version_info from within the program.
  • try setting the PYTHONIOENCODING environment variable when running your script: PYTHONIOENCODING=UTF-8 python3 myscript.py
  • check your locale using the locale command in the terminal (or echo $LANG). If it doesn't end in UTF-8, consider changing it. Consult your system administrators if you are on a corporate machine.
  • if your code runs in a cron job, bear in mind that cron jobs often run with the 'C' or 'POSIX' locale - which could be using ASCII encoding - unless a locale is explicitly set. Likewise if the script is run under a different user, check their locale settings.
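The checks above can be gathered into one small diagnostic that you run in the same way as the failing script (same user, same cron entry, and so on). This is only a sketch; the function name describe_output_encoding is hypothetical.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that decide how print() encodes text."""
    return {
        "python_version": tuple(sys.version_info[:3]),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        # repr() output is ASCII-only here, so this is safe to print anywhere.
        print("{}: {!r}".format(name, value))
```

If stdout_encoding reports an ASCII codec (for example 'ANSI_X3.4-1968' or 'ascii'), the environment rather than the file is the likely cause of the error.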

Workaround

If changing the environment is not feasible, you can work around the problem in Python by encoding to ASCII with an error handler, then decoding back to str.

There are four useful error handlers in your particular situation; their effects are demonstrated with this code:

>>> s = 'Hello \u2013 World'
>>> s
'Hello – World'
>>> handlers = ['ignore', 'replace', 'xmlcharrefreplace', 'namereplace']
>>> print(str(s))
Hello – World
>>> for h in handlers:
...     print(f'Handler: {h}:', s.encode('ascii', errors=h).decode('ascii'))
... 
Handler: ignore: Hello  World
Handler: replace: Hello ? World
Handler: xmlcharrefreplace: Hello &#8211; World
Handler: namereplace: Hello \N{EN DASH} World

The ignore and replace handlers lose information - you can't tell which character has been dropped or replaced with a question mark.

The xmlcharrefreplace and namereplace handlers do not lose information, but the replacement sequences may make the text less readable to humans.

It's up to you to decide which tradeoff is acceptable for the consumers of your program's output.

If you decided to use the replace handler, you would change your code like this:

for i in patchlets_in_latest_list:
    replaced = i.encode('ascii', errors='replace').decode('ascii')
    print(replaced)

wherever you are printing data that might contain non-ASCII characters.
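If you print in many places, one option is to wrap the workaround in a small helper so the handler is chosen in a single spot. This is a sketch only: the name ascii_safe and the sample titles are hypothetical and not part of the original script.

```python
def ascii_safe(value, errors="replace"):
    """Return str(value) with non-ASCII characters handled by the given error handler."""
    return str(value).encode("ascii", errors=errors).decode("ascii")


# Hypothetical sample data standing in for patchlets_in_latest_list.
titles = ["Hello \u2013 World", "plain title"]

for title in titles:
    # namereplace keeps the information about which character was replaced.
    print(ascii_safe(title, errors="namereplace"))
```

Because the helper always returns pure ASCII text, the print call works regardless of the locale the script runs under.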
