UnicodeEncodeError: 'ascii' codec can't encode character in print function

Problem description

My company uses a database, and I am writing a script that interacts with it. There is already a script that submits a query to the database and returns the results for that query.

I am working in a Unix environment. My script calls that script to get some data from the database and redirects the query result to a file. When I then try to read this file, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 9741: ordinal not in range(128)

I know that Python is not able to read the file because of the file's encoding. The encoding of the file is not ASCII, which is why the error occurs. I tried detecting the encoding of the file and reading the file with that encoding.

The code that I am using is:

import codecs
import csv
import os
from collections import Counter

import chardet

# patchlets_in_latest (a list) and strip_list_noempty are defined elsewhere in the script (not shown)
os.system("Query.pl \"select title from bug where (ste='KGF-A' AND ( status = 'Not_Approved')) \">patchlet.txt")
encoding_dict3 = chardet.detect(open("patchlet.txt", "rb").read())
print(encoding_dict3)
# Open the patchlet.txt file for storing the last part of titles for latest ACF in a list
with codecs.open("patchlet.txt", encoding=encoding_dict3['encoding']) as csvFile:
    readCSV = csv.reader(csvFile, delimiter=":")
    for row in readCSV:
        if len(row) != 0:
            if len(row) > 1:
                j = len(row) - 1
                patchlets_in_latest.append(row[j])
            elif len(row) == 1:
                patchlets_in_latest.append(row[0])
# Calling the strip_list_noempty function for removing newline and whitespace characters
patchlets_in_latest_list = strip_list_noempty(patchlets_in_latest)
# Converting the list of titles to a set to remove any duplicate entry if present
patchlets_in_latest_set = set(patchlets_in_latest_list)
# Finding duplicate entries in the list
duplicates_in_latest = [k for k, v in Counter(patchlets_in_latest_list).items() if v > 1]
# Printing imp info for logs
print("list of titles of patchlets in latest list are : ")
for i in patchlets_in_latest_list:
    print(str(i))  # <-- the UnicodeEncodeError is raised here
print("No of patchlets in latest list are : {}".format(str(len(patchlets_in_latest_list))))

Here Query.pl is the Perl script written to fetch the query result from the database. The encoding that I get for "patchlet.txt" (the file used for storing the result from HSD) is:

{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}

Even when I provide the same encoding for reading the file, I still get the error.

Please help me in resolving this error.

EDIT: I am using Python 3.6.

EDIT 2:

I get the error while outputting the result, and there is one line in the file that contains an unknown character. The line looks like:

Some failure because of which vtrace cannot be used along with some trace.

I am using gvim, and in gvim the "vtrace" looks like "~Vvtrace". I then checked the database manually for this character, and it is "–", which according to my keyboard is neither a hyphen nor an underscore. These kinds of characters are creating the problem.

Also, I am working in a Linux environment.

EDIT 3: I have added more code that can help in tracing the error. I have also highlighted the print statement (print(str(i))) where I am getting the error.

Answer

Problem

Based on the information in the question, the program is processing non-ASCII input data, but is unable to output non-ASCII data.

Specifically, this code:

for i in patchlets_in_latest_list:
   print(str(i))

Results in this exception:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2013'

This behaviour was common in Python 2, where calling str on a unicode object would cause Python to try to encode the object as ASCII, resulting in a UnicodeEncodeError if the object contained non-ASCII characters.

In Python 3, calling str on a str instance doesn't trigger any encoding. However, calling the print function on a str encodes the str to sys.stdout.encoding. sys.stdout.encoding defaults to the value returned by locale.getpreferredencoding(), which is generally determined by your Linux user's locale environment variables (LC_ALL, LC_CTYPE, or LANG).
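To see which encoding print will actually use, it can help to inspect these values from inside the failing program. The following is a minimal sketch; the helper name describe_output_encoding is made up for illustration and is not part of the original script.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the settings that decide how print() encodes text."""
    return {
        "python_version": sys.version_info[:3],
        "stdout_encoding": sys.stdout.encoding,
        "preferred_encoding": locale.getpreferredencoding(),
    }


# Print each setting on its own line; the names and values are plain ASCII,
# so this is safe to run even when stdout can only encode ASCII.
for name, value in describe_output_encoding().items():
    print(f"{name}: {value}")
```

If stdout_encoding reports an ASCII codec (it may appear under an alias such as ANSI_X3.4-1968), printing a character like '\u2013' is expected to fail in the way described above.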

Solution

If we assume that your program is not overriding normal encoding behaviour, the problem should be fixed by ensuring that the code is being executed by a Python3 interpreter in a UTF-8 locale.

  • be 100% certain that the code is being executed by a Python3 interpreter - print sys.version_info from within the program.
  • try setting the PYTHONIOENCODING environment variable when running your script: PYTHONIOENCODING=UTF-8 python3 myscript.py
  • check your locale using the locale command in the terminal (or echo $LANG). If it doesn't end in UTF-8, consider changing it. Consult your system administrators if you are on a corporate machine.
  • if your code runs in a cron job, bear in mind that cron jobs often run with the 'C' or 'POSIX' locale - which could be using ASCII encoding - unless a locale is explicitly set. Likewise if the script is run under a different user, check their locale settings.
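The effect of PYTHONIOENCODING can be checked without touching the real script. The sketch below is only an illustration: it launches a tiny child interpreter twice, once forced to ASCII and once to UTF-8, and compares the results. The helper name run_child and the one-line child program are assumptions made for this example.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a one-line child script that prints an en dash, with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        # The child receives the escape sequence \u2013 and turns it into an en dash itself.
        [sys.executable, "-c", "print('Hello \\u2013 World')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_child("ascii")
utf8_run = run_child("utf-8")

# With ASCII the child exits with a non-zero status and a UnicodeEncodeError traceback.
print("ascii exit status:", ascii_run.returncode)
# With UTF-8 the child exits cleanly and its output contains the encoded en dash.
print("utf-8 exit status:", utf8_run.returncode)
print("utf-8 raw output:", utf8_run.stdout)
```

The same comparison applies to a cron job or another user's session: whatever encoding the environment gives the interpreter decides whether print succeeds.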

Workaround

If changing the environment is not feasible, you can work around the problem in Python by encoding to ASCII with an error handler, then decoding back to str.

There are four useful error handlers in your particular situation. Their effects are demonstrated with this code:

>>> s = 'Hello \u2013 World'
>>> s
'Hello – World'
>>> handlers = ['ignore', 'replace', 'xmlcharrefreplace', 'namereplace']
>>> print(str(s))
Hello – World
>>> for h in handlers:
...     print(f'Handler: {h}:', s.encode('ascii', errors=h).decode('ascii'))
... 
Handler: ignore: Hello  World
Handler: replace: Hello ? World
Handler: xmlcharrefreplace: Hello &#8211; World
Handler: namereplace: Hello \N{EN DASH} World

The ignore and replace handlers lose information: you can't tell which character was dropped (ignore) or replaced with a question mark (replace).

The xmlcharrefreplace and namereplace handlers do not lose information, but the replacement sequences may make the text less readable to humans.
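As a rough illustration of "no information lost", the output of xmlcharrefreplace can be turned back into the original text with the standard library's html.unescape. This is only a sketch: html.unescape also converts any character references that were already present in the text (such as &amp;), so the round trip is exact only when the input contains none.

```python
import html

original = "Hello \u2013 World"

# Encode with the lossless handler, producing pure ASCII text.
escaped = original.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Reverse the numeric character reference to recover the en dash.
restored = html.unescape(escaped)

print(escaped)
print(restored == original)
```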

It's up to you to decide which tradeoff is acceptable for the consumers of your program's output.

If you decide to use the replace handler, you would change your code like this:

for i in patchlets_in_latest_list:
    replaced = i.encode('ascii', errors='replace').decode('ascii')
    print(replaced)

wherever you are printing data that might contain non-ASCII characters.
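If the script prints such data in several places, the encode/decode step could be wrapped in a small helper so the chosen handler is set in one place. The sketch below assumes a helper named ascii_safe and some sample titles, neither of which appears in the original script.

```python
def ascii_safe(text, errors="replace"):
    """Return text with non-ASCII characters processed by the given error handler."""
    return str(text).encode("ascii", errors=errors).decode("ascii")


# Hypothetical sample titles; the second and third contain non-ASCII characters.
titles = ["plain title", "range 1\u20132", "caf\u00e9"]
for title in titles:
    print(ascii_safe(title))
```

Switching from 'replace' to 'xmlcharrefreplace' or 'namereplace' then only requires changing the errors argument.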
