Why am I getting "UnicodeEncodeError: 'charmap' codec can't encode character '\u25b2' in position 84811: character maps to <undefined>"?


Problem description

I get "UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>" when I run this code:

from bs4 import BeautifulSoup
import requests
r = requests.get('https://stackoverflow.com').text
soup = BeautifulSoup(r, 'lxml')
print(soup.prettify())

The output is:

Traceback (most recent call last):
  File "c:\Users\Asus\Documents\Hello World\Web Scraping\st.py", line 5, in <module>
    print(soup.prettify())
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>

I'm using Python 3.8.1 and UTF-8 in VS Code. How can I solve this?

Recommended answer

There are hints in the full error message. Here is the part that seems most important:

Traceback ...
  File "...\cp1252.py", ...
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' ...

The error is caused by the print call. Somewhere in your text there is a ZERO WIDTH SPACE character (Unicode U+200B). When you print to a Windows console, the string is first encoded into the console's code page (cp1252 here, as the cp1252.py frame in the traceback shows), and ZERO WIDTH SPACE has no representation in that code page. By the way, the default Windows console is not really Unicode friendly.
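The failure can be reproduced without BeautifulSoup or a network request. The following is a minimal sketch, assuming cp1252 is the output encoding as in the traceback; the sample string and the helper name `can_encode` are made up for illustration.

```python
import sys

# Hypothetical sample text containing a ZERO WIDTH SPACE (U+200B).
text = "Stack\u200bOverflow"

# The encoding print() would use for standard output on this machine.
# On the asker's machine the traceback shows it was cp1252.
stdout_encoding = getattr(sys.stdout, "encoding", None)

def can_encode(s, encoding):
    """Return True if s can be encoded with the given codec without error."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

cp1252_ok = can_encode(text, "cp1252")  # False: U+200B is not in cp1252
utf8_ok = can_encode(text, "utf-8")     # True: UTF-8 can represent U+200B
```

Inspecting `stdout_encoding` on the affected machine is a quick way to confirm which codec print is using.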

There is little you can do within a Windows console itself. I would advise you to try one of these workarounds:

  • Do not print to the console; write to a UTF-8 file instead. You can then read it with a UTF-8 capable text editor such as Notepad++.

  • Manually encode anything before printing it, with errors='ignore' or errors='replace'. That way, offending characters are dropped (ignore) or replaced with '?' (replace), and no error is raised:

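  # Drop characters cp1252 cannot represent, then decode back to str
  # so that print shows text rather than a bytes literal (b'...').
  print(soup.prettify().encode('cp1252', errors='ignore').decode('cp1252'))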
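Both workarounds can be sketched as follows. This is an illustration under stated assumptions: the sample string stands in for `soup.prettify()`, and the file name `page.html` in the temporary directory is a placeholder, not something from the original question.

```python
import os
import tempfile

# Hypothetical stand-in for soup.prettify(); contains U+200B and U+25B2,
# neither of which exists in cp1252.
html = "<p>Stack\u200bOverflow \u25b2</p>"

# Workaround 1: write to a UTF-8 file instead of printing.
out_path = os.path.join(tempfile.gettempdir(), "page.html")  # placeholder path
with open(out_path, "w", encoding="utf-8") as f:
    f.write(html)

# Workaround 2: make the text safe for cp1252 before printing.
# errors="ignore" drops unencodable characters; errors="replace" writes "?".
ignored = html.encode("cp1252", errors="ignore").decode("cp1252")
replaced = html.encode("cp1252", errors="replace").decode("cp1252")

print(ignored)
print(replaced)
```

The file route keeps the page intact, while the encode route loses the unsupported characters, so the former is preferable when the exact content matters.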
