BeautifulSoup and .encode("utf-8")
Question
from bs4 import BeautifulSoup
import urllib.request
link = ('https://mywebsite.org')
req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
url = urllib.request.urlopen(req).read()
soup = BeautifulSoup(url, "html.parser")
body = soup.find_all('div', {"class":"wrapper"})
print(body)
Hi guys, I have a problem with this code. When I run it, I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>
I searched and found that I had to add
.encode("utf-8")
but if I add it, I get this error:
AttributeError: 'ResultSet' object has no attribute 'encode'
How can I solve this?
I'm sorry for my English, but I'm Italian :)
Recommended answer
You're on Windows and trying to print to the console. The print() call is what throws the exception, not BeautifulSoup.
The Windows console only natively supports 8-bit code pages, so any character outside your region's code page will break (despite what people say about chcp 65001).
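To make the failure concrete, here is a minimal sketch of what happens when text containing '\u2022' is encoded to an 8-bit code page. It assumes cp850 as a stand-in for a Western European console code page; the code page on your machine may differ, and the sample string is made up for illustration.

```python
# Assumption: cp850 stands in for the console's 8-bit code page.
text = "\u2022 item"  # hypothetical sample containing a bullet character

try:
    text.encode("cp850")
    strict_ok = True
except UnicodeEncodeError as exc:
    # The same kind of 'charmap' error that print() raises on the console.
    strict_ok = False
    print("strict encoding failed:", exc)

# errors="replace" substitutes "?" for characters the code page lacks.
lossy = text.encode("cp850", errors="replace")
print(lossy)

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8 = text.encode("utf-8")
print(utf8)
```

The bullet has no slot in that code page, so strict encoding fails, while UTF-8 handles it without loss.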
You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters.
Alternatively, don't print to the console at all; write your output to a file opened with an explicit encoding. For example:
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    # write() needs a str, and find_all() returns a ResultSet, so convert it first
    my_log.write(str(body))
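As for the AttributeError in the question: find_all() returns a list-like ResultSet, and .encode() is a method of strings, not of lists, so it has to be applied to each element's text rather than to the whole result. A rough sketch of writing each match on its own line to a UTF-8 file follows. The helper name write_items_utf8 and the sample data are hypothetical; a plain list of strings stands in for the ResultSet so the snippet runs without network access.

```python
import os
import tempfile


def write_items_utf8(path, items):
    """Write str(item) for each item, one per line, to a UTF-8 encoded file."""
    with open(path, "w", encoding="utf-8") as out:
        for item in items:
            out.write(str(item) + "\n")


# Hypothetical stand-in for soup.find_all('div', {"class": "wrapper"}).
body = [
    '<div class="wrapper">\u2022 first</div>',
    '<div class="wrapper">\u2022 second</div>',
]

log_path = os.path.join(tempfile.mkdtemp(), "myoutput.log")
write_items_utf8(log_path, body)

# Read the file back with the same encoding to confirm the bullets survived.
with open(log_path, encoding="utf-8") as check:
    written = check.read()
```

With the real ResultSet, passing body straight to the helper should work the same way, since str() on each tag yields its markup.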