BeautifulSoup "encode("utf-8")"

Problem description

from bs4 import BeautifulSoup
import urllib.request

link = 'https://mywebsite.org'
req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()  # raw bytes of the page

soup = BeautifulSoup(html, "html.parser")
body = soup.find_all('div', {"class": "wrapper"})  # ResultSet (a list of tags)

print(body)

Hi guys, I have a problem with this code. When I run it, I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>

I searched and found that I had to add

.encode("utf-8")

but if I add it, I get this error:

AttributeError: 'ResultSet' object has no attribute 'encode'

How can I fix this?

I'm sorry for my English, but I'm Italian :)

Recommended answer

You're on Windows and trying to print to the console. It is the print() call that throws the exception: the text has to be encoded to the console's code page, and the character '\u2022' (a bullet) has no mapping there.

The Windows console natively supports only 8-bit code pages, so any character outside your region's code page will break (despite what people say about chcp 65001).
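
The failure can be reproduced on any operating system without a console at all. This is only a minimal sketch: it assumes the console code page is cp850 (the question does not say which one is in use), and `try_encode` is a hypothetical helper name, not part of any library.

```python
# Reproduce the console failure without a Windows console.
# Assumption: the console code page is cp850; the question does not state it.
BULLET = "\u2022"  # the character named in the error message


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# An 8-bit code page has no slot for the bullet, so encoding fails.
print(try_encode(BULLET, "cp850"))
# UTF-8 can represent every Unicode character.
print(try_encode(BULLET, "utf-8"))
```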

You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters.
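
Enabling the module might look roughly like the sketch below. It assumes the package has been installed (for example with pip install win_unicode_console) and that it is switched on through win_unicode_console.enable(), as described in the project's README. The wrapper `enable_unicode_console` is a hypothetical helper that simply does nothing on other platforms or when the package is missing.

```python
import sys


def enable_unicode_console():
    """Try to enable win-unicode-console; return True only if it was enabled."""
    if sys.platform != "win32":
        return False  # nothing to do: the package targets the Windows console
    try:
        import win_unicode_console  # third-party package, assumed to be installed
    except ImportError:
        return False
    win_unicode_console.enable()
    return True


# Call this once, before the first print() of non-ASCII text.
enabled = enable_unicode_console()
```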

Alternatively, instead of printing to the console, write your output to a file opened with an explicit encoding. For example:

with open("myoutput.log", "w", encoding="utf-8") as my_log:
    # body is a ResultSet (a list of tags), so write each tag as text
    for tag in body:
        my_log.write(str(tag) + "\n")
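
As for the AttributeError in the question: find_all() returns a ResultSet, which behaves like a list of tags, while .encode() is a string method. It therefore has to be applied to the text of each tag rather than to the whole result. A minimal sketch, where `encode_results` is a hypothetical helper and a plain list of strings stands in for the ResultSet:

```python
def encode_results(results, encoding="utf-8"):
    """Encode each item of a list-like result separately.

    str() turns a tag into its markup; plain strings pass through unchanged.
    """
    return [str(item).encode(encoding) for item in results]


# Stand-in for soup.find_all(...): a plain list of strings (assumption for the demo).
fake_results = [
    '<div class="wrapper">\u2022 first</div>',
    '<div class="wrapper">second</div>',
]
encoded = encode_results(fake_results)
print(encoded)
```

With the code from the question this would be called as encode_results(body). Printing bytes objects shows escape sequences such as \xe2\x80\xa2 instead of the bullet itself, so writing to a UTF-8 file is usually the more readable option.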
