Writing out results from Python to a CSV file [UnicodeEncodeError: 'charmap' codec can't encode character]


Question

I've been trying to write a script that scrapes the list of usernames from the comments section of a given YouTube video and writes those usernames to a .csv file.

Here is the script:

from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
driver=webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=VIDEOURL')
time.sleep(5)
driver.execute_script("window.scrollTo(0, 500)")
time.sleep(3)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
scroll_time = 40
for num in range(0, scroll_time):
    html.send_keys(Keys.PAGE_DOWN)
for elem in driver.find_elements_by_xpath('//span[@class="style-scope ytd-comment-renderer"]'):
    print(elem.text)
    with open('usernames.csv', 'w') as f:
        p = csv.writer(f)
        p.writerows(str(elem.text));

It keeps throwing this error for line 19:

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u30b9' in position 0: character maps to <undefined>

I'd read on here that this may have something to do with how the Windows console deals with Unicode, and I saw a potential solution that involved downloading and installing a Unicode library package, but that didn't help either.

Could anyone help me figure out what I'm doing wrong?

PS. I'm using the latest version of Python (3.7).

Much appreciated, Sergej.

Recommended answer

Python 3 str values need to be encoded as bytes when written to disk. If no encoding is specified for the file, Python uses the platform default. In this case, the default encoding (a 'charmap' codec, as the traceback shows) is unable to encode '\u30b9', and so raises a UnicodeEncodeError.
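The failure can be reproduced without any file I/O. The snippet below is a minimal sketch that assumes the platform default is cp1252, a common default on Western-language Windows installs; the question does not say which code page is actually in use, and `can_encode` is a hypothetical helper written for this illustration.

```python
import locale

# Katakana "ス" (U+30B9), the character reported in the traceback.
text = "\u30b9"


def can_encode(s: str, encoding: str) -> bool:
    """Return True if every character of s can be encoded with `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is an assumption here; substitute the encoding your system reports.
cp1252_ok = can_encode(text, "cp1252")
utf8_ok = can_encode(text, "utf-8")

# Shows the encoding that open() falls back to when none is given.
print("platform default:", locale.getpreferredencoding(False))
print("cp1252:", cp1252_ok, "utf-8:", utf8_ok)
```

UTF-8 can represent every Unicode character, which is why naming it explicitly avoids the error.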

The solution is to specify the encoding as UTF-8 when opening the file:

with open('usernames.csv', 'w', encoding='utf-8') as f:
    p = csv.writer(f)
    ...
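For context, here is one way the elided body could be filled in. It is a sketch, not the asker's final script: `write_usernames` and the sample names are made up for illustration, and a temporary directory stands in for the working directory. It also sidesteps two other problems in the question's loop: opening the file in 'w' mode on every iteration truncates it each time, and passing a string to `writerows` writes each character as its own row.

```python
import csv
import os
import tempfile


def write_usernames(path, usernames):
    """Write one username per CSV row, encoded as UTF-8."""
    # Open the file once, outside any loop. newline="" is what the csv
    # module documentation recommends for files handed to csv.writer.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for name in usernames:
            # Wrap the name in a list so it becomes a single field.
            writer.writerow([name])


# Hypothetical sample data standing in for the scraped elem.text values.
sample = ["Sergej", "\u30b9\u30ba\u30ad"]
out_path = os.path.join(tempfile.mkdtemp(), "usernames.csv")
write_usernames(out_path, sample)
```

In the scraping script, the list passed in would be built from `elem.text` for each matched element before the file is opened.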

Since UTF-8 isn't your platform's default encoding, you'll also need to specify the encoding whenever you open the file for reading, whether in Python code or in applications like Excel.
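Reading the file back in Python follows the same pattern. The sketch below writes a small file first so that it is self-contained; the file location and the sample names are assumptions made for the example.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "usernames.csv")

# Create a small UTF-8 file to read back (sample data, not real usernames).
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows([["Sergej"], ["\u30b9\u30ba\u30ad"]])

# Name the same encoding when reading; otherwise the platform default
# is used and the non-ASCII names may be decoded incorrectly.
with open(path, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```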

Windows applications recognise a variant of UTF-8, named "utf-8-sig" in Python. When writing, this encoding inserts a three-byte signature (the UTF-8 byte order mark, or BOM) at the start of the file, which identifies the file's encoding to Windows applications that might otherwise attempt to decode it using an 8-bit encoding. If the file will be used exclusively on Windows machines then it may be worth using this encoding instead.

with open('usernames.csv', 'w', encoding='utf-8-sig') as f:
    p = csv.writer(f)
    ...
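To see what "utf-8-sig" actually adds, the bytes on disk can be inspected directly. This is a small sketch using a temporary file; the file name is arbitrary.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bom_demo.csv")

# Write a single character with the BOM-prefixed variant of UTF-8.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("\u30b9\n")

with open(path, "rb") as f:
    raw = f.read()

# codecs.BOM_UTF8 is the three-byte signature b"\xef\xbb\xbf".
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading with "utf-8-sig" strips the signature again...
with open(path, "r", encoding="utf-8-sig") as f:
    text_sig = f.read()

# ...while plain "utf-8" keeps it as a leading U+FEFF character.
with open(path, "r", encoding="utf-8") as f:
    text_plain = f.read()

print(has_bom, repr(text_sig), repr(text_plain))
```

The stray U+FEFF in the plain "utf-8" read is the reason to read such files back with "utf-8-sig" as well, or to avoid the signature when the file will be consumed by tools that do not expect it.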
