UnicodeEncodeError: 'cp949' codec can't encode character '\u20a9' in position 90: illegal multibyte sequence


Question


I'm a Python beginner.

I'm trying to crawl the Google Play store and export the results to a CSV file, but I got an error message.

UnicodeEncodeError: 'cp949' codec can't encode character '\u20a9' in position 90: illegal multibyte sequence

Here is my source code.

When I just print the results it works, but I get the error message above when exporting to the CSV file.

Please help me.

from bs4 import BeautifulSoup
import urllib.request
import urllib.parse
import codecs
import json
import pickle
from datetime import datetime
import sys
import csv
import os


req = 'https://play.google.com/store/search?q=hana&c=apps&num=300'



response = urllib.request.urlopen(req)
the_page = response.read()
soup = BeautifulSoup(the_page)


#app_link  = soup.find('a', {'class' : 'title'})
#app_url = app_link.get('href')





for div in soup.findAll( 'div', {'class' : 'details'} ):
    title = div.find( 'a', {'class':'title'} )
    #print(title.get('href')) 
    app_url = title.get('href')

    app_details={}


    g_app_url = 'https://play.google.com' + app_url

    app_response = urllib.request.urlopen(g_app_url)
    app_page = app_response.read()
    soup = BeautifulSoup(app_page)
    #print(soup)


    #print( g_app_url )
    title_div = soup.find( 'div', {'class':'document-title'} )
    app_details['title'] = title_div.find( 'div' ).get_text().strip()

    subtitle = soup.find( 'a', {'class' : 'document-subtitle primary'} )
    app_details['developer'] = subtitle.get_text().strip()
    app_details['developer_link'] = subtitle.get( 'href' ).strip()

    price_buy_span = soup.find( 'span', {'class' : 'price buy'} )
    price = price_buy_span.find_all( 'span' )[-1].get_text().strip()
    price = price[:-4].strip() if price != 'Install' else 'Free' 
    app_details['price'] = price

    rating_value_meta = soup.find( 'meta', {'itemprop' : 'ratingValue'} )
    app_details['rating'] = rating_value_meta.get( 'content' ).strip()

    reviewers_count_meta = soup.find( 'meta', {'itemprop' : 'ratingCount'} )
    app_details['reviewers'] = reviewers_count_meta.get( 'content' ).strip()

    num_downloads_div = soup.find( 'div', {'itemprop' : 'numDownloads'} )
    if num_downloads_div: app_details['downloads'] = num_downloads_div.get_text().strip()

    date_published_div = soup.find( 'div', {'itemprop' : 'datePublished'} )
    app_details['date_published'] = date_published_div.get_text().strip()

    operating_systems_div = soup.find( 'div', {'itemprop' : 'operatingSystems'} )
    app_details['operating_system'] = operating_systems_div.get_text().strip()

    content_rating_div = soup.find( 'div', {'itemprop' : 'contentRating'} )
    app_details['content_rating'] = content_rating_div.get_text().strip()

    category_span = soup.find( 'span', {'itemprop' : 'genre'} )
    app_details['category'] = category_span.get_text()
    #print(app_details)


    with open('result.csv', 'w') as f:  # Just use 'w' mode in 3.x
        w = csv.DictWriter(f, app_details.keys())
        w.writeheader()
        w.writerow(app_details)

Solution

Python 3 opens text files in the locale's default encoding; if that encoding cannot handle the Unicode values you are trying to write, pick a different codec:

with open('result.csv', 'w', encoding='UTF-8', newline='') as f:

That will encode any Unicode strings to UTF-8 instead, an encoding that can handle the entire Unicode standard.

Note that the csv module recommends you open files using newline='' to prevent newline translation.
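
For reference, the failure and the fix can be reproduced in isolation. This is just a minimal sketch, assuming a locale whose default codec is cp949 (as in the traceback); locale.getpreferredencoding() is what open() falls back to when no encoding= argument is given:

import locale

# open() without encoding= uses the locale's preferred encoding;
# on a Korean Windows setup this typically reports 'cp949'.
print(locale.getpreferredencoding(False))

text = '\u20a9'                    # WON SIGN, the character from the traceback

print(text.encode('utf-8'))        # b'\xe2\x82\xa9' -- UTF-8 can represent it

try:
    text.encode('cp949')           # fails just like writing to a cp949 text file
except UnicodeEncodeError as err:
    print(err)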

You also need to open the file just once, outside of the for loop:

with open('result.csv', 'w', encoding='utf-8', newline='') as f:
    fields = ('title', 'developer', 'developer_link', 'price', 'rating', 'reviewers',
              'downloads', 'date_published', 'operating_system', 'content_rating',
              'category')
    w = csv.DictWriter(f, fields)
    w.writeheader()

    for div in soup.findAll( 'div', {'class' : 'details'} ):
        #
        # build app_details
        #

        w.writerow(app_details)
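
As a side note, the question's code only sets 'downloads' when num_downloads_div is found, so some rows may lack that key. csv.DictWriter fills missing keys with its restval value (an empty string by default), so those rows still write cleanly; a minimal sketch with made-up data:

import csv
import io

fields = ('title', 'downloads')
buf = io.StringIO()
w = csv.DictWriter(buf, fields)       # restval defaults to ''
w.writeheader()
w.writerow({'title': 'hana bank'})    # no 'downloads' key, written as empty
print(buf.getvalue())
# title,downloads
# hana bank,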
