Python script receiving a UnicodeEncodeError: 'ascii' codec can't encode character

Problem description

I have a simple Python script that pulls posts from reddit and posts them on Twitter. Unfortunately, tonight it began having issues that I'm assuming are because of someone's title on reddit having a formatting issue. The error that I'm receiving is:

  File "redditbot.py", line 82, in <module>
    main()
  File "redditbot.py", line 64, in main
    tweeter(post_dict, post_ids)
  File "redditbot.py", line 74, in tweeter
    print post+" "+post_dict[post]+" #python"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 34: ordinal not in range(128)

Here is my script:

# encoding=utf8
import praw
import json
import requests
import tweepy
import time
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf8')

access_token = 'hidden'
access_token_secret = 'hidden'
consumer_key = 'hidden'
consumer_secret = 'hidden'


def strip_title(title):
    if len(title) < 75:
        return title
    else:
        return title[:74] + "..."

def tweet_creator(subreddit_info):
    post_dict = {}
    post_ids = []
    print "[bot] Getting posts from Reddit"
    for submission in subreddit_info.get_hot(limit=2000):
        post_dict[strip_title(submission.title)] = submission.url
        post_ids.append(submission.id)
    print "[bot] Generating short link using goo.gl"
    mini_post_dict = {}
    for post in post_dict:
        post_title = post
        post_link = post_dict[post]

        mini_post_dict[post_title] = post_link
    return mini_post_dict, post_ids

def setup_connection_reddit(subreddit):
    print "[bot] setting up connection with Reddit"
    r = praw.Reddit('PythonReddit PyReTw'
                    'monitoring %s' %(subreddit))
    subreddit = r.get_subreddit('python')
    return subreddit


def duplicate_check(id):
    found = 0
    with open('posted_posts.txt', 'r') as file:
        for line in file:
            if id in line:
                found = 1
    return found

def add_id_to_file(id):
    with open('posted_posts.txt', 'a') as file:
        file.write(str(id) + "\n")

def main():
    subreddit = setup_connection_reddit('python')
    post_dict, post_ids = tweet_creator(subreddit)
    tweeter(post_dict, post_ids)

def tweeter(post_dict, post_ids):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    for post, post_id in zip(post_dict, post_ids):
        found = duplicate_check(post_id)
        if found == 0:
            print "[bot] Posting this link on twitter"
            print post+" "+post_dict[post]+" #python"
            api.update_status(post+" "+post_dict[post]+" #python")
            add_id_to_file(post_id)
            time.sleep(3000)
        else:
            print "[bot] Already posted"

if __name__ == '__main__':
    main()

Any help would be very much appreciated - thanks in advance!

Recommended answer

Consider this simple program:

print(u'\u201c' + "python")

If you try printing to a terminal (with an appropriate character encoding), you get

“python

However, if you try redirecting output to a file, you get a UnicodeEncodeError.

script.py > /tmp/out
Traceback (most recent call last):
  File "/home/unutbu/pybin/script.py", line 4, in <module>
    print(u'\u201c' + "python")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

When you print to a terminal, Python uses the terminal's character encoding to encode unicode. (Terminals can only print bytes, so unicode must be encoded in order to be printed.)
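One way to see which encoding Python would pick for a given stream is to inspect the stream directly. The sketch below is only illustrative: `describe_stream` is a hypothetical helper, not part of the original script. It reports whether a stream is attached to a terminal and which `encoding` attribute it exposes. Under Python 2, `sys.stdout.encoding` is typically `None` when output is redirected, whereas Python 3 generally fills it in from the locale.

```python
import io
import sys


def describe_stream(stream):
    """Return (is_terminal, encoding) for a file-like object.

    Hypothetical helper for illustration only.
    """
    is_terminal = hasattr(stream, 'isatty') and stream.isatty()
    encoding = getattr(stream, 'encoding', None)
    return is_terminal, encoding


# The result for sys.stdout depends on how the script is launched
# (interactive terminal versus redirection to a file or pipe).
print(describe_stream(sys.stdout))

# An in-memory text buffer is not a terminal and declares no encoding.
in_memory = describe_stream(io.StringIO())
print(in_memory)
```

Running the script once normally and once with `> /tmp/out` should make the difference between the two cases visible.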

When you redirect output to a file, Python cannot determine the character encoding, since files have no declared encoding. So by default Python2 implicitly encodes all unicode using the ascii encoding before writing to the file. Since u'\u201c' cannot be ascii encoded, a UnicodeEncodeError is raised. (Only the first 128 unicode code points, 0 through 127, can be encoded with ascii.)
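The implicit step can be reproduced by hand, without any redirection. The following is a minimal sketch (the variable names are made up for illustration) that encodes the same string explicitly, first with ascii and then with utf-8:

```python
text = u'\u201c' + u'python'

# ASCII only covers code points 0 through 127, so U+201C has no
# ASCII representation and the encode call raises.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode('utf-8')

print(ascii_failed)
print(repr(utf8_bytes))
```

The ascii attempt fails in the same way as the implicit conversion described above, while the utf-8 attempt yields bytes that can be written to any file.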

This issue is explained in detail in the Why Print Fails wiki.

To fix the problem, first, avoid adding unicode and byte strings. This causes implicit conversion using the ascii codec in Python2, and an exception in Python3. To future-proof your code, it is better to be explicit. For example, encode post explicitly before formatting and printing the bytes:

# Keep `post` unchanged so it can still be used as the dictionary key.
title = post.encode('utf-8')
print('{} {} #python'.format(title, post_dict[post]))
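A related pattern, sketched here under stated assumptions, is to keep everything as unicode while building the message and to encode exactly once at the point where bytes leave the program. The helper names `format_status` and `write_status` are hypothetical, the URL is a placeholder, and an in-memory `io.BytesIO` stands in for the real output stream:

```python
import io


def format_status(title, link, hashtag=u'#python'):
    """Build the status line as unicode text (no encoding yet)."""
    return u'{} {} {}'.format(title, link, hashtag)


def write_status(stream, status, encoding='utf-8'):
    """Encode once, at the boundary, and write bytes to a binary stream."""
    stream.write(status.encode(encoding) + b'\n')


# Placeholder data; a real run would use titles and URLs from reddit.
buf = io.BytesIO()
status = format_status(u'\u201cHello\u201d', u'http://example.com/post')
write_status(buf, status)

print(repr(buf.getvalue()))
```

With this split, the text-handling code never mixes unicode and byte strings, and the choice of encoding lives in a single place. Under Python 3 the binary stream for standard output would be `sys.stdout.buffer`; under Python 2, `sys.stdout` itself accepts byte strings.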
