UnicodeEncodeError: 'ascii' codec can't encode character

Problem description

I'm trying to pass big strings of random HTML through regular expressions, and my Python 2.6 script is choking on this:

UnicodeEncodeError: 'ascii' codec can't encode character

I traced it back to a trademark superscript on the end of this word: Protection™ -- and I expect to encounter others like it in the future.

Is there a module to process non-ASCII characters? Or, what is the best way to handle/escape non-ASCII content in Python?

Thanks! Full error:

E
======================================================================
ERROR: test_untitled (__main__.Untitled)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python26\Test2.py", line 26, in test_untitled
    ofile.write(Whois + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)

Full script:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
        self.selenium.start()
        self.selenium.set_timeout("90000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            time.sleep(10)
            Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
            Test = Test.replace(",","")
            Test = Test.replace("\n", "")
            ofile = open('TestOut.csv', 'ab')
            ofile.write(Test + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

Recommended answer

You're trying to pass a bytestring to something, but it's impossible (from the scarcity of info you provide) to tell what you're trying to pass it to. You start with a Unicode string that cannot be encoded as ASCII (the default codec), so you'll have to encode it with some different codec (or transliterate it, as @R.Pate suggests) -- but it's impossible for us to say which codec you should use, because we don't know what you're passing the bytestring to, and therefore don't know which codecs that unknown subsystem will be able to accept and process correctly.
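To make the failure concrete, here is a minimal sketch that reproduces it with the word from the question. The variable names are illustrative, not taken from the original script. It also shows two standard error handlers that force ASCII output at the cost of losing the character; these are lossy fallbacks, not true transliteration.

```python
# The trademark sign is U+2122, which lies outside the ASCII range (0-127).
text = u"Protection\u2122"

# Explicitly encoding as ASCII raises the same error as in the traceback.
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Lossy fallbacks using the standard codec error handlers:
replaced = text.encode("ascii", "replace")           # unencodable chars become "?"
escaped = text.encode("ascii", "xmlcharrefreplace")  # unencodable chars become &#NNNN;
```

Whether either fallback is acceptable depends on what consumes the output; the character reference form only makes sense if the recipient interprets HTML/XML.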

In such total darkness as you leave us in, utf-8 is a reasonable blind guess (since it's a codec that can represent any Unicode string exactly as a bytestring, and it's the standard codec for many purposes, such as XML) -- but it can't be any more than a blind guess, unless and until you tell us more about what you're trying to pass that bytestring to, and for what purpose.

Passing thestring.encode('utf-8') rather than bare thestring will definitely avoid the particular error you're seeing right now, but it may result in peculiar displays (or whatever it is you're trying to do with that bytestring!) unless the recipient is ready, willing and able to accept utf-8 encoding (and how could WE know, having absolutely zero idea about what the recipient could possibly be?!-)
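As a sketch of that suggestion, assuming the recipient is a file opened in binary mode (as in the question's script) and that whatever reads it later expects UTF-8: the temporary path below is a hypothetical stand-in for TestOut.csv.

```python
import os
import tempfile

text = u"Protection\u2122"

# Hypothetical output path in a temp directory, standing in for TestOut.csv.
out_path = os.path.join(tempfile.mkdtemp(), "TestOut.csv")

# Encode to UTF-8 bytes explicitly, so no implicit ASCII encoding happens on write.
line = text.encode("utf-8") + b"\n"
with open(out_path, "ab") as ofile:
    ofile.write(line)

# A reader that knows the file is UTF-8 recovers the original text exactly.
with open(out_path, "rb") as rfile:
    round_tripped = rfile.read().decode("utf-8")
```

A reader that assumes some other encoding will show garbled characters in place of the trademark sign, which is the "peculiar display" risk described above.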
