UnicodeEncodeError when fetching URLs


Question


I am using urlfetch to fetch a URL. When I try to send it to html2text function (strips off all HTML tags), I get the following message:

UnicodeEncodeError: 'charmap' codec can't encode characters in position  ... character maps to <undefined>

I've been trying to process encode('UTF-8','ignore') on the string but I keep getting this error.

Any ideas?

Thanks,

Joel


Some Code:

result = urlfetch.fetch(url="http://www.google.com")
html2text(result.content.encode('utf-8', 'ignore'))

And the error message:

File "C:\Python26\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 159-165: character maps to <undefined>

Solution

You need to decode the data you fetched first! With which codec? Depends on the website you fetch.

When you have unicode and try to encode it with some_unicode.encode('utf-8', 'ignore'), I can't imagine how it could throw an error.
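To illustrate the point, here is a minimal sketch (Python 3 syntax, with a made-up sample string and hypothetical helper names) showing that encoding text to UTF-8 succeeds, while encoding the same text to cp1252 raises the 'charmap' error from the traceback. Since the traceback points at cp1252.py, the failure is probably happening where text gets encoded to the Windows code page, not in the UTF-8 call; that is an inference from the traceback, not something confirmed in the question.

```python
# Sample text containing characters that cp1252 cannot represent (illustrative only).
SAMPLE = "price: \u20ac5 \u4e2d\u6587"


def encode_utf8(text):
    """UTF-8 can represent every character in SAMPLE, so this does not raise."""
    return text.encode("utf-8")


def try_cp1252(text):
    """Return the error message raised by a strict cp1252 encode, or None."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def encode_cp1252_lossy(text):
    """With errors='ignore', unencodable characters are silently dropped."""
    return text.encode("cp1252", "ignore")


if __name__ == "__main__":
    print(encode_utf8(SAMPLE))
    print(try_cp1252(SAMPLE))
    print(encode_cp1252_lossy(SAMPLE))
```

If this reflects what is going on, the thing to look for is the place where unicode text is written or printed using the default Windows encoding.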

OK, here is what you need to do:

# Assumes the urlfetch module from the question is already imported.
result = urlfetch.fetch('http://google.com')
content_type = result.headers['Content-Type']  # e.g. 'text/html; charset=ISO-8859-1'
ctype, charset = content_type.split(';')  # fails if the header has no charset parameter
encoding = charset.strip()[len('charset='):]  # get the encoding name
print(encoding)  # e.g. ISO-8859-1
utext = result.content.decode(encoding)  # now you have unicode
text = utext.encode('utf8', 'ignore')  # encode to UTF-8

This is not really robust but it should show you the way.
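For what it's worth, a somewhat more defensive version of the same idea might look like the sketch below. It is written in Python 3 syntax, and `decode_body` and its `default` parameter are hypothetical names, not part of urlfetch or html2text. It uses the standard library's `email.message.Message` to parse the charset parameter out of a Content-Type header, and falls back to a default encoding when the header has no charset or names one Python does not know.

```python
from email.message import Message


def decode_body(raw_bytes, content_type, default="utf-8"):
    """Decode fetched bytes using the charset named in a Content-Type header.

    Hypothetical helper: falls back to `default` (replacing undecodable
    bytes) when the charset is missing, unknown, or does not fit the data.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset(default)
    try:
        return raw_bytes.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid in the declared charset.
        return raw_bytes.decode(default, errors="replace")


if __name__ == "__main__":
    body = b"caf\xe9"  # 'café' encoded as ISO-8859-1
    print(decode_body(body, "text/html; charset=ISO-8859-1"))
```

The returned value is text, so it can be handed to a text-processing function or encoded to UTF-8 afterwards. This only covers the HTTP header; pages that declare their encoding solely in a meta tag would still need separate handling.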
