UnicodeEncodeError when fetching URLs

Published: 2018/5/3 19:40:50 | Tags: python, google-app-engine


Question

I am using urlfetch to fetch a URL. When I pass the result to an html2text function (which strips all HTML tags), I get the following message:

  UnicodeEncodeError: 'charmap' codec can't encode characters in position ... character maps to <undefined>

I've been trying to call encode('UTF-8', 'ignore') on the string, but I keep getting this error.

Any ideas?

Thanks,

Joel

Some code:

    result = urlfetch.fetch(url="http://www.google.com")
    html2text(result.content.encode('utf-8', 'ignore'))

And the error message:

    File "C:\Python26\lib\encodings\cp1252.py", line 12, in encode
      return codecs.charmap_encode(input,errors,encoding_table)
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 159-165: character maps to <undefined>

Solution
You need to decode the data you fetched first! With which codec? Depends on the website you fetch.

When you have unicode and try to encode it with some_unicode.encode('utf-8', 'ignore'), I can't imagine how it could throw an error.
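To make the decode-then-encode order concrete, here is a minimal sketch. It is written in Python 3 syntax (the thread itself is about Python 2.6), and the sample bytes and the helper name try_cp1252 are made up for illustration. It shows that encoding text to UTF-8 succeeds, while a narrow codec such as cp1252 is the kind that produces "character maps to <undefined>".

```python
# Hypothetical sample: bytes as a server declaring ISO-8859-1 might send them.
raw = b"caf\xe9"

# Step 1: decode the raw bytes with the codec the server declared.
utext = raw.decode("iso-8859-1")

# Step 2: encode the text to UTF-8. Every Unicode character has a UTF-8 form.
utf8_bytes = utext.encode("utf-8", "ignore")


def try_cp1252(text):
    """Encode text to cp1252, returning the error message if that fails.

    Illustrative helper (not from the thread): cp1252 covers only a small
    set of characters, so anything outside it raises UnicodeEncodeError.
    """
    try:
        return text.encode("cp1252")
    except UnicodeEncodeError as exc:
        return str(exc)
```

Calling try_cp1252 on text that contains, say, a CJK character returns the same kind of 'charmap' message as in the question, which suggests the error comes from an encode step that targets cp1252 rather than UTF-8.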

OK, what you need to do:

    from google.appengine.api import urlfetch

    result = urlfetch.fetch('http://google.com')
    content_type = result.headers['Content-Type']  # figure out what you just fetched
    ctype, charset = content_type.split(';')
    encoding = charset[len(' charset='):]  # get the encoding
    print(encoding)  # e.g. ISO-8859-1
    utext = result.content.decode(encoding)  # now you have unicode
    text = utext.encode('utf8', 'ignore')  # encode to UTF-8
This is not really robust (it assumes the Content-Type header contains exactly one ';' followed by ' charset='), but it should show you the way.
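A somewhat more defensive version of the header parsing could look like the sketch below. The function names charset_from_content_type and decode_body, the UTF-8 default, and the choice of the 'replace' error handler are all assumptions made for illustration; none of them come from the original answer or from the urlfetch API.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or default."""
    if not content_type:
        return default
    # Parameters follow the media type and are separated by ';'.
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            # The value may be quoted, e.g. charset="UTF-8".
            return value.strip().strip('"\'')
    return default


def decode_body(body, content_type, default="utf-8"):
    """Decode response bytes using the charset named in the header."""
    encoding = charset_from_content_type(content_type, default)
    try:
        # 'replace' substitutes undecodable bytes instead of raising.
        return body.decode(encoding, "replace")
    except LookupError:
        # The header named a codec Python does not know: use the default.
        return body.decode(default, "replace")
```

With this, something like decode_body(result.content, result.headers.get('Content-Type')) would handle a missing header, extra parameters, and quoted charset values, assuming the headers object supports get. Pages that declare their encoding only in a meta tag would still need separate handling.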
