UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' when using urllib.request in Python 3

Views: 186 Published: 2020/6/10 22:29:14 Tags: python exception-handling web-scraping beautifulsoup utf8-decode

Problem description

I'm writing a script that visits a list of links and parses the information on each page.

It works for most sites, but it chokes on some with "UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)".

It stops in client.py, which is part of urllib on Python 3.

The exact link is: http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html

There are quite a few similar posts here, but none of the answers seems to work for me.

My code is:

import socket
from urllib import request
from urllib.error import HTTPError, URLError

def __request(link, debug=0):
    try:
        # Long timeout because I was getting lots of timeouts.
        html = request.urlopen(link, timeout=35).read()
        unicode_html = html.decode('utf-8', 'ignore')
    # NOTE: except HTTPError must come first, otherwise except URLError
    # will also catch an HTTPError (HTTPError is a subclass of URLError).
    except HTTPError as e:
        if debug:
            print('The server couldn\'t fulfill the request for ' + link)
            print('Error code: ', e.code)
        return ''
    except URLError as e:
        # Any other URLError falls through and the function returns None.
        if isinstance(e.reason, socket.timeout):
            print('timeout')
            return ''
    else:
        return unicode_html

This calls the request function:

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'
page = __request(link)

The traceback is:

Traceback (most recent call last):
  File "<string>", line 250, in run_nodebug
  File "C:\reader\get_news.py", line 276, in <module>
    main()
  File "C:\reader\get_news.py", line 255, in main
    body = get_article_body(item['link'],debug=0)
  File "C:\reader\get_news.py", line 155, in get_article_body
    page = __request('na',url)
  File "C:\reader\get_news.py", line 50, in __request
    html = request.urlopen(link, timeout=35).read()
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\Lib\urllib\request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python33\Lib\http\client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "C:\Python33\Lib\http\client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python33\Lib\http\client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)

Any help is appreciated. It's driving me crazy; I think I've tried all combinations of x.decode and similar.

(I could ignore the offending characters if that is possible.)

Recommended answer

Use a percent-encoded URL:

link = 'http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html'
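The reason this works can be seen from the last frame of the traceback, where http.client calls request.encode('ascii') on the HTTP request line. The following is a minimal sketch of that failure; the exact request_line string is an assumption reconstructed from the method, path, and HTTP version, not copied from the library.

```python
# Assumed request line: method, URL path, and HTTP version joined by spaces.
request_line = 'GET /news/cafés-growing-faster-than-fast-food-peers-144512056.html HTTP/1.1'

try:
    # Same call as the last frame of the traceback.
    request_line.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # error.start is the index of the first character ASCII cannot represent.
    print(error.start, repr(error.object[error.start]))
```

With the request line assumed here, the reported index points at the 'é' in "cafés", which matches "position 13" in the original error message. Percent-encoding replaces that character with the ASCII sequence %C3%A9, so the encode step no longer fails.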

I found the percent-encoded URL by pointing the browser at

http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html

going to the page, then copying and pasting the encoded URL supplied by the browser back into the text editor. However, you can generate a percent-encoded URL programmatically using:

from urllib import parse

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'

scheme, netloc, path, query, fragment = parse.urlsplit(link)
path = parse.quote(path)
link = parse.urlunsplit((scheme, netloc, path, query, fragment))

This yields

http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html
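If the script processes a whole list of links, the same steps can be wrapped in a helper. The following is only a sketch under a few assumptions: percent_encode_url is a hypothetical name, the query string is encoded as well as the path (the answer above only encodes the path), and '%' is treated as safe so that links which are already percent-encoded are not encoded a second time. Non-ASCII host names are not handled.

```python
from urllib import parse


def percent_encode_url(url):
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Hypothetical helper, not part of urllib.
    """
    scheme, netloc, path, query, fragment = parse.urlsplit(url)
    # '%' is kept safe so existing escapes such as %C3%A9 are left untouched.
    path = parse.quote(path, safe='/%')
    # Keep the usual query delimiters intact (assumption: a plain key=value query).
    query = parse.quote(query, safe='=&%+')
    return parse.urlunsplit((scheme, netloc, path, query, fragment))


link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'
encoded = percent_encode_url(link)
print(encoded)
```

The result contains only ASCII characters, so it can be passed to request.urlopen without triggering the UnicodeEncodeError. Calling the helper on an already-encoded link returns it unchanged.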
