URLDecoding requests

Views: 223 | Published: 2020/7/24 22:03:35 | Tags: python, unicode, python-requests, urlencode

Question

I am trying to get the original url from requests. Here is what I have so far:

import urllib
import requests
res = requests.get(...)
url = urllib.unquote(res.url).decode('utf8')

I then get an error that says:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

The original URL I requested is:

https://www.microsoft.com/de-at/store/movies/american-pie-pr\xc3\xa4sentiert-nackte-tatsachen/8d6kgwzl63ql

And here is what happens when I try printing:

>>> print '111', res.url
111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '222', urllib.unquote( res.url )
222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '333', urllib.unquote(res.url).decode('utf8') 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

Why is this occurring, and how would I fix this?

Recommended answer

UnicodeEncodeError: 'ascii' codec can't encode characters

You are trying to decode a string that is already Unicode (res.url is a Unicode string, so urllib.unquote(res.url) returns one as well). On Python 3 this raises AttributeError, because str, the Unicode string type there, has no .decode() method. Python 2 instead first encodes the string to bytes using sys.getdefaultencoding() ('ascii') before passing it to .decode('utf8'), and that implicit encode is what raises UnicodeEncodeError.
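On Python 3 the same mistake fails more directly, which makes the explanation easy to check. A minimal sketch, assuming Python 3's urllib.parse (the question itself uses Python 2's urllib); the URL is the percent-encoded one that res.url holds in the question:

```python
from urllib.parse import unquote

# The percent-encoded URL, as requests reports it in res.url.
url = ("https://www.microsoft.com/de-at/store/movies/"
       "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql")

# Python 3's unquote() already returns text (str): the escaped bytes are
# decoded as UTF-8 by default, so no .decode() call is needed.
text = unquote(url)  # '...american-pie-präsentiert-nackte-tatsachen/...'

# str has no .decode() method on Python 3, so the Python 2 idiom raises
# AttributeError instead of going through an implicit ASCII encode.
try:
    text.decode("utf8")
except AttributeError as exc:
    print("AttributeError:", exc)
```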

In short, do not call .decode() on Unicode strings; use this instead:

print urllib.unquote(res.url.encode('ascii')).decode('utf-8')
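The fix above (percent-decode to bytes, then decode exactly once as UTF-8) has a direct Python 3 counterpart. A sketch assuming Python 3, where urllib.parse.unquote_to_bytes() plays the role of Python 2's unquote() applied to a bytestring:

```python
from urllib.parse import unquote, unquote_to_bytes

url = ("https://www.microsoft.com/de-at/store/movies/"
       "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql")

# Step 1: replace each %XX escape with the raw byte it stands for.
raw = unquote_to_bytes(url)  # b'...american-pie-pr\xc3\xa4sentiert...'

# Step 2: decode those bytes exactly once, using the encoding the URL was
# percent-encoded with (UTF-8 here).
text = raw.decode("utf-8")

# unquote() with its default encoding="utf-8" performs both steps at once.
assert text == unquote(url)
```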

Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may lead to mojibake if your environment's character encoding is not UTF-8. To avoid mojibake, always print Unicode (don't print text as bytes) and don't hardcode your environment's character encoding in the script; that is, .decode() is necessary here.
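To see why printing raw bytes is fragile: the same two UTF-8 bytes turn into different text depending on which encoding is used to interpret them. A minimal Python 3 sketch; cp1252 is only an assumed example of a non-UTF-8 environment encoding:

```python
# The UTF-8 bytes for "ä", i.e. what percent-decoding "%C3%A4" yields.
raw = b"\xc3\xa4"

# Interpreted as UTF-8, the two bytes form one character, as intended.
as_utf8 = raw.decode("utf-8")     # 'ä'

# Interpreted with a legacy code page such as cp1252, each byte becomes a
# separate character: mojibake.
as_cp1252 = raw.decode("cp1252")  # 'Ã¤'

# Decoding once in the script and printing text (str) leaves it to the
# runtime to encode the output for whatever terminal is actually in use.
```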

On Python 2, there is a bug in urllib.unquote() if you pass it a Unicode string:

>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä
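The garbled Ã¤ output can be reproduced on Python 3 by telling unquote() to decode the escaped bytes as Latin-1, which is effectively what Python 2.7's unquote() does with a Unicode argument (each %XX becomes the code point with the same numeric value). A sketch assuming Python 3; the Latin-1 round trip at the end is a common after-the-fact repair, shown here only as an illustration, not as a replacement for passing bytestrings:

```python
from urllib.parse import unquote

escaped = "%C3%A4"

# Treating each escaped byte as its own code point (Latin-1) reproduces the
# Python 2 Unicode-input result: two characters instead of one.
wrong = unquote(escaped, encoding="latin-1")
print(ascii(wrong))  # '\xc3\xa4'

# Decoding the same bytes as UTF-8 gives the intended single character.
right = unquote(escaped)  # encoding defaults to "utf-8"
print(ascii(right))  # '\xe4'

# Latin-1 maps code points 0-255 back to the same byte values, so text
# garbled this way can be repaired after the fact.
repaired = wrong.encode("latin-1").decode("utf-8")
assert repaired == right
```

The ascii() calls keep the output readable regardless of terminal encoding; '\xc3\xa4' is the same escape sequence that appears in the URL quoted in the question.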

Pass bytestrings to unquote() on Python 2.
