URL-decoding a URL returned by requests
Question
I am trying to get the original URL from requests. Here is what I have so far:
res = requests.get(...)
url = urllib.unquote(res.url).decode('utf8')
I then get an error that says:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)
The original URL I requested is:
https://www.microsoft.com/de-at/store/movies/american-pie-pr\xc3\xa4sentiert-nackte-tatsachen/8d6kgwzl63ql
And here is what happens when I try printing:
>>> print '111', res.url
111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '222', urllib.unquote( res.url )
222 https://www.microsoft.com/de-at/store/movies/american-pie-präsentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '333', urllib.unquote(res.url).decode('utf8')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)
Why is this occurring, and how do I fix it?
Recommended answer
UnicodeEncodeError: 'ascii' codec can't encode characters
You are trying to decode a string that is already Unicode. On Python 3 this raises AttributeError (Unicode strings have no .decode() method there). Python 2 first tries to encode the string to bytes using sys.getdefaultencoding() ('ascii') before passing it to .decode('utf8'), which leads to the UnicodeEncodeError.
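The implicit encode step described above can be reproduced in isolation. This is a minimal sketch that assumes Python 3; the sample string is an illustrative value containing two non-ASCII characters, not data taken from the original session.

```python
# Illustrative Unicode text with two non-ASCII characters (assumed sample value).
text = u'pr\xc3\xa4sentiert'

# On Python 3, text strings have no .decode() method, hence the AttributeError.
has_decode = hasattr(text, 'decode')

# Python 2 would first do the equivalent of text.encode('ascii') before decoding.
# That encode step is what fails:
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(has_decode)
print(error)
```

The exception reports the positions of the characters that cannot be represented in ASCII, which matches the shape of the message in the question.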
In short, do not call .decode() on Unicode strings; use this instead:
print urllib.unquote(res.url.encode('ascii')).decode('utf-8')
Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may lead to mojibake if the character encoding used by your environment is not UTF-8. To avoid mojibake, always print Unicode (don't print text as bytes) and do not hardcode the character encoding of your environment inside your script; that is, .decode() is necessary here.
There is a bug in urllib.unquote() if you pass it a Unicode string:
>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä
Pass bytestrings to unquote() on Python 2.
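The advice above can be wrapped in a small helper that behaves the same on both Python versions. This is only a sketch: unquote_to_text is a hypothetical name, and it assumes the URL is percent-encoded ASCII whose escapes represent UTF-8 bytes, as in the question.

```python
import sys

try:
    from urllib import unquote  # Python 2
except ImportError:
    from urllib.parse import unquote  # Python 3

PY2 = sys.version_info[0] == 2


def unquote_to_text(url):
    """Percent-decode url and return Unicode text (hypothetical helper)."""
    if PY2:
        # Python 2: give unquote() a bytestring, then decode the UTF-8 bytes.
        # The name `unicode` exists only on Python 2; this branch never runs on 3.
        if isinstance(url, unicode):
            url = url.encode('ascii')
        return unquote(url).decode('utf-8')
    # Python 3: unquote() accepts text and decodes %XX escapes as UTF-8 by default.
    return unquote(url)


decoded = unquote_to_text(
    u'https://www.microsoft.com/de-at/store/movies/'
    u'american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql'
)
print(decoded)
```

Returning Unicode text and leaving the output encoding to print keeps the script independent of the terminal's character encoding.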