URLDecoding requests


Problem description

I am trying to get the original URL from requests. Here is what I have so far:

res = requests.get(...)
url = urllib.unquote(res.url).decode('utf8') 

I then get an error that says:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

The original URL I requested is:

https://www.microsoft.com/de-at/store/movies/american-pie-pr\xc3\xa4sentiert-nackte-tatsachen/8d6kgwzl63ql

And here is what happens when I try printing:

>>> print '111', res.url
111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '222', urllib.unquote( res.url )
222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '333', urllib.unquote(res.url).decode('utf8') 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

Why is this occurring, and how would I fix this?

Recommended answer

UnicodeEncodeError: 'ascii' codec can't encode characters

You are trying to decode a string that is already Unicode. On Python 3 this raises AttributeError, because Unicode strings have no .decode() method there. On Python 2, the string is first encoded into bytes using sys.getdefaultencoding() ('ascii') and only then passed to .decode('utf8'); that implicit encoding step is what raises UnicodeEncodeError.
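The failure can be reproduced on Python 3 by spelling out the implicit step by hand. The sketch below is only an illustration, not the interpreter's actual code path: the helper name `python2_style_decode` is hypothetical, and it assumes the default encoding is 'ascii' as described above.

```python
def python2_style_decode(text, encoding="utf8"):
    """Mimic what Python 2 does for unicode_string.decode(encoding).

    Hypothetical helper for illustration: the Unicode string is first
    encoded with the default codec (assumed to be 'ascii'), and the
    resulting bytes are then decoded with the requested encoding.
    """
    as_bytes = text.encode("ascii")  # implicit step; fails on non-ASCII
    return as_bytes.decode(encoding)


# Pure ASCII text survives the round trip unchanged.
ascii_result = python2_style_decode("american-pie")

# Two non-ASCII characters, like the ones in the unquoted URL.
error_name = None
error_span = None
try:
    python2_style_decode("pr\xc3\xa4sentiert")
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    error_span = (exc.start, exc.end)  # half-open range of bad characters

print(ascii_result, error_name, error_span)
```

The exception comes from the `.encode("ascii")` line, never from the `.decode()` call itself, which is why the traceback mentions encoding even though the code asked for decoding.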

In short, do not call .decode() on Unicode strings. Use this instead:

print urllib.unquote(res.url.encode('ascii')).decode('utf-8')


Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may produce mojibake if the character encoding used by your environment is not UTF-8. To avoid mojibake, always print Unicode (do not print text as bytes) and do not hardcode the character encoding of your environment inside your script. In other words, .decode() is necessary here.
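For readers on Python 3, where the snippets in this thread do not run as written, a rough equivalent is sketched below. It assumes the percent-escapes in the URL encode UTF-8 text (the default used by `urllib.parse.unquote`), and it hardcodes the URL from the question in a variable named `quoted_url` instead of making a network request.

```python
from urllib.parse import unquote

# URL as reported by res.url in the question (percent-encoded, ASCII only).
quoted_url = (
    "https://www.microsoft.com/de-at/store/movies/"
    "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql"
)

# unquote() takes a str and decodes the escaped bytes as UTF-8 by default,
# so the result is already text and needs no further .decode() call.
decoded_url = unquote(quoted_url)
print(decoded_url)
```

Because the result is a text string, `print` can encode it for whatever character encoding the environment uses, so nothing about the terminal is hardcoded in the script.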

urllib.unquote() has a bug when it is passed a Unicode string:

>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä

Pass bytestrings to unquote() on Python 2.
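The effect of the bug can be imitated on Python 3, which may make the difference between the two calls above easier to see. This is a sketch under the assumption that Python 2 turns each escaped byte of a Unicode input into one Latin-1 character; `encoding="latin-1"` is used here only to reproduce that result, not as a recommendation.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "%C3%A4"  # UTF-8 bytes of 'ä', percent-encoded

# Each escaped byte becomes one character: the mojibake seen on Python 2
# when a Unicode string is passed to urllib.unquote().
mojibake = unquote(escaped, encoding="latin-1")

# Recover the raw bytes first, then decode them as UTF-8 in a separate step.
raw = unquote_to_bytes(escaped)
text = raw.decode("utf-8")

# Already-garbled text can be repaired by reversing the wrong decode.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(repr(mojibake), repr(raw), repr(text), repr(repaired))
```

Keeping "unescape to bytes" and "decode bytes to text" as two explicit steps mirrors the advice for Python 2: unquote a bytestring, then decode it once with the encoding the URL actually uses.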
