URL encoding/decoding with Python

Question

I am trying to encode, store, and decode arguments in Python, and I'm getting lost somewhere along the way. Here are my steps:

1) I use the Google Toolbox for Mac method gtm_stringByEscapingForURLArgument to escape an NSString properly for passing as an HTTP argument.

2) On my server (Python), I store these string arguments as something like u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''. These are the standard keys on an iPhone keyboard in the "123" and "#+=" views. The \u and \x characters are mostly currency symbols: the euro (\u20ac), the pound (\xa3) and the yen (\xa5), plus a bullet (\u2022).
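
The server only ends up holding strings like the one above after it has undone the client's percent-escaping. Below is a minimal Python 3 sketch of that step. It assumes the client percent-encodes the UTF-8 bytes of each argument. The query string and the parameter name `arg` are made-up examples, not taken from the question. `urllib.parse.parse_qs` removes the escapes and decodes the bytes as UTF-8 by default.

```python
from urllib.parse import parse_qs

# Hypothetical query string: "€£¥•" percent-encoded as UTF-8 under a made-up name "arg".
query = "arg=%E2%82%AC%C2%A3%C2%A5%E2%80%A2"

# parse_qs unescapes each value and decodes it with UTF-8 (its default encoding),
# returning a dict that maps each name to a list of str values.
params = parse_qs(query)
value = params["arg"][0]
print(repr(value))
```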

3) I call urllib.quote(myString, '') on that stored value to percent-escape it for transport to the client, which can then unescape it.
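
For comparison, here is step 3 as a Python 3 sketch, where `quote` lives in `urllib.parse`. Unlike Python 2's `urllib.quote`, Python 3's `quote()` accepts a `str` and encodes it to UTF-8 bytes before escaping. So this version does not reproduce the question's error; it only shows the intended round trip.

```python
from urllib.parse import quote, unquote

# The stored value from step 2 (Python 3 str literals are Unicode by default).
stored = '1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''

# safe='' means even '/' gets escaped, as in the question's urllib.quote(myString, '').
# For str input, Python 3's quote() encodes to UTF-8 first (its default encoding).
escaped = quote(stored, safe='')
print(escaped)

# The receiving side reverses it; unquote() decodes the bytes as UTF-8 by default.
assert unquote(escaped) == stored
```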

The result is that I get an exception when I try to log the result of the percent-escaping. Is there some crucial step I'm overlooking that must be applied to the stored value (with its \u and \x characters) to convert it properly for sending over HTTP?

Update: The suggestion marked as the answer below worked for me. For completeness, I'm adding some details to address the comments below.

The exception I received cited \u20ac. I don't know whether the problem was with that character specifically or simply that it was the first Unicode (non-ASCII) character in the string.

The \u20ac character is the Unicode code point for the euro sign (€). Basically, I found I'd have issues with it unless I used the urllib2 quote method.
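
The update's guess can be checked by finding the first character of the stored string that falls outside ASCII. This is a Python 3 sketch, not code from the question. The first such character is the euro sign. An error that names \u20ac is therefore consistent with a failure on any non-ASCII character, rather than one specific to the euro.

```python
stored = '1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''

# First character whose code point is outside the 7-bit ASCII range.
first_non_ascii = next(ch for ch in stored if ord(ch) > 127)
print(hex(ord(first_non_ascii)))

# Every non-ASCII character fails when forced through the ASCII codec,
# which is the codec Python 2 falls back on for implicit unicode/byte conversions.
try:
    first_non_ascii.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc)
```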

Accepted answer

URL-encoding a "raw" unicode string doesn't really make sense. What you need to do is call .encode("utf8") first, so that you have a known byte encoding, and then .quote() the result.
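
The same two steps in Python 3 might look like the sketch below, using `urllib.parse.quote`, which accepts either `bytes` or `str`. For `str` input it applies UTF-8 itself, so encoding first is explicit rather than required.

```python
from urllib.parse import quote

s = '1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''

# Step 1: choose a known byte encoding explicitly.
data = s.encode("utf-8")
# Step 2: percent-escape those bytes (default safe='/', so '/' is kept).
encoded = quote(data)
print(encoded)

# For str input, quote() performs the UTF-8 step itself (encoding defaults to UTF-8),
# so both spellings give the same result.
assert encoded == quote(s)
```

One difference from the Python 2 output below: since Python 3.7, `quote()` treats `~` as unreserved and leaves it unescaped instead of producing `%7E`.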

The output isn't very pretty, but it should be a correct URI encoding.

```python
>>> import urllib2
>>> s = u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\|~<>\u20ac\xa3\xa5\u2022.,?!\''
>>> urllib2.quote(s.encode("utf8"))
'1234567890-/%3A%3B%28%29%24%26%40%22.%2C%3F%21%27%5B%5D%7B%7D%23%25%5E%2A%2B%3D_%5C%7C%7E%3C%3E%E2%82%AC%C2%A3%C2%A5%E2%80%A2.%2C%3F%21%27'
```

请记住,如果要调试或进行其他操作,则需要同时使用unquote()decode()才能正确打印出来.

Remember that you need to both unquote() and decode() the result to print it out properly, whether you're debugging or doing something else.
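
In Python 3 the two halves of that advice map onto `urllib.parse.unquote_to_bytes` and `bytes.decode`, as in this sketch. The short sample string is a made-up subset of the question's value. `unquote()` does both steps at once, decoding as UTF-8 by default.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

s = "<>\u20ac\xa3\xa5\u2022"
encoded = quote(s.encode("utf-8"))

# Two-step version: undo the %-escapes, then decode the raw bytes.
raw = unquote_to_bytes(encoded)   # b'<>\xe2\x82\xac\xc2\xa3\xc2\xa5\xe2\x80\xa2'
text = raw.decode("utf-8")
print(text)

# unquote() combines both steps, decoding as UTF-8 by default.
assert text == unquote(encoded) == s
```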

```python
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8")))
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
>>> # careful: without .decode("utf8") this is a UTF-8 byte string, not unicode; a console that treats it as single-byte text shows mojibake (e.g. â‚¬ instead of €)
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))).decode("utf8")
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
```
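
The comment above describes "nasty" output, which is what happens when UTF-8 bytes are read with a single-byte codec. This Python 3 sketch shows the effect. Windows-1252 (`cp1252`) is chosen here only as an example of a wrong codec; the answer does not say which one the console used.

```python
# The four non-ASCII characters from the question, as UTF-8 bytes.
utf8_bytes = "\u20ac\xa3\xa5\u2022".encode("utf-8")

# Decoding those bytes with the wrong single-byte codec (Windows-1252 here)
# produces mojibake: each multi-byte character turns into 2 or 3 junk characters.
print(utf8_bytes.decode("cp1252"))   # â‚¬Â£Â¥â€¢

# Decoding with the codec that produced them restores the original text.
print(utf8_bytes.decode("utf-8"))    # €£¥•
```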

This is, in fact, what the Django functions mentioned in another answer do:

The functions django.utils.http.urlquote() and django.utils.http.urlquote_plus() are versions of Python’s standard urllib.quote() and urllib.quote_plus() that work with non-ASCII characters. (The data is converted to UTF-8 prior to encoding.)
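
In Python 3 the standard-library functions already handle non-ASCII text this way. Django 3.0 deprecated `urlquote()` and `urlquote_plus()` in favor of `urllib.parse.quote()` and `quote_plus()`, and Django 4.0 removed them. The sketch below shows the practical difference between the two quoting styles. The value "€ 5" is a made-up example.

```python
from urllib.parse import quote, quote_plus, unquote_plus

value = "\u20ac 5"  # "€ 5", a made-up example value

# quote(): space becomes %20, and '/' is left alone by default (safe='/').
print(quote(value))        # %E2%82%AC%205
# quote_plus(): space becomes '+', the HTML form-encoding convention; safe defaults to ''.
print(quote_plus(value))   # %E2%82%AC+5

assert unquote_plus(quote_plus(value)) == value
```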

If you apply any further quoting or encoding, be careful not to mangle things.
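
Double-quoting is one common way to mangle things, because the `%` signs from the first pass get escaped on the second. A short Python 3 sketch:

```python
from urllib.parse import quote, unquote

once = quote("\u20ac")     # '%E2%82%AC'
twice = quote(once)        # the '%' signs themselves get escaped: '%25E2%2582%25AC'
print(once, twice)

# A single unquote() removes only one layer, so the receiver sees escapes, not '€'.
assert unquote(twice) == once
assert unquote(unquote(twice)) == "\u20ac"
```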