How to URL-safe encode a string with Python? And urllib.quote is wrong
Question
Hello, I was wondering if you know any other way to encode a string to be URL-safe, because urllib.quote is doing it wrong; the output is different than expected:
If I try
urllib.quote('á')
I get
'%C3%A1'
But that's not the correct output. According to this site, it should be %E1
And this is not me being difficult: the incorrect output of quote is preventing the browser from finding resources. If I try
urllib.quote('\images\á\some file.jpg')
And then I try with the JavaScript tool I mentioned, I get these strings respectively:
%5Cimages%5C%C3%A1%5Csome%20file.jpg
%5Cimages%5C%E1%5Csome%20file.jpg
Note how they are almost the same, but the URL provided by quote doesn't work and the other one does. I tried messing with encode('utf-8') on the string provided to quote, but it does not make a difference. I tried other Spanish words with accents and the ñ; they are all represented differently.
Is this a Python bug? Do you know of a module that gets this right?
Recommended answer
According to RFC 3986, %C3%A1 is correct. Characters are supposed to be converted to an octet stream using UTF-8 before the octet stream is percent-encoded. %E1 is what you get if á is first encoded as ISO-8859-1 (Latin-1) instead. The site you link is out of date.
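As a rough illustration of the difference, here is a minimal sketch using Python 3, where quote lives in urllib.parse rather than in urllib as in the question. It encodes the path from the question twice: once with the default UTF-8 step, and once with Latin-1. The Latin-1 variant assumes you are dealing with a legacy server that expects that charset; that is an assumption for the example, not something the question confirms.

```python
from urllib.parse import quote, unquote

# Same path as in the question; backslashes are escaped for clarity.
path = "\\images\\á\\some file.jpg"

# Default behaviour: encode to UTF-8 first, then percent-encode (RFC 3986).
utf8_form = quote(path)

# Assumption: a legacy server expects Latin-1 bytes instead of UTF-8.
latin1_form = quote(path, encoding="latin-1")

print(utf8_form)    # %5Cimages%5C%C3%A1%5Csome%20file.jpg
print(latin1_form)  # %5Cimages%5C%E1%5Csome%20file.jpg

# Decoding has to use the same charset that was used for encoding.
assert unquote(utf8_form) == path
assert unquote(latin1_form, encoding="latin-1") == path
```

In Python 2, as used in the question, something like urllib.quote(u'á'.encode('latin-1')) should give %E1 in the same way, but the UTF-8 form remains the one RFC 3986 describes.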
See "Why does the encoding's of a URL and the query string part differ?" for more detail on the history of handling non-ASCII characters in URLs.