URLs and ampersands
Problem description
I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
snag with URLs containing ampersands:
http://www.example.com/parrot.php?x=1&y=2
Somewhere in the process, urls like the above are escaped to:
http://www.example.com/parrot.php?x=1&amp;y=2
which naturally fails to exist.
I could just do a string replace, but is there a "right" way to escape
and unescape URLs? I've looked through the standard lib, but I can't find
anything helpful.
--
Steven
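The failure mode described above can be reproduced in a couple of lines. A minimal sketch, using modern Python 3 names (the thread itself predates them and uses the Python 2 urllib module):

```python
# Minimal sketch of the problem described above, using modern
# Python 3 names (the thread itself is from the Python 2 era).
import html

# Inside an HTML attribute, a literal "&" is written as "&amp;",
# so an href scraped from a page looks like this:
scraped = "http://www.example.com/parrot.php?x=1&amp;y=2"

# Fetching it verbatim sends a bogus "amp;y" parameter; undoing the
# HTML entity escaping first recovers the real URL:
url = html.unescape(scraped)
print(url)  # http://www.example.com/parrot.php?x=1&y=2
```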
Recommended answer
En Mon, 04 Aug 2008 20:43:45 -0300, Steven D'Aprano
<st***@REMOVE-THIS-cybersource.com.au> escribió:
> I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
> snag with URLs containing ampersands:
> http://www.example.com/parrot.php?x=1&y=2
> Somewhere in the process, urls like the above are escaped to:
> http://www.example.com/parrot.php?x=1&amp;y=2
> which naturally fails to exist.
> I could just do a string replace, but is there a "right" way to escape
> and unescape URLs? I've looked through the standard lib, but I can't find
> anything helpful.
This works fine for me:

py> import urllib
py> fn = urllib.urlretrieve("http://c7.amazingcounters.com/counter.php?i=1516903&c=4551022")[0]
py> open(fn, "rb").read()
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00...

So it's not urlretrieve escaping the url, but something else in your
code...
--
Gabriel Genellina
On Mon, 04 Aug 2008 23:16:46 -0300, Gabriel Genellina wrote:
> En Mon, 04 Aug 2008 20:43:45 -0300, Steven D'Aprano
> <st***@REMOVE-THIS-cybersource.com.au> escribió:
>> I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
>> snag with URLs containing ampersands:
>> http://www.example.com/parrot.php?x=1&y=2
>> Somewhere in the process, urls like the above are escaped to:
>> http://www.example.com/parrot.php?x=1&amp;y=2
>> which naturally fails to exist.
>> I could just do a string replace, but is there a "right" way to escape
>> and unescape URLs? I've looked through the standard lib, but I can't
>> find anything helpful.
> This works fine for me:
>
> py> import urllib
> py> fn = urllib.urlretrieve("http://c7.amazingcounters.com/counter.php?i=1516903&c=4551022")[0]
> py> open(fn, "rb").read()
> '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00...
>
> So it's not urlretrieve escaping the url, but something else in your
> code...
I didn't say urlretrieve was escaping the URL. I actually think the
URLs are pre-escaped when I scrape them from an HTML file. I have searched
for, but been unable to find, standard library functions that escape or
unescape URLs. Are there any such functions?
--
Steven
Dnia 05 Aug 2008 09:59:20 GMT, Steven D'Aprano napisał(a):
> I didn't say urlretrieve was escaping the URL. I actually think the
> URLs are pre-escaped when I scrape them from an HTML file. I have searched
> for, but been unable to find, standard library functions that escape or
> unescape URLs. Are there any such functions?
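The thread ends here without spelling out an answer, so for reference: the standard library does provide such functions. A sketch using Python 3 module names (in Python 2 the equivalents live in urllib, cgi, and HTMLParser); the key point is that `&amp;` is HTML entity escaping, which is a separate mechanism from percent-escaping:

```python
# Stdlib escaping/unescaping helpers, Python 3 module names
# (Python 2 era equivalents: urllib.quote/unquote, cgi.escape,
# HTMLParser.HTMLParser().unescape).
import html
from urllib.parse import quote, unquote

# HTML entity escaping -- what turns "&" into "&amp;" inside a page:
assert html.escape("x=1&y=2") == "x=1&amp;y=2"
assert html.unescape("x=1&amp;y=2") == "x=1&y=2"

# Percent-escaping -- for reserved characters in URLs themselves,
# handled by urllib.parse:
assert quote("a b&c") == "a%20b%26c"
assert unquote("a%20b%26c") == "a b&c"
```

For the scraping problem described in this thread, html.unescape is the relevant half: it undoes the entity escaping the URLs picked up from the HTML source.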