Why can't urllib2.urlopen open pages like "http://localhost/new-post#comment-29"?

Question

I'm curious: why do I get a 404 error when I run this line?

urllib2.urlopen("http://localhost/new-post#comment-29")

Yet http://localhost/new-post#comment-29 opens fine in any browser.

Does the urlopen method not handle URLs with "#" in them?

Does anyone know?

Recommended answer

In the HTTP protocol, the fragment (everything from # onwards) is not sent to the server across the network. The browser retains it locally and, once the server's response has been fully received, uses it to locate the exact spot in the page to show as "current". If the returned page is HTML, for example, it does this by parsing the HTML and looking for the first suitable <a> tag. urlopen, by contrast, appears to request the URL as given, fragment included, which would explain the 404.

So the procedure is:

1. Remove the fragment, e.g. via urlparse.urlparse, and keep it aside.
2. Use the rest of the URL to fetch the resource.
3. Parse the resource appropriately, based on the content-type header of the server's response.
4. Locate the retained fragment within the parsed resource, and take whatever action your program performs on the "current spot" there.
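The procedure above could be sketched roughly as follows. This is only an illustration under some assumptions: it is written for Python 3, where the urlparse module lives in urllib.parse and urllib2.urlopen has become urllib.request.urlopen, and it uses urldefrag rather than urlparse because it splits off the fragment in one call. The names split_fragment, fetch_without_fragment, AnchorFinder, and locate_fragment are hypothetical helpers, not part of any library, and matching on an id attribute as well as <a name=...> is an assumption about what counts as a "suitable" target.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag
from urllib.request import urlopen


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    base, fragment = urldefrag(url)
    return base, fragment


def fetch_without_fragment(url):
    """Fetch the resource using only the part of the URL the server understands."""
    base, _ = split_fragment(url)
    with urlopen(base) as response:
        return response.read()


class AnchorFinder(HTMLParser):
    """Remember the first tag whose id (or <a name=...>) matches the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        if self.found_tag is not None:
            return  # keep only the first match
        attributes = dict(attrs)
        is_named_anchor = tag == "a" and attributes.get("name") == self.fragment
        if attributes.get("id") == self.fragment or is_named_anchor:
            self.found_tag = tag


def locate_fragment(html_text, fragment):
    """Return the name of the first matching tag, or None if nothing matches."""
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    return finder.found_tag


# Step 1: separate the fragment from the part that is sent to the server.
base, fragment = split_fragment("http://localhost/new-post#comment-29")

# Steps 3 and 4, on a made-up HTML snippet instead of a real response.
sample_html = '<html><body><p>intro</p><a name="comment-29">Comment 29</a></body></html>'
target = locate_fragment(sample_html, fragment)
```

Here base is the URL that would be passed to urlopen, while fragment never leaves the client. fetch_without_fragment is defined but not called, since it needs a server listening on localhost.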
