Python Mechanize Prevent Connection: Close

Question

I'm trying to use mechanize to get information from a web page. It's basically succeeding in getting the first bit of information, but the web page includes a button for "Next" to get more information. I can't figure out how to programmatically get the additional information.

By using Live HTTP Headers, I can see the http request that is generated when I click the next button within a browser. It seems as if I can issue the same request using mechanize, but in the latter case, instead of getting the next page, I am redirected to the home page of the website.

Obviously, mechanize is doing something different than my browser is, but I can't figure out what. In comparing the headers, I did find one difference, which was the browser used

Connection: keep-alive

while mechanize used

Connection: close

I don't know if that's the culprit, but when I tried to add the header ('Connection','keep-alive'), it didn't change anything.

[UPDATE] When I click the button for "page 2" within Firefox, the generated http is (according to Live HTTP Headers):

GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D&facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D HTTP/1.1
Host: www.boxoffice.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/javascript, text/html, application/xml, text/xml, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
X-Requested-With: XMLHttpRequest
X-Prototype-Version: 1.6.0.3
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
Cookie: __utma=179025207.1680379428.1359475480.1360001752.1360005948.13; __utmz=179025207.1359475480.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __qca=P0-668235205-1359475480409; zip=13421; country_code=US; _boxoffice_session=2202c6a47fc5eb92cd0ba57ef6fbd2c8; __utmc=179025207; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212; __utmb=179025207.2.10.1360005948
Connection: keep-alive

When I try to request the same url within mechanize, it looks like this:

GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ZYcZzBHD3JPlupj%2F%2FYf4dQ42Kx9ZBW1gDCBuJ0xX8X4%3D HTTP/1.1
Accept-Encoding: identity
Host: www.boxoffice.com
Accept: text/javascript, text/html, application/xml, text/xml, */*
Keep-Alive: 115
Connection: close
Cookie: _boxoffice_session=ced53a0ca10caa9757fd56cd89f9983e; country_code=US; zip=13421; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1

- Daryl

Recommended answer

The server was checking X-Requested-With and/or X-Prototype-Version, so adding those two headers to the mechanize request fixed it.
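A minimal sketch of that fix, assuming the two header values are the ones Firefox sent in the Live HTTP Headers capture above. The helper names `with_ajax_headers` and `make_browser` are made up for illustration. The sketch relies on mechanize's `Browser.addheaders` attribute, a list of `(name, value)` tuples that are sent with every request:

```python
# Headers the browser sent with its XMLHttpRequest, copied from the
# Firefox capture in the question. Treat the values as assumptions:
# the server may only check that the headers are present.
AJAX_HEADERS = [
    ("X-Requested-With", "XMLHttpRequest"),
    ("X-Prototype-Version", "1.6.0.3"),
]


def with_ajax_headers(existing):
    """Return a new header list that ends with the AJAX headers,
    dropping any earlier entries with the same (case-insensitive) name."""
    ajax_names = {name.lower() for name, _ in AJAX_HEADERS}
    kept = [(name, value) for name, value in existing
            if name.lower() not in ajax_names]
    return kept + AJAX_HEADERS


def make_browser():
    """Build a mechanize browser that sends the AJAX headers on every request."""
    import mechanize  # third-party package; imported here so the helper above works without it

    br = mechanize.Browser()
    br.addheaders = with_ajax_headers(br.addheaders)
    return br
```

With this in place, `make_browser().open(url)` would request the "next page" URL with both headers attached. The `Connection: close` difference noted in the question can stay as it is, since the answer attributes the redirect to the missing headers.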
