Screen Scrape POST Form Results Pages

Problem Description

Hello. I want to scrape pages from a site that generates pages from form inputs. However, the URL of the results page (the page I want to scrape) is masked and is always the same. It looks something like this for every form input combination:

http://www.scrapedsite.com/these/are/results.htm

Is it even possible to scrape the individual results pages? Thanks.

Solution

Hi.

I am not entirely sure what you mean by "scrape".
(As far as my English knowledge is concerned, that word doesn't really fit in that context... but then again, what do I know :P)


Heya, Mike.

Are you trying to create a screenshot of a web page?

The URL http://www.scrapedsite.com/these/are/results.htm is most likely either being rewritten server-side, or else results.htm relies on $_SESSION or $_POST variables to produce its content.

You may have some success creating an HTTP stream context (PHP's stream_context_create() together with file_get_contents()) or using cURL to simulate a POST request, but if the site uses session variables, you may be out of luck.
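A minimal sketch of the "simulate a POST request" idea, written in Python with only the standard library rather than PHP. It assumes the site tracks its session with a cookie (PHP's default session cookie is named PHPSESSID); if that holds, reusing one cookie-aware opener for every request keeps the same server-side session alive. The helper names are illustrative, not part of any existing tool.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return an opener that stores cookies (e.g. a session ID) between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def post_form(opener, url, fields, timeout=10):
    """Simulate a browser form submission: POST url-encoded fields, return the HTML."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying `data` makes urllib send a POST with
    # Content-Type: application/x-www-form-urlencoded, like an HTML form.
    request = urllib.request.Request(url, data=data)
    with opener.open(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call would look like `post_form(make_session_opener(), "http://www.scrapedsite.com/these/are/results.htm", {"field": "value"})`, where `field` is a hypothetical stand-in for the `name` attributes of the real form's inputs.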


Thanks for the responses, everyone. Once again, Atli is trying to be helpful. By "scrape" I mean that I want to grab the content of a web page (the HTML) so that I can parse it and use the data somehow.
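Parsing is a separate step from fetching. As a rough sketch, assuming the results are laid out as an HTML table with explicit closing tags (the real page's markup is unknown), Python's built-in html.parser can collect the cell text row by row. `TableRowParser` and `extract_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so "  Foo\n" becomes "Foo".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Pages that omit `</td>` or `</tr>`, or that put the data outside tables, would need a more forgiving approach than this sketch.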

I want to grab the data from the results pages after using this form.

It's actually a series of forms (you click Next to get to the next drop-down list), so I'm thinking that they're using session variables. From what you said, it sounds like it's not possible to grab the exact URL. :/
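Session variables do not necessarily rule scraping out: if the session is identified by a cookie, a client that keeps cookies can replay the whole sequence of forms. The sketch below assumes that, plus two further guesses about the site: each "Next" click is a plain POST, and any hidden `<input>` fields on one page must be sent back with the next. `walk_form_steps`, the step URLs, and the field names are all hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def walk_form_steps(start_url, steps, timeout=10):
    """Submit a multi-page form in order and return the final page's HTML.

    `steps` is a list of (url, fields) pairs, one per "Next" click.
    """
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    # GET the first page so the server can start a session and set its cookie.
    with opener.open(start_url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url, fields in steps:
        # Re-send hidden inputs from the previous page plus this step's choices.
        payload = {**hidden_fields(html), **fields}
        data = urllib.parse.urlencode(payload).encode("ascii")
        request = urllib.request.Request(url, data=data)
        with opener.open(request, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
    return html
```

The last pair in `steps` would point at the results.htm address; because the cookie jar is shared across all requests, the server sees one continuous session even though that URL never changes.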

