Indy - IdHttp how to handle page redirects?


Problem description

Using: Delphi 2010, latest version of Indy

I am trying to scrape data from Google's AdSense web page in order to retrieve the reports. So far I have been unsuccessful: it stops after the first request and does not proceed.

Using Fiddler to debug the traffic to the Google AdSense website while loading the AdSense page in a web browser, I can see that the browser's request generates a number of redirects before the page is loaded.

However, my Delphi application is only generating a couple of requests before it stops.

Here are the steps I have followed:

  1. Drop an IdHTTP and an IdSSLIOHandlerSocketOpenSSL component on the form.
  2. Set the IdHTTP component properties AllowCookies and HandleRedirects to True, and the IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
  3. Set the IdSSLIOHandlerSocketOpenSSL1 component property SSLOptions.Method to sslvSSLv23 (an enumeration value, not a string).
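
The same three steps can also be applied in code instead of the Object Inspector. The following is only a minimal sketch: the unit name `HttpSetup` and the helper `ConfigureHttp` are hypothetical and do not appear in the original post, and it assumes Indy 10 with the OpenSSL DLLs available at run time.

```delphi
unit HttpSetup;

interface

uses
  IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: applies the settings from steps 2 and 3 in code.
procedure ConfigureHttp(AHttp: TIdHTTP; ASSL: TIdSSLIOHandlerSocketOpenSSL);

implementation

procedure ConfigureHttp(AHttp: TIdHTTP; ASSL: TIdSSLIOHandlerSocketOpenSSL);
begin
  // Step 3: Method is an enumeration (TIdSSLVersion), so no quotes.
  ASSL.SSLOptions.Method := sslvSSLv23;

  // Step 2: route HTTPS traffic through the SSL handler,
  // keep cookies between requests, and follow 3xx responses.
  AHttp.IOHandler := ASSL;
  AHttp.AllowCookies := True;
  AHttp.HandleRedirects := True;
end;

end.
```

With the components from step 1 on the form, this could be called as `ConfigureHttp(IdHTTP1, IdSSLIOHandlerSocketOpenSSL1);` in the form's OnCreate handler.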

Finally I have this code:

procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
 Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    IdHTTP1.Get(AURL, Output);  // request the URL passed in as AURL
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;

However, it does not get to the login page as expected. I would expect it to behave as if it were a web browser and proceed through the redirects until it reaches the final page.

This is the output of the headers from Fiddler:

HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block

Firstly, is there anything wrong with this output?

Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?

Solution

IdHTTP component property values prior to making the call:

    Name := 'IdHTTP1';
    IOHandler := IdSSLIOHandlerSocketOpenSSL1;
    AllowCookies := True;
    HandleRedirects := True;
    RedirectMaximum := 35;
    Request.UserAgent := 
      'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.' +
      '0b8';
    HTTPOptions := [hoForceEncodeParams];
    OnRedirect := IdHTTP1Redirect;
    CookieManager := IdCookieManager1;

Redirect event handler:

procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string; var
    NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
   Handled := True;
end;
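
To compare the application's redirect chain with what Fiddler shows for the browser, the handler can also record each hop. This is only a sketch: it assumes a TMemo named `Memo1` on the form (hypothetical, not part of the original answer) and `SysUtils` in the uses clause for `Format`.

```delphi
// Variant of the handler above that logs every redirect before following it.
// Memo1 is a hypothetical TMemo placed on the form for diagnostics.
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string; var
    NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  // dest is the target taken from the Location header,
  // NumRedirect counts the hops so far, VMethod is the HTTP method for the next request.
  Memo1.Lines.Add(Format('Redirect %d (%s) -> %s', [NumRedirect, VMethod, dest]));

  // As in the original handler: tell IdHTTP to go ahead with the redirect.
  Handled := True;
end;
```

If the log ends earlier than the browser's chain, the last logged `dest` is a reasonable place to start looking for a missing cookie or an unsupported response.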

Making the call:

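  FURL := 'https://www.google.com';

  GetUrlToFile((FURL + '/adsense/'), 'a.html');

  procedure TfmMain.GetUrlToFile(AURL, AFile : String);
  var
    Output : TMemoryStream;
  begin
    Output := TMemoryStream.Create;
    try
      try
        // Fetch the page; redirects are followed because HandleRedirects is True.
        IdHTTP1.Get(AURL, Output);
        IdHTTP1.Disconnect;
      except
        // Any exception raised by the request is silently ignored here.
      end;
      // Save whatever was received, even if the request failed part-way.
      Output.SaveToFile(AFile);
    finally
      Output.Free;
    end;
  end;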
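
The empty `except` block above hides any failure, which makes a broken redirect chain hard to diagnose. A possible variant that reports the error instead is sketched below. The method name `GetUrlToFileChecked` is hypothetical, and the sketch assumes `IdHTTP`, `SysUtils`, `Classes` and `Dialogs` are in the form unit's uses clause and that the method is declared in `TfmMain`.

```delphi
// Hypothetical variant of GetUrlToFile that reports failures instead of hiding them.
// Returns True only if the page was downloaded and saved.
function TfmMain.GetUrlToFileChecked(const AURL, AFile: string): Boolean;
var
  Output: TMemoryStream;
begin
  Result := False;
  Output := TMemoryStream.Create;
  try
    try
      IdHTTP1.Get(AURL, Output);
      Output.SaveToFile(AFile);
      Result := True;
    except
      // Raised by IdHTTP when the server answers with an HTTP error status.
      on E: EIdHTTPProtocolException do
        ShowMessage(Format('HTTP %d: %s', [E.ErrorCode, E.Message]));
      // Anything else, for example socket or SSL failures.
      on E: Exception do
        ShowMessage('Request failed: ' + E.Message);
    end;
  finally
    Output.Free;
  end;
end;
```

A call such as `GetUrlToFileChecked(FURL + '/adsense/', 'a.html')` would then show the failing status code rather than writing an empty or partial file.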





Here's the (request and response headers) output from Fiddler:
