How do you scrape AJAX pages?

Question

The title says it all. Please advise on how to scrape AJAX pages.

Recommended answer

Overview:

All screen scraping starts with a manual review of the page you want to extract data from. With AJAX, you usually just need to analyze a bit more than the HTML alone.

Dealing with AJAX simply means that the value you want is not in the initial HTML document you requested; instead, JavaScript is executed that asks the server for the extra information you want.

You can therefore usually just analyze the JavaScript, see which request it makes, and call that URL directly from the start.
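
One way to do that analysis, as a rough sketch rather than a general solution: search the page's script text for XMLHttpRequest-style open(method, url) calls and resolve each URL against the page address. The function name findAjaxRequests, the regular expression, and the example.com address are assumptions made for illustration. Real pages often build their URLs dynamically, in which case the browser's network inspector is the more reliable tool.

```javascript
// Hypothetical helper: list the requests a page's inline script makes
// through XMLHttpRequest-style open(method, url) calls.
function findAjaxRequests(scriptText, pageUrl) {
  // Matches .open("GET", "time.asp" ...) with single or double quotes.
  const pattern = /\.open\(\s*["'](\w+)["']\s*,\s*["']([^"']+)["']/g;
  const requests = [];
  for (const match of scriptText.matchAll(pattern)) {
    requests.push({
      method: match[1].toUpperCase(),
      // Resolve relative URLs against the page that hosts the script.
      url: new URL(match[2], pageUrl).href,
    });
  }
  return requests;
}

// Usage with a made-up page address (example.com is a placeholder).
const script = 'xmlHttp.open("GET","time.asp",true); xmlHttp.send(null);';
console.log(findAjaxRequests(script, "http://www.example.com/ajax/page.html"));
```

Only literal string URLs are found this way; anything assembled at runtime has to be traced by hand or observed in the browser.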


Example:

Take this as an example: assume the page you want to scrape contains the following script:

<script type="text/javascript">
function ajaxFunction()
{
  var xmlHttp;
  try
  {
    // Firefox, Opera 8.0+, Safari
    xmlHttp = new XMLHttpRequest();
  }
  catch (e)
  {
    // Internet Explorer
    try
    {
      xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
    }
    catch (e)
    {
      try
      {
        xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
      }
      catch (e)
      {
        alert("Your browser does not support AJAX!");
        return false;
      }
    }
  }
  // When the response has fully arrived, copy it into the form field.
  xmlHttp.onreadystatechange = function()
  {
    if (xmlHttp.readyState == 4)
    {
      document.myForm.time.value = xmlHttp.responseText;
    }
  };
  // The AJAX request itself: time.asp is the URL that supplies the value.
  xmlHttp.open("GET", "time.asp", true);
  xmlHttp.send(null);
}
</script>

Then all you need to do is make an HTTP request to time.asp on the same server, instead of scraping the page itself. (The example script is from W3Schools.)
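
A minimal sketch of that direct request, assuming a JavaScript runtime with a global fetch (for example Node.js 18 or later). The function name fetchAjaxValue, the injectable fetchImpl parameter, and the example.com address are illustrative assumptions; only time.asp comes from the script above.

```javascript
// Hypothetical helper: request the AJAX endpoint directly instead of the page.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchAjaxValue(pageUrl, endpoint, fetchImpl = fetch) {
  // time.asp is a relative URL, so resolve it against the page's own address.
  const url = new URL(endpoint, pageUrl).href;
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  // The page's script copies responseText into the form field unchanged,
  // so the response body is the value being scraped.
  return response.text();
}

// Usage (placeholder address; replace with the real page URL):
// fetchAjaxValue("http://www.example.com/ajax/page.html", "time.asp")
//   .then((value) => console.log(value));
```

Some servers also check headers or cookies that the browser would have sent, so a real scraper may need to pass those along with the request.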


Advanced scraping with C++:

For complex usage, if you are using C++, you could also consider using SpiderMonkey, the Firefox JavaScript engine, to execute the JavaScript on a page.

Advanced scraping with Java:

For complex usage, if you are using Java, you could also consider using Rhino, Mozilla's JavaScript engine for Java.

Advanced scraping with .NET:

For complex usage, if you are using .NET, you could also consider using the Microsoft.Vsa assembly, which has more recently been replaced by ICodeCompiler/CodeDOM.
