Web scraping with Google Apps Script

Question

I'm trying to pull data from the following sample web page using Google Apps Script:

url = http://www.premierleague.com/players/2064/Wayne-Rooney/stats?se=54

using UrlFetchApp.fetch(url).

The problem is that when I use UrlFetchApp.fetch(url), I don't get the page content selected by the 'se' parameter in the URL. Instead, I get the content of the following URL, apparently because the 'se=54' data is loaded asynchronously: http://www.premierleague.com/players/2064/Wayne-Rooney/stats

Is there another way to pass the 'se' parameter? I noticed that the function accepts an 'options' argument, but the documentation on it is very limited.

Any help would be most appreciated. Many thanks

Tommy

Solution

Open that website in your browser and open the developer tools (F12 or Ctrl+Shift+I). Click the Network tab and reload the page with F5. A list of requests will appear. Near the bottom of the list you should see the asynchronous requests that fetch the stats. Those requests get the data as JSON from footballapi.pulselive.com.

You can make the same request in Apps Script, but you have to send the correct "Origin" header or your request will be rejected. Here is an example.

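function fetchData() {
  // JSON endpoint that the page calls asynchronously (found in the Network tab)
  var url = "http://footballapi.pulselive.com/football/stats/player/2064?comps=1";
  var options = {
    "headers": {
      // The API rejects requests that do not carry the site's Origin header
      "Origin": "http://www.premierleague.com"
    }
  };
  var json = JSON.parse(UrlFetchApp.fetch(url, options).getContentText());
  // Log only the "goals" entry from the returned stats array
  for (var i = 0; i < json.stats.length; i++) {
    if (json.stats[i].name === "goals") Logger.log(json.stats[i]);
  }
}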
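The example above hard-codes comps=1 and does not pass the season that 'se=54' selects on the web page. UrlFetchApp.fetch sends a GET request with whatever query string is in the URL, so one way to pass such filters is to build the query string yourself. The sketch below is a generalization under stated assumptions, not a confirmed use of the API: it assumes the JSON endpoint accepts a season filter named compSeasons (a hypothetical name; copy the real parameter from the request shown in the Network tab), and the helper names buildUrl, findStat, fetchPlayerStat and logGoalsForSeason are illustrative. It also sets muteHttpExceptions so that a rejected request (for example one missing the Origin header) is reported with its HTTP status instead of throwing a generic exception.

```javascript
// Build "base?k1=v1&k2=v2", percent-encoding every key and value.
function buildUrl(base, params) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])));
    }
  }
  return parts.length ? base + "?" + parts.join("&") : base;
}

// Return the first entry of json.stats whose name matches statName, or null.
function findStat(json, statName) {
  var stats = (json && json.stats) || [];
  for (var i = 0; i < stats.length; i++) {
    if (stats[i].name === statName) return stats[i];
  }
  return null;
}

// Apps Script only: UrlFetchApp and Logger exist in the Apps Script runtime.
function fetchPlayerStat(playerId, statName, params) {
  var url = buildUrl("http://footballapi.pulselive.com/football/stats/player/" + playerId, params);
  var response = UrlFetchApp.fetch(url, {
    headers: { "Origin": "http://www.premierleague.com" },
    muteHttpExceptions: true // return 4xx/5xx responses instead of throwing
  });
  if (response.getResponseCode() !== 200) {
    throw new Error("HTTP " + response.getResponseCode() + " for " + url);
  }
  return findStat(JSON.parse(response.getContentText()), statName);
}

// Example entry point to run from the Apps Script editor.
function logGoalsForSeason() {
  // ASSUMPTION: compSeasons=54 corresponds to the page's se=54 parameter.
  Logger.log(fetchPlayerStat(2064, "goals", { comps: 1, compSeasons: 54 }));
}
```

buildUrl and findStat are plain JavaScript and can be checked outside Apps Script; only fetchPlayerStat and logGoalsForSeason depend on the Apps Script services.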
