Not Able to Scrape a Table in Google Sheets

Question

With the help of this SO question, I am trying to scrape the following website. I would like the two teams and the time for each game. For example, the first entry would be Chicago | Miami | 12:30 PM, and the last entry would be Colorado | Arizona | 10:10 PM. My code is as follows:

function espn_schedule() {
  var url = "http://www.espn.com/mlb/schedule/_/date/20180329";
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from('class="schedule has-team-logos align-left"').to('</tbody>').iterate();
  var res = [];

  var temp = [];
  var away_ticker = "";
  scraped.forEach(function(e){
    var away_team = Parser.data(e).from('href="mlb/team/_/name/').to('"').build();
    var time = Parser.data(e).from('a data-dateformat="time1"').to('</a>').build();
    if (away_ticker == "") away_ticker = away_team;
    if (away_team != away_ticker) {
      temp.splice(1, 0, away_ticker);
      res.push(temp);
      temp = [];
      away_ticker = away_team;
      temp.push(time);
    }
  });
  var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Schedule");
  ss.getRange(ss.getLastRow() + 1, 1, res.length, res[0].length).setValues(res);
}

I get the following error: TypeError: Cannot read property "length" from undefined. (line 42, file "Code")

Recommended answer

Here is a modified solution that works:

function espn_schedule() {
  var url = "http://www.espn.com/mlb/schedule/_/date/20180329";
  var content = UrlFetchApp.fetch(url).getContentText();
  // Grab only the first schedule table, as a single string.
  var e = Parser.data(content).from('class="schedule has-team-logos align-left"').to('</tbody>').build();
  // Team names sit in <abbr title="...">, start times in data-date="...".
  var teams = Parser.data(e).from('<abbr title="').to('">').iterate();
  var time = Parser.data(e).from('data-date="').to('">').iterate();
  Logger.log(teams);
  Logger.log(time);

  // Each game uses two consecutive team entries and one time entry.
  var res = [];
  for (var i = 0; i < teams.length; i = i + 2) {
    res[i / 2] = [];
    res[i / 2][0] = teams[i];
    res[i / 2][1] = teams[i + 1];
    res[i / 2][2] = new Date(time[i / 2]).toLocaleTimeString('en-US');
  }
  Logger.log(res);
  var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Schedule");
  ss.getRange(ss.getLastRow() + 1, 1, res.length, res[0].length).setValues(res);
}

Modifications explained:
1) Since you only access the first table, you don't need to iterate during parsing; just get the first table with build(). For the same reason, you don't need forEach to loop over each element.

var e = Parser.data(content)
        .from('class="schedule has-team-logos align-left"')
        .to('</tbody>')
        .build();   //Use build instead of iterate
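
To make the difference between getting one match and getting every match concrete, here is a minimal plain-JavaScript sketch. The helpers firstBetween and allBetween are hypothetical stand-ins written for this illustration (they are not the Parser library's implementation), and the HTML string is made-up sample data rather than the real page.

```javascript
// Hypothetical stand-in for .from(a).to(b).build(): returns the text between
// the first occurrence of `from` and the next occurrence of `to`.
function firstBetween(text, from, to) {
  var start = text.indexOf(from);
  if (start === -1) return "";
  start += from.length;
  var end = text.indexOf(to, start);
  return end === -1 ? "" : text.substring(start, end);
}

// Hypothetical stand-in for .iterate(): collects every such match in an array.
function allBetween(text, from, to) {
  var results = [];
  var pos = 0;
  while (true) {
    var start = text.indexOf(from, pos);
    if (start === -1) break;
    start += from.length;
    var end = text.indexOf(to, start);
    if (end === -1) break;
    results.push(text.substring(start, end));
    pos = end + to.length;
  }
  return results;
}

// Made-up sample markup, loosely shaped like a schedule table.
var sampleHtml =
  '<table class="schedule"><tbody>' +
  '<tr><td><abbr title="Chicago">CHC</abbr></td>' +
  '<td><abbr title="Miami">MIA</abbr></td></tr>' +
  '</tbody></table>';

var table = firstBetween(sampleHtml, 'class="schedule"', '</tbody>'); // one string
var teams = allBetween(table, '<abbr title="', '">');                 // array of names
```

In this sketch the single-match helper yields one string to search further, while the all-matches helper yields an array, which is why a forEach over the first result is unnecessary.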

2) Instead of parsing the HTML link to get the team name, you can use the <abbr title=" element to scrape the name. Furthermore, you can iterate over all the matches in the table to get an array of team names.

var teams = Parser.data(e).from('<abbr title="').to('">').iterate();

3) Similar to the above modification, you can get the time from the data-date attribute. This gives you a date string that can be read by the Date() class. Again, we iterate over the table to get all the times.

var time = Parser.data(e).from('data-date="').to('">').iterate()

4) Finally, we use a for loop to rearrange the teams and times into the array called res, one row per game. This allows the data to be inserted into the sheet directly.

for (var i = 0; i < teams.length; i = i + 2) { // each pass adds 2 to the counter
  res[i/2] = [];
  res[i/2][0] = teams[i];     // even index (starts at zero): first team
  res[i/2][1] = teams[i+1];   // odd index: its opponent
  res[i/2][2] = new Date(time[i/2]).toLocaleTimeString('en-US');
}
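
The pairing step can be tried in isolation with a small runnable sketch. The team names and date strings below are made-up sample values (the real arrays come from the two iterate() calls), and the date format is only an assumption about what data-date contains. The loop bound is tightened to i + 1 < teams.length so that an odd-length array cannot produce an undefined opponent.

```javascript
// Made-up sample values standing in for the scraped arrays.
var teams = ["Chicago", "Miami", "Colorado", "Arizona"];
var time = ["2018-03-29T16:40Z", "2018-03-30T02:10Z"];

var res = [];
for (var i = 0; i + 1 < teams.length; i += 2) {
  var game = i / 2; // index of the game this pair of teams belongs to
  res.push([
    teams[i],                                         // first team of the pair
    teams[i + 1],                                     // second team of the pair
    new Date(time[game]).toLocaleTimeString("en-US")  // formatted start time
  ]);
}
```

Each row of res has three columns, which matches the shape setValues expects. The formatted time depends on the time zone of the runtime, so no expected output is shown here.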

Reference:
Date(),Date.toLocaleTimeString()


Reason for the error: in the code below

Parser.data(e).from('href="mlb/team/_/name/').to('"').build()

you are looking for the string 'href="mlb/team/_/name/', but it should be 'href="/mlb/team/_/name/'. Note the difference: mlb vs /mlb.

Second, in the code below

Parser.data(e).from('a data-dateformat="time1"').to('</a>').build();

the string should be a data-dateFormat. When you inspect the website, the attribute is shown as dateformat; however, when you fetch the page with UrlFetchApp and log the text, it is shown as dateFormat.
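
Both mismatches come down to exact, case-sensitive string matching, which a short sketch can show. The HTML fragment below is made up to mimic the markup described above, and canWrite is a hypothetical helper added for illustration. The link to the TypeError is also an assumption: if neither marker ever matches, nothing is pushed to res, so res[0] is undefined and reading res[0].length fails.

```javascript
// Made-up fragment mimicking the fetched markup described above.
var fetched =
  '<a href="/mlb/team/_/name/chc">Chicago</a>' +
  '<a data-dateFormat="time1" href="#">12:30 PM</a>';

// indexOf is exact and case-sensitive, so the original markers miss.
var wrongLink = fetched.indexOf('href="mlb/team/_/name/');    // -1: leading slash missing
var rightLink = fetched.indexOf('href="/mlb/team/_/name/');   // found
var wrongTime = fetched.indexOf('a data-dateformat="time1"'); // -1: lowercase f
var rightTime = fetched.indexOf('a data-dateFormat="time1"'); // found

// Hypothetical guard: only write when there is at least one non-empty row.
function canWrite(rows) {
  return rows.length > 0 && rows[0].length > 0;
}

var res = []; // what the result array looks like when nothing matched
var safeToWrite = canWrite(res); // false, so the getRange/setValues call would be skipped
```

Logging the fetched text and checking each marker against it, as the answer did, is a quick way to catch this kind of mismatch before parsing.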
