Not able to scrape data

Question

I am just starting out in Google Apps Script (GAS). Since best coding practices recommend using as few sheet formulas as possible, I am trying to do my web scraping with the GAS Parser library and then push the data over to my spreadsheet.

Within my sheet, the formula below returns a table of data that is exactly what I want to get from GAS.

=IMPORTHTML("https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L", "table", 1)

The two questions here and here are similar, but trying their methods also fails. It almost seems like I am not getting the full page content: when I view the data with Logger.log() after the code below, I don't get anything that resembles the page I need.

UrlFetchApp.fetch(url).getContentText();

Since the formula seems to get the data perfectly, I can only assume the problem is with my own code, but I can't figure out where. Here is the code I have tried so far:

function scrapeData() {
  var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
  var fromText = '<td class="Py(10px) Ta(start) Pend(10px)"><span>';
  var toText = '</span></td>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser
    .data(content)
    .from(fromText)
    .to(toText)
    .iterate();

  Logger.log(scraped);
}
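The Parser chain above returns the text found between a start marker and an end marker. The sketch below is a hypothetical plain-JavaScript helper (extractAllBetween, not the Parser library's actual source) that imitates the idea of .from().to().iterate(), assuming simple exact-substring matching. It shows one possible cause of an empty result. If the fetched HTML carries extra attributes, for example the data-reactid attributes that appear in the answer's markers, then a marker copied from the browser such as <td class="..."><span> never matches. The sample HTML string is made up for illustration.

```javascript
// Hypothetical helper (not the Parser library's source): returns every substring
// found between fromText and toText, using exact substring matching.
function extractAllBetween(text, fromText, toText) {
  var results = [];
  var pos = 0;
  while (true) {
    var start = text.indexOf(fromText, pos);
    if (start === -1) break; // start marker not found: stop
    start += fromText.length;
    var end = text.indexOf(toText, start);
    if (end === -1) break; // no end marker after the start marker: stop
    results.push(text.slice(start, end));
    pos = end + toText.length; // continue searching after this match
  }
  return results;
}

// Made-up HTML: the served markup has extra data-reactid attributes.
var html = '<td class="Py(10px)" data-reactid="53"><span data-reactid="54">Jan 02</span></td>';

// The exact marker does not occur in html, so nothing is extracted.
var missed = extractAllBetween(html, '<td class="Py(10px)"><span>', '</span></td>');
// A marker copied from the actual served markup does match.
var found = extractAllBetween(html, '<span data-reactid="54">', '</span>');
// missed is empty; found holds a single entry, "Jan 02".
```

A practical check in Apps Script is to log content.indexOf(fromText). A value of -1 means the marker is simply not present in what UrlFetchApp returned.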

Any guidance is appreciated.

Recommended answer

  • You want to retrieve the table from the URL using Google Apps Script and put its values into the spreadsheet.

    If my understanding is correct, how about this modification? There are several possible approaches for your situation, so please think of this as one of them.

    • To retrieve the table, I used Parser and XmlService:
    1. Retrieve the table as a string value using Parser.
    2. Parse the table using XmlService, which makes it easy to walk through the table's rows and cells.

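    XmlService is a powerful XML parser, so when it can be applied to HTML, it makes retrieving values much easier. However, most HTML pages nowadays cannot be parsed directly by XmlService because they are not well-formed XML. So I always use this flow: first cut out the part I need (here, the table) with Parser, then parse only that fragment with XmlService.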
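Cutting out a fragment first also helps because an XML document must have exactly one root element. If the chosen markers leave stray markup around the table, one option is to trim the string to the outermost table before calling XmlService.parse(). The helper below (trimToTable) is a hypothetical sketch and is not part of Parser or XmlService. It only addresses the single-root requirement. Other HTML features, such as unclosed tags like <br> or named entities like &nbsp; (XML predefines only five entities), can still make the parse fail. The sample fragment is made up.

```javascript
// Hypothetical helper: keeps only the text from the first "<table" to the last "</table>",
// so the fragment handed to an XML parser has a single root element.
function trimToTable(fragment) {
  var closeTag = "</table>";
  var start = fragment.indexOf("<table");
  var end = fragment.lastIndexOf(closeTag);
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No <table>...</table> found in fragment");
  }
  return fragment.slice(start, end + closeTag.length);
}

// Made-up fragment with a stray closing </div> after the table.
var sample = '<table><tbody><tr><td>1.0</td></tr></tbody></table></div>';
var tableOnly = trimToTable(sample);
// tableOnly is '<table><tbody><tr><td>1.0</td></tr></tbody></table>'
```

In Apps Script, this would be used as XmlService.parse(trimToTable(scraped)).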

    function scrapeData() {
      // Retrieve the table as a string using the Parser library.
      var url = "https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L";
      // var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
      // The markers must match the fetched HTML exactly, including attributes such as data-reactid.
      var fromText = '<div class="Pb(10px) Ovx(a) W(100%)" data-reactid="30">';
      var toText = '<div class="Mstart(30px) Pt(10px)"';
      var content = UrlFetchApp.fetch(url).getContentText();
      // build() returns the first substring between fromText and toText (the <table> element).
      var scraped = Parser.data(content).from(fromText).to(toText).build();

      // Parse the table fragment using XmlService; <table> becomes the root element.
      var root = XmlService.parse(scraped).getRootElement();

      // Retrieve the header: one array of cell texts per <tr> in <thead>.
      var headerTr = root.getChild("thead").getChildren();
      var res = headerTr.map(function(tr) {
        return tr.getChildren().map(function(cell) {return cell.getValue();});
      });
      var len = res[0].length;

      // Retrieve the values from <tbody>, padding short rows with "" so each row has len cells.
      var valuesTr = root.getChild("tbody").getChildren();
      var values = valuesTr.map(function(tr) {
        var row = tr.getChildren().map(function(cell) {return cell.getValue();});
        while (row.length < len) row.push("");
        return row;
      });
      Array.prototype.push.apply(res, values);

      // Put the result into the active sheet; setValues() needs a rectangular array.
      var sheet = SpreadsheetApp.getActiveSheet();
      sheet.getRange(1, 1, res.length, res[0].length).setValues(res);
    }
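The padding step in the script exists because Range.setValues() requires a rectangular two-dimensional array whose dimensions match the target range. Rows with fewer cells than the header would break that. On Yahoo's history page these are presumably dividend or split rows whose single cell spans several columns, but this is an assumption. The standalone sketch below (toRectangular is a hypothetical name) shows the same idea in plain JavaScript so it can be checked outside Apps Script. It also truncates rows that are too long. The sample rows are made up.

```javascript
// Hypothetical helper: returns a copy of rows in which every row has exactly `width` cells,
// padding short rows with "" and truncating long ones, as Range.setValues() expects.
function toRectangular(rows, width) {
  return rows.map(function (row) {
    var out = row.slice(0, width); // copy, dropping any cells beyond width
    while (out.length < width) {
      out.push(""); // pad missing cells with empty strings
    }
    return out;
  });
}

// Made-up rows: a header, a normal price row, and a shorter row.
var rows = [
  ["Date", "Open", "Close"],
  ["Jan 02", "1.00", "1.10"],
  ["Jan 01", "0.05 Dividend"]
];
var grid = toRectangular(rows, rows[0].length);
// grid[2] is ["Jan 01", "0.05 Dividend", ""]
```

In the script above, calling toRectangular(res, len) just before setValues(res) would have the same effect as the padding inside the map.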
    

    • Note:

      • Before you run this modified script, please install the Parser GAS library.
      • In my environment, I could confirm that the modified script works for both p=BOO.L and p=BARC.L. I couldn't confirm other symbols, so if an error occurs when you try them, please modify the script.
      • References:
        • Parser
        • XmlService

        If this was not what you want, I'm sorry.
