Scraping data from a table with a specific title value and filtering specific lines (Google Apps Script)


Problem Description

Documentation for CheerioGS:
https://github.com/tani/cheeriogs

The idea is to collect data only from the table titled Argentinos Jrs, and to skip any row whose info column has the value Away on International duty.

Note: I really need to select by the value Argentinos Jrs and filter out Away on International duty, because neither the position of this table nor the order of its rows is fixed.

The expected result in this example I'm looking for is this:

Carlos Quintana      Mid August
Jonathan Sandoval    Early August

The website link is this:
https://www.sportsgambler.com/injuries/football/argentina-superliga/

I'm including a screenshot of the site as it currently looks, so the idea behind my example is preserved even if the live data changes.

The code I tried:

function PaginaDoJogo() {
    var sheet = SpreadsheetApp.getActive().getSheetByName('Dados Importados');
    var url = 'https://www.sportsgambler.com/injuries/football/argentina-superliga/';

    const contentText = UrlFetchApp.fetch(url).getContentText();
    const $ = Cheerio.load(contentText);

    $('div:contains("Argentinos Jrs") > div > div.inj-container:not(contains("Away on International duty")) > span.inj-player')
        .each((index, element) => {
            sheet.getRange(index + 2, 1).setValue($(element).text());
        });

    $('div:contains("Argentinos Jrs") > div > div.inj-container:not(contains("Away on International duty")) > span.inj-return.h-sm')
        .each((index, element) => {
            sheet.getRange(index + 2, 2).setValue($(element).text());
        });
}

Solution

function PaginaDoJogo() {
  const sheet = SpreadsheetApp.getActive().getSheetByName('Dados Importados');
  const url = 'https://www.sportsgambler.com/injuries/football/argentina-superliga/';
  const response = UrlFetchApp.fetch(url);
  const content = response.getContentText();

  // Cut the HTML down to the block that starts at the team name "Argentinos Jrs".
  const match = content.match(/Argentinos Jrs[\s\S]+?<!--Livestream call to action-->/);

  // Capture player name, info text and return date from each row's spans.
  const regExp = /<div[\s\S]+?<span class="inj-player">(.+?)<\/span>[\s\S]+?<span class="inj-info">(.+?)<\/span>[\s\S]+?<span class="inj-return h-sm">(.+?)<\/span>[\s\S]+?<\/div>/g;

  const values = [];
  let r;
  while ((r = regExp.exec(match[0])) !== null) {
    // Skip the header row and any player who is away on international duty.
    if (r[1] !== 'Name' && r[2] !== 'Away on International duty') {
      values.push([r[1], r[3]]);
    }
  }
  sheet.getRange(2, 1, values.length, 2).setValues(values);
}
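
For reference, the selector-based approach from the question can also work once the pseudo-class syntax is fixed: Cheerio expects :not(:contains("...")), not :not(contains("...")). The sketch below keeps the question's selector structure, which I have not verified against the live page, and the function name PaginaDoJogoCheerio is only illustrative; it also batches the output into a single setValues call.

function PaginaDoJogoCheerio() {
  const sheet = SpreadsheetApp.getActive().getSheetByName('Dados Importados');
  const url = 'https://www.sportsgambler.com/injuries/football/argentina-superliga/';
  const $ = Cheerio.load(UrlFetchApp.fetch(url).getContentText());

  const values = [];
  // Same selector as in the question, with the missing colon added so :contains()
  // is nested inside :not(); rows mentioning international duty are excluded.
  $('div:contains("Argentinos Jrs") > div > div.inj-container:not(:contains("Away on International duty"))')
    .each((index, element) => {
      const player = $(element).find('span.inj-player').text().trim();
      const returnDate = $(element).find('span.inj-return.h-sm').text().trim();
      if (player && player !== 'Name') { // skip empty rows and the header row
        values.push([player, returnDate]);
      }
    });

  // Write everything in one call instead of one setValue per cell.
  if (values.length > 0) {
    sheet.getRange(2, 1, values.length, 2).setValues(values);
  }
}

If the page's markup does not match these selectors, the regex-based solution above still applies.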
