Scraping data from a table identified by a specific title value and filtering out specific rows (Google Apps Script)
Question
Documentation for CheerioGS:
https://github.com/tani/cheeriogs
The idea is to collect data only from the table titled Argentinos Jrs, and to skip any row whose info column contains the value Away on International duty.
Note: I really need to select the table by the value Argentinos Jrs and exclude rows by the value Away on International duty, because the position of this table on the page is not fixed, and neither are the values in its rows.
The expected result for this example is:
Carlos Quintana Mid August
Jonathan Sandoval Early August
The website link is this:
https://www.sportsgambler.com/injuries/football/argentina-superliga/
I am including a screenshot of the site as it looks now, so that the example still makes sense if the data changes:
The code I tried:
function PaginaDoJogo() {
  var sheet = SpreadsheetApp.getActive().getSheetByName('Dados Importados');
  var url = 'https://www.sportsgambler.com/injuries/football/argentina-superliga/';
  const contentText = UrlFetchApp.fetch(url).getContentText();
  const $ = Cheerio.load(contentText);
  $('div:contains("Argentinos Jrs") > div > div.inj-container:not(contains("Away on International duty")) > span.inj-player')
    .each((index, element) => {
      sheet.getRange(index + 2, 1).setValue($(element).text());
    });
  $('div:contains("Argentinos Jrs") > div > div.inj-container:not(contains("Away on International duty")) > span.inj-return.h-sm')
    .each((index, element) => {
      sheet.getRange(index + 2, 2).setValue($(element).text());
    });
}
Solution
function PaginaDoJogo() {
  const sheet = SpreadsheetApp.getActive().getSheetByName('Dados Importados');
  const url = 'https://www.sportsgambler.com/injuries/football/argentina-superliga/';
  const response = UrlFetchApp.fetch(url);
  const content = response.getContentText();
  // Isolate the block that starts at the team name and ends at the next marker comment.
  const match = content.match(/Argentinos Jrs[\s\S]+?<!--Livestream call to action-->/);
  if (!match) return; // The team is not listed on the page.
  const regExp = /<div[\s\S]+?<span class="inj-player">(.+?)<\/span>[\s\S]+?<span class="inj-info">(.+?)<\/span>[\s\S]+?<span class="inj-return h-sm">(.+?)<\/span>[\s\S]+?<\/div>/g;
  const values = [];
  let r; // Declared explicitly so it does not become an implicit global.
  while ((r = regExp.exec(match[0])) !== null) {
    // r[1] = player, r[2] = info, r[3] = expected return
    if (r[1] !== 'Name' && r[2] !== 'Away on International duty') {
      values.push([r[1], r[3]]);
    }
  }
  // getRange throws when the row count is 0, so write only when rows were found.
  if (values.length > 0) {
    sheet.getRange(2, 1, values.length, 2).setValues(values);
  }
}
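To check the filtering logic without fetching the live page, the parsing step can be pulled out into a pure function and run against a small HTML string. The following is only a sketch under stated assumptions: extractInjuries, row and sampleHtml are hypothetical names, and the sample markup is guessed from the class names and the marker comment used in the regular expressions above, not copied from the real site. The info values and the third player are placeholders. Unlike the solution, it uses [\s\S]*? between the spans so that adjacent tags with nothing between them still match.

```javascript
// Hypothetical helper: returns [player, returnDate] pairs for one team from raw HTML.
function extractInjuries(html, team, excludedInfo) {
  // Escape regex metacharacters so the team name is matched literally.
  const escapedTeam = team.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Take the text from the team name up to the next marker comment.
  const section = html.match(
    new RegExp(escapedTeam + '[\\s\\S]+?<!--Livestream call to action-->')
  );
  if (!section) return [];

  const rowRegExp = /<span class="inj-player">(.+?)<\/span>[\s\S]*?<span class="inj-info">(.+?)<\/span>[\s\S]*?<span class="inj-return h-sm">(.+?)<\/span>/g;
  const values = [];
  for (const [, player, info, returnDate] of section[0].matchAll(rowRegExp)) {
    // Skip the header row and any row carrying the excluded info value.
    if (player !== 'Name' && info !== excludedInfo) {
      values.push([player, returnDate]);
    }
  }
  return values;
}

// Assumed markup for one table row; the real page may differ.
const row = (player, info, returnDate) =>
  '<div class="inj-container">' +
  '<span class="inj-player">' + player + '</span> ' +
  '<span class="inj-info">' + info + '</span> ' +
  '<span class="inj-return h-sm">' + returnDate + '</span>' +
  '</div>';

// Made-up sample page with two team tables; info values are placeholders.
const sampleHtml = [
  '<h3>Some Other Team</h3>',
  row('Name', 'Info', 'Expected Return'),
  row('Other Player', 'Placeholder injury', 'Late July'),
  '<!--Livestream call to action-->',
  '<h3>Argentinos Jrs</h3>',
  row('Name', 'Info', 'Expected Return'),
  row('Carlos Quintana', 'Placeholder injury', 'Mid August'),
  row('Placeholder Player', 'Away on International duty', 'Unknown'),
  row('Jonathan Sandoval', 'Placeholder injury', 'Early August'),
  '<!--Livestream call to action-->',
].join('\n');

const result = extractInjuries(sampleHtml, 'Argentinos Jrs', 'Away on International duty');
console.log(JSON.stringify(result));
```

In Apps Script, the same function could presumably be called with the text returned by getContentText(), and the resulting array passed to setValues once it has been checked to be non-empty. Keeping the parsing separate from the spreadsheet write makes it easier to adjust the regular expression if the site's markup changes.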