Google Sheets Scraping Options Chain from Yahoo Finance, Incomplete Results


Question

I'm attempting to scrape options pricing data from Yahoo Finance in Google Sheets. I'm able to pull the options chain just fine, i.e.

=IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2)

However, the results it returns don't completely match what's actually shown on Yahoo Finance. Specifically, the scraped results are incomplete: they're missing some strikes. For example, the first 5 rows of the table may match, but after that it starts returning only every other strike (i.e. it skips every other strike).

Why would IMPORTHTML return "abbreviated" results that don't match what's actually shown on the page? And more importantly, is there some way to scrape the complete data (i.e. without skipping some portion of the available strikes)?

Recommended answer

I believe your goal is as follows.

  • You want to retrieve the complete table from the URL https://finance.yahoo.com/quote/TCOM/options?date=1610668800 and put it into the Spreadsheet.

I could replicate your issue. Unfortunately, when I looked at the HTML data, I couldn't find any difference in the HTML between the rows that are shown and the rows that are not. I could also confirm that the complete table is included in the HTML data. By the way, when I tested it using =IMPORTXML(A1,"//section[2]//tr"), I got the same result as with IMPORTHTML. So I think that in this case IMPORTHTML and IMPORTXML might not be able to retrieve the complete table.
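One way to check the observation above, that the raw HTML holds more rows than IMPORTHTML returns, is to count the `<tr>` rows of each table in the fetched HTML and compare that with the number of rows in the sheet. The following is only a minimal sketch: `countTableRows` is a hypothetical helper name, and it assumes the tables are not nested, since the regular expression stops at the first closing `</table>` tag.

```javascript
// Hypothetical helper: returns the number of <tr> rows in each <table> of an HTML string.
// Assumption: tables are not nested (the lazy match ends at the first </table>).
function countTableRows(html) {
  const tables = html.match(/<table[\s\S]*?<\/table>/g) || [];
  // "<tr" must be followed by whitespace or ">" so that other tags such as <track> are ignored.
  return tables.map((t) => (t.match(/<tr[\s>]/g) || []).length);
}
```

In Apps Script the input could be `UrlFetchApp.fetch(url).getContentText()`, and the result could be logged with `console.log`.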

So, in this answer, as a workaround, I would like to propose putting the complete table into the sheet by letting the Sheets API parse the HTML table. Google Apps Script is used for this. I could confirm that the complete table can be retrieved when the HTML table is parsed with the Sheets API.

Please copy and paste the following script into the script editor of the Spreadsheet, and enable the Sheets API under Advanced Google services. Then run the function myFunction in the script editor. The retrieved table is written to the sheet named by sheetName.

function myFunction() {
  // Please set the following variables.
  const url = "https://finance.yahoo.com/quote/TCOM/options?date=1610668800";
  const sheetName = "Sheet1";  // Please set the destination sheet name.
  const sessionNumber = 2;  // Index into the <section> matches found below; the table inside that section is retrieved.

  // Fetch the raw HTML and collect every <section>...</section> block.
  const html = UrlFetchApp.fetch(url).getContentText();
  const section = [...html.matchAll(/<section[\s\S]+?<\/section>/g)];
  // Strictly greater, so that section[sessionNumber] is always defined.
  if (section.length > sessionNumber) {
    if (section[sessionNumber].length == 1) {
      const table = section[sessionNumber][0].match(/<table[\s\S]+?<\/table>/);
      if (table) {
        // Paste the table HTML into the sheet; the Sheets API parses it into cells.
        const ss = SpreadsheetApp.getActiveSpreadsheet();
        const body = {requests: [{pasteData: {html: true, data: table[0], coordinate: {sheetId: ss.getSheetByName(sheetName).getSheetId()}}}]};
        Sheets.Spreadsheets.batchUpdate(body, ss.getId());
      }
    } else {
      throw new Error("No table.");
    }
  } else {
    throw new Error("No table.");
  }
}
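
The extraction step of the script above can also be tried outside Apps Script, for example to find which section index holds the table you want before running the full script. The following is a minimal sketch under stated assumptions, not part of the original answer: `extractTableHtml` and `sectionIndex` are hypothetical names, and the function returns null instead of throwing when nothing is found.

```javascript
// Hypothetical helper: returns the HTML of the first <table> inside the
// section match at `sectionIndex` (0-based), or null when there is none.
function extractTableHtml(html, sectionIndex) {
  // Same lazy patterns as the script: each match ends at the first closing tag.
  const sections = [...html.matchAll(/<section[\s\S]+?<\/section>/g)];
  if (sectionIndex < 0 || sectionIndex >= sections.length) return null;
  const table = sections[sectionIndex][0].match(/<table[\s\S]+?<\/table>/);
  return table ? table[0] : null;
}
```

Because the patterns are lazy, nested `<section>` elements do not map one-to-one to matches, so the right index for a given page may need to be found by trial.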
