Redirect BigQuery Data to Prediction

Problem Description

We are developing a proof of concept (POC) in Google Spreadsheets.

There is some configuration involved, but in a nutshell it downloads data from BigQuery and redirects it to Prediction.

Our BigQuery tables are over 41 MB, which Spreadsheets does not allow or support.

We thought about downloading the data from BigQuery in chunks of 5 MB. However, although the Prediction API provides a method for inserting large amounts of data, the update method only allows uploading one row/instance at a time.

Is there any way to redirect BigQuery data straight to Prediction?

The three main functions involved are:

/**
 * Trains a new Prediction model from the rows of a BigQuery table.
 * The first selected column is used as the output (label); the remaining
 * columns are used as the input features.
 *
 * @param {string} modelName The new prediction model name.
 * @param {string[]} columns The columns to download.
 * @param {string} table The table name.
 * @param {string} dataset The dataset name.
 */
function createPrediction(modelName, columns, table, dataset) {
  try {
    var rows = simpleQuery(columns, table, dataset);
    // Use the first 90% of the rows for training.
    var trainingL = Math.floor(0.9 * rows.length);
    var training_instances = [];

    for (var i = 0; i < trainingL; i++) {
      var values = rows[i].f.map(function (cell) { return cell.v; });
      // csvInstance is a list, so pass every feature column, not just the second one.
      training_instances.push({'output': values[0], 'csvInstance': values.slice(1)});
    }

    var resource = {'id': modelName, 'trainingInstances': training_instances};
    var insert_reply = Prediction.Trainedmodels.insert(resource, c_projectId);

    c_modelName = modelName;

    Browser.msgBox('Insert reply: ' + insert_reply, Browser.Buttons.OK);
  } catch (err) {
    Browser.msgBox('ERROR: ' + err, Browser.Buttons.OK);
  }
}

/**
 * Runs "SELECT [COLUMNS] FROM [DATASET.TABLE]" and returns the resulting rows.
 *
 * @param {string[]} columns The column names to select.
 * @param {string} table The table name.
 * @param {string} dataset The dataset name.
 * @param {string=} projectId The Google project ID; defaults to c_projectId.
 */
function simpleQuery(columns, table, dataset, projectId) {
  projectId = projectId || c_projectId;
  return getQuery("SELECT " + columns.join(",") + " FROM [" + dataset + "." + table + "]", projectId);
}

/**
 * Runs a query and returns all of its result rows.
 *
 * @param {string} query The query to be evaluated.
 * @param {string} projectId The Google project ID.
 */
function getQuery(query, projectId) {
  var request = {
    query: query
  };
  var queryResults = BigQuery.Jobs.query(request, projectId);
  var jobId = queryResults.jobReference.jobId;
  // Poll the query job until it completes, doubling the wait each time.
  var sleepTimeMs = 500;
  while (!queryResults.jobComplete) {
    Utilities.sleep(sleepTimeMs);
    sleepTimeMs *= 2;
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
  }
  // Get all the rows of results; rows is absent when a page is empty.
  var rows = queryResults.rows || [];
  while (queryResults.pageToken) {
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
      pageToken: queryResults.pageToken
    });
    rows = rows.concat(queryResults.rows || []);
  }
  return rows;
}

Regards.

Solution

There is no direct connection between BigQuery and the Prediction API at this time, but both integrate with Google Cloud Storage (GCS).

For training new models, you can export BigQuery data to GCS as a CSV file, then insert that file into the Prediction API.
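The export-then-train flow could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `buildExtractJob`, `buildTrainingResource`, `trainFromBigQuery`, the `services` and `opts` objects, and the bucket and object names are all hypothetical. It assumes the table's first column holds the output value, that the bucket is writable by BigQuery and readable by Prediction, and that the Apps Script advanced services are passed in as `services` so the logic can also be exercised outside Apps Script.

```javascript
// Hypothetical helper: builds a BigQuery extract-job resource that writes
// a table to Cloud Storage as a CSV file without a header row.
function buildExtractJob(projectId, datasetId, tableId, bucket, objectName) {
  return {
    configuration: {
      extract: {
        sourceTable: {projectId: projectId, datasetId: datasetId, tableId: tableId},
        destinationUris: ['gs://' + bucket + '/' + objectName],
        destinationFormat: 'CSV',
        printHeader: false
      }
    }
  };
}

// Hypothetical helper: builds the Prediction insert resource that points
// at the exported CSV ("bucket/object", without the gs:// prefix).
function buildTrainingResource(modelName, bucket, objectName) {
  return {id: modelName, storageDataLocation: bucket + '/' + objectName};
}

// services is assumed to look like {BigQuery: ..., Prediction: ..., sleep: fn}.
function trainFromBigQuery(services, opts) {
  var job = services.BigQuery.Jobs.insert(
      buildExtractJob(opts.projectId, opts.datasetId, opts.tableId,
                      opts.bucket, opts.objectName),
      opts.projectId);
  var jobId = job.jobReference.jobId;

  // Poll the extract job until it is done, with capped exponential backoff.
  var sleepTimeMs = 500;
  while (job.status.state !== 'DONE') {
    services.sleep(sleepTimeMs);
    sleepTimeMs = Math.min(sleepTimeMs * 2, 30000);
    job = services.BigQuery.Jobs.get(opts.projectId, jobId);
  }
  if (job.status.errorResult) {
    throw new Error('Export failed: ' + job.status.errorResult.message);
  }

  // Start training from the exported file.
  return services.Prediction.Trainedmodels.insert(
      buildTrainingResource(opts.modelName, opts.bucket, opts.objectName),
      opts.projectId);
}
```

Inside Apps Script, `services` would presumably be `{BigQuery: BigQuery, Prediction: Prediction, sleep: function (ms) { Utilities.sleep(ms); }}`. Because the data goes from BigQuery to GCS to Prediction, it never passes through the spreadsheet, so the table size should no longer be limited by what Spreadsheets can hold.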

Updating existing models from BigQuery data takes more work. The Prediction API only supports single-row inserts to existing models, whereas BigQuery only provides bulk export to GCS or ranged reads via bigquery.tabledata.list. You will need some processing to extract the data from BigQuery and insert it into your Prediction model one row at a time.
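The row-at-a-time processing described above might look something like this. It is a sketch under assumptions: `rowToUpdate`, `streamTableToModel`, and the `services` and `opts` objects are hypothetical names, the first column of each row is taken to be the output value, and the Apps Script advanced services are passed in as `services` so the paging logic can be tested outside Apps Script.

```javascript
// Hypothetical helper: converts one BigQuery row ({f: [{v: ...}, ...]})
// into a Prediction update resource. The first column is the output,
// the remaining columns are the input features.
function rowToUpdate(row) {
  var values = row.f.map(function (cell) { return cell.v; });
  return {output: values[0], csvInstance: values.slice(1)};
}

// Pages through a table with tabledata.list and sends each row to an
// existing model. services is assumed to look like
// {BigQuery: ..., Prediction: ...}. Returns the number of rows sent.
function streamTableToModel(services, opts) {
  var pageToken = null;
  var sent = 0;
  do {
    var args = {maxResults: opts.pageSize};
    if (pageToken) {
      args.pageToken = pageToken;
    }
    var page = services.BigQuery.Tabledata.list(
        opts.projectId, opts.datasetId, opts.tableId, args);
    var rows = page.rows || [];
    for (var i = 0; i < rows.length; i++) {
      // One API call per row: the update method accepts a single instance.
      services.Prediction.Trainedmodels.update(
          rowToUpdate(rows[i]), opts.projectId, opts.modelName);
      sent++;
    }
    pageToken = page.pageToken || null;
  } while (pageToken);
  return sent;
}
```

Since every row costs one Prediction call, a large table may well run into Apps Script execution limits. Keeping `pageSize` small and resuming from a stored `pageToken` across runs is one possible way to work around that.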

Using Google Apps Script to access BigQuery from within a Google Spreadsheet opens up a lot of easy integration options, but Apps Script also imposes processing limits, such as maximum file upload sizes. You might find other frameworks more flexible. Would Cloud Dataflow work for you? How about Google App Engine?
