How to load large datasets to R from BigQuery?


Question

I have tried two approaches with the bigrquery package:

library(bigrquery)
library(DBI)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "YOUR PROJECT ID HERE",
  dataset = "YOUR DATASET"
)
sql <- "YOUR LARGE QUERY HERE"  # long query saved as a view and selected here

# First approach: fetch through the DBI interface
test <- dbGetQuery(con, sql, n = 10000)

# Second approach: run the query as a job, then download the result table
tb <- bigrquery::bq_project_query("YOUR PROJECT ID HERE", sql)
bq_table_download(tb, max_results = 1000)

Both fail with the error "Error: Requested Resource Too Large to Return [responseTooLarge]" (a potentially related issue is here), but I am interested in any tool that gets the job done: I have already tried the solutions outlined here, but they failed.

How can I load large datasets from BigQuery into R?

Answer

As @hrbrmstr already suggested, the documentation specifically mentions:

> #' @param page_size The number of rows returned per page. Make this smaller
> #'   if you have many fields or large records and you are seeing a
> #'   'responseTooLarge' error.
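
Lowering page_size when downloading is a one-argument change. The sketch below assumes the bq_table handle tb from the code above; the value 5000 is only a hypothetical starting point, to be tuned to your schema:

# Re-run the download with a smaller page size to stay under the
# responseTooLarge limit; shrink page_size further if the error persists.
big <- bq_table_download(tb, page_size = 5000)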

In this documentation from r-project.org, you will find a different piece of advice in the explanation of this function (page 13):

> This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). For larger queries, it is better to export the results to a CSV file stored on google cloud and use the bq command line tool to download locally.
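
For the larger-query route the docs describe, a minimal sketch using bigrquery's extract wrapper follows; the bucket URI gs://my-bucket/result-*.csv is a hypothetical placeholder, and the local copy step uses the gsutil CLI rather than R:

# Export the result table to sharded CSV files on Google Cloud Storage
# (the '*' wildcard lets BigQuery split large exports across files).
bq_table_save(tb, "gs://my-bucket/result-*.csv")

# Download the shards locally from a shell, e.g.:
#   gsutil cp "gs://my-bucket/result-*.csv" .

# Read the shards back into a single data frame in R:
files <- list.files(pattern = "^result-.*\\.csv$")
big   <- do.call(rbind, lapply(files, read.csv))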
