Loading a Lot of Data into Google BigQuery from Python


Problem description


I've been struggling to load big chunks of data into bigquery for a little while now. In Google's docs, I see the insertAll method, which seems to work fine, but gives me 413 "Entity too large" errors when I try to send anything over about 100k of data in JSON. Per Google's docs, I should be able to send up to 1TB of uncompressed data in JSON. What gives? The example on the previous page has me building the request body manually instead of using insertAll, which is uglier and more error prone. I'm also not sure what format the data should be in in that case.

So, all of that said, what is the clean/proper way of loading lots of data into Bigquery? An example with data would be great. If at all possible, I'd really rather not build the request body myself.
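
For context, a minimal sketch of the streaming `tabledata.insertAll()` call described above, using the discovery-based `google-api-python-client`. The project, dataset, and table names are placeholders, and credentials are assumed to already be configured in the environment:

```python
from googleapiclient.discovery import build

# Build the BigQuery v2 service; assumes application default credentials
# are already set up.
service = build("bigquery", "v2")

rows = [
    {"json": {"name": "alice", "score": 42}},
    {"json": {"name": "bob", "score": 17}},
]

# Streaming insert: works for small batches, but a large request body
# is rejected with 413 "Entity too large".
response = service.tabledata().insertAll(
    projectId="my-project",   # placeholder
    datasetId="my_dataset",   # placeholder
    tableId="my_table",       # placeholder
    body={"rows": rows},
).execute()
print(response)
```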

Solution

Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.

If you'd like to send large chunks directly to BQ, you can send it via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to make a call to jobs.insert() instead of tabledata.insertAll(), and provide a description of a load job. To actually push the bytes using the Python client, you can create a MediaFileUpload or MediaInMemoryUpload and pass it as the media_body parameter.
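
A minimal sketch of that approach with the discovery-based Python client, assuming a newline-delimited JSON file on disk and placeholder project/dataset/table names:

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

service = build("bigquery", "v2")  # assumes default credentials are configured

# Describe the load job: where the data goes and what format it is in.
job_body = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": "my-project",   # placeholder
                "datasetId": "my_dataset",   # placeholder
                "tableId": "my_table",       # placeholder
            },
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "writeDisposition": "WRITE_APPEND",
        }
    }
}

# Attach the bytes as the media body; resumable=True lets the client
# library upload in chunks and retry on failure.
media = MediaFileUpload(
    "rows.json",                        # placeholder file of newline-delimited JSON
    mimetype="application/octet-stream",
    resumable=True,
)

job = service.jobs().insert(
    projectId="my-project",
    body=job_body,
    media_body=media,
).execute()
print(job["jobReference"]["jobId"])
```

From there you would poll `jobs.get()` until the job's status is `DONE`. If the data is already in memory rather than on disk, a `MediaInMemoryUpload` (or `MediaIoBaseUpload`) can be passed as `media_body` instead of `MediaFileUpload`.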

The other option is to stage the data in Google Cloud Storage and load it from there.
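
A sketch of the Cloud Storage variant, assuming the files have already been uploaded under a placeholder gs:// path; no media body is needed because BigQuery reads the data from Cloud Storage itself:

```python
from googleapiclient.discovery import build

service = build("bigquery", "v2")  # assumes default credentials are configured

job_body = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/exports/rows-*.json"],  # placeholder
            "destinationTable": {
                "projectId": "my-project",   # placeholder
                "datasetId": "my_dataset",   # placeholder
                "tableId": "my_table",       # placeholder
            },
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "writeDisposition": "WRITE_APPEND",
        }
    }
}

# No media_body here: BigQuery pulls the files directly from the bucket.
job = service.jobs().insert(projectId="my-project", body=job_body).execute()
print(job["jobReference"]["jobId"])
```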
