BigQuery: too many table dml insert operations for this table


Problem Description

I'm trying to import more than 200M records from different computers (n=20) into my BigQuery table via the Python client. Each computer runs a job every 10 seconds (each job contains multiple rows):

from google.cloud import bigquery
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.getcwd() + '/api.json'

# queries is a list of INSERT DML statements built at run-time
print('Import job started, total rows:' + str(len(queries)))
client = bigquery.Client()
for q in queries:
    results = client.query(q)
    for err in results:
        print(err)

But I'm getting the following error:

google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many table dml insert operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

The data are generated at run-time, so I have to import them at run-time. I'm also not sure whether BigQuery is a good fit for this. Spanner seems better, but it costs me too much.

How can I avoid this error? Thank you very much.

Recommended Answer

There are 4 major ways to insert data into BigQuery tables.

  1. Batch load a set of data records.
  2. Stream individual records or batches of records.
  3. Use queries to generate new data and append or overwrite the results to a table.
  4. Use a third-party application or service.

I think you are using the 3rd option, which is DML INSERT. It's not designed for large-scale, high-frequency data loading use cases.
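As an aside, if some of the load must stay on DML, the pressure on the per-table DML quota can be reduced by combining many rows into one multi-row INSERT statement instead of issuing one statement per row, since one statement counts as a single DML operation. A minimal sketch (the table and column names are hypothetical, and naive string formatting like this is only safe for trusted data):

```python
def build_insert(table, rows):
    """Build one multi-row INSERT DML statement from (full_name, age) tuples."""
    values = ", ".join("('{}', {})".format(name, age) for name, age in rows)
    return "INSERT INTO `{}` (full_name, age) VALUES {}".format(table, values)

# One statement covering many rows counts as a single DML operation.
sql = build_insert("your_dataset.your_table", [("Alice", 30), ("Bob", 25)])
print(sql)
```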

In your use case, the 2nd option, streaming data, seems to be a good fit.

Example

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of table to append to.
# table_id = "your-project.your_dataset.your_table"

rows_to_insert = [
    {u"full_name": u"Phred Phlyntstone", u"age": 32},
    {u"full_name": u"Wylma Phlyntstone", u"age": 29},
]

errors = client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))

You can see more details here: https://cloud.google.com/bigquery/streaming-data-into-bigquery
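When switching to streaming inserts, note that the quota documentation recommends at most 500 rows per insertAll request, so the rows generated every 10 seconds may need to be split into batches before calling `insert_rows_json`. A small sketch of such a helper (`chunk_rows` is a hypothetical name, not part of the client library):

```python
def chunk_rows(rows, batch_size=500):
    """Split a list of row dicts into batches of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

# Each batch would then go to client.insert_rows_json(table_id, batch).
rows = [{"id": n} for n in range(1200)]
print([len(b) for b in chunk_rows(rows)])  # [500, 500, 200]
```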
