BigQuery: too many table dml insert operations for this table
Problem description
I'm trying to import more than 200M records from different computers (n=20) into my BigQuery table via the Python client. Each computer runs a job (with multiple rows) every 10 seconds:
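```python
import os

from google.cloud import bigquery

# Point the client library at a service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.getcwd() + '/api.json'

# `queries` is a list of DML INSERT statements built elsewhere (not shown).
print('Import job started, total rows:' + str(len(queries)))
client = bigquery.Client()
for q in queries:
    # Each call starts a separate DML query job against the same table.
    results = client.query(q)
    # Iterating the job waits for it to finish and yields any result rows.
    for err in results:
        print(err)
```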
But I'm getting the following error:
google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many table dml insert operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
The data is generated at runtime, so I have to import it at runtime. I'm also not sure whether BigQuery is a good fit for this. Spanner seems better suited, but it costs me too much.
How can I avoid this error? Thank you very much.
Recommended answer
There are 4 major ways to insert data into BigQuery tables.
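- Batch load a set of data records.
- Stream individual records or batches of records.
- Use queries to generate new data, and append or overwrite the results to a table.
- Use a third-party application or service.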
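For the first option (batch loading), rows are usually written out in a file format BigQuery can load, such as newline-delimited JSON, and submitted as a load job. A minimal sketch of the serialization step, assuming each row is a JSON-serializable dict whose keys match the table's column names; `to_ndjson` is a hypothetical helper name:

```python
import json
from typing import Any, Dict, Iterable


def to_ndjson(rows: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize rows as newline-delimited JSON (one object per line).

    Hypothetical helper: the returned UTF-8 bytes are meant to be wrapped
    in io.BytesIO and submitted as a NEWLINE_DELIMITED_JSON load job.
    An empty iterable produces empty bytes.
    """
    # Compact separators keep the payload small; ensure_ascii=False keeps
    # non-ASCII text as real UTF-8 instead of \u escapes.
    return "".join(
        json.dumps(row, ensure_ascii=False, separators=(",", ":")) + "\n"
        for row in rows
    ).encode("utf-8")
```

You could then submit the bytes with something like `client.load_table_from_file(io.BytesIO(data), table_id, job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON))` and wait on `.result()`. Load jobs have their own per-table quotas too, so this route suits accumulating rows over minutes rather than submitting a job every 10 seconds from each machine.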
I think you are using the 3rd option, DML INSERT: each `client.query(q)` call runs one INSERT statement as its own query job. It's not designed for large-scale, high-frequency data loading use cases.
For your use case, the 2nd option, streaming data, seems like a good fit.
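Whichever option you choose, the main change is to send many rows per request instead of issuing one DML statement per batch. Below is a sketch of a client-side buffer each machine could use. `RowBuffer` is hypothetical; its `send` callable is assumed to behave like `insert_rows_json` (take a list of row dicts, return a list of per-row error mappings), and the default batch size of 500 is only an assumed starting point to tune.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
RowError = Dict[str, Any]
# Assumed sender shape: takes one batch of rows and returns a list of
# per-row error mappings (empty on success), like insert_rows_json.
Sender = Callable[[Sequence[Row]], Sequence[RowError]]


class RowBuffer:
    """Hypothetical helper: buffer rows in memory and send them in batches."""

    def __init__(self, send: Sender, batch_size: int = 500) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send = send
        self._batch_size = batch_size
        self._rows: List[Row] = []
        # Batches that came back with errors, stored with those errors,
        # because each error's "index" points into its own batch.
        self.failed: List[Tuple[List[Row], List[RowError]]] = []

    def add(self, row: Row) -> None:
        """Buffer one row; send a batch once batch_size rows are waiting."""
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send all buffered rows in chunks of at most batch_size rows."""
        while self._rows:
            batch = self._rows[: self._batch_size]
            errors = list(self._send(batch))
            # Drop rows only after send returns, so an exception
            # leaves them buffered for a later flush.
            del self._rows[: len(batch)]
            if errors:
                self.failed.append((batch, errors))
```

On each machine you might create `RowBuffer(functools.partial(client.insert_rows_json, table_id))`, call `add()` for every generated record, and call `flush()` periodically and on shutdown so a partial batch is not lost.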
Example
```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to append to.
table_id = "your-project.your_dataset.your_table"

rows_to_insert = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]

errors = client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
if not errors:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))
```
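`insert_rows_json` returns one mapping per failed row, where the `index` key identifies the row within the request and the `errors` key lists what went wrong. A sketch of choosing which rows to resend, assuming rows with an error reason of `invalid` contain bad data that would fail again unchanged; `rows_to_retry` and its `skip_reasons` parameter are hypothetical:

```python
from typing import AbstractSet, Any, Dict, List, Sequence


def rows_to_retry(
    rows: Sequence[Dict[str, Any]],
    errors: Sequence[Dict[str, Any]],
    skip_reasons: AbstractSet[str] = frozenset({"invalid"}),
) -> List[Dict[str, Any]]:
    """Return the failed rows worth resending, in their original order.

    Hypothetical helper: `errors` is assumed to be the list returned by
    insert_rows_json for the same `rows` batch.
    """
    retry = []
    for err in sorted(errors, key=lambda e: int(e["index"])):
        reasons = {detail.get("reason") for detail in err.get("errors", [])}
        if reasons & skip_reasons:
            # The row itself is bad; resending it unchanged would fail again.
            continue
        retry.append(rows[int(err["index"])])
    return retry
```

A real caller would probably also cap the number of retry attempts, so a persistent failure does not keep the loop spinning.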
You can find more details here: https://cloud.google.com/bigquery/streaming-data-into-bigquery