"Not found: Table" for new bigquery table

Problem description

I use the python sdk to create a new bigquery table:

tableInfo = {
    'tableReference': {
        'datasetId': datasetId,
        'projectId': projectId,
        'tableId': targetTableId
    },
    'schema': schema
}

result = bigquery_service.tables().insert(projectId=projectId,
                                          datasetId=datasetId,
                                          body=tableInfo).execute()

The result variable contains the created table information (etag, id, kind, schema, selfLink, tableReference, type), so I assume the table was created correctly.

Then, when I call bigquery_service.tables().list(...)

The problem is: When inserting right after that, I still (often) get an error: Not found: MY_TABLE_NAME

My insert function call looks like this:

response = bigquery_service.tabledata().insertAll(
    projectId=projectId,
    datasetId=datasetId,
    tableId=targetTableId,
    body=body).execute()

I even retried the insert multiple times with 3 seconds of sleep between retries. Any ideas?

My projectId is stylight-bi-testing

There were a lot of failures between 10:00 and 12:00 (times given in UTC).

Recommended answer

Per your answers to my question regarding using NOT_FOUND as an indicator to create the table, this is intended (though admittedly somewhat frustrating) behavior.

The streaming insertion path caches information about tables (and the authorization of a user to insert into the table). This is because of the intended high QPS nature of the API. We also cache certain negative responses in order to protect against buggy or abusive clients. One of those cached negative responses is the non-existence of a destination table. We've always done this on a per-machine basis, but recently added an additional centralized cache, such that all machines will see the negative cache result almost immediately after the first NOT_FOUND response is returned.
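To make the effect of a negative cache concrete, here is a toy model in Python. It is only an illustration and not BigQuery's actual implementation: the NegativeCache class, the streaming_insert function, and the ttl value are all hypothetical, and the real cache lifetime is not stated anywhere in this thread.

```python
class NegativeCache:
    """Toy model: remembers "table not found" answers for `ttl` seconds.

    Hypothetical illustration only; the real service's cache lifetime
    and eviction rules are not documented in this thread.
    """

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock          # callable returning the current time
        self._missing = {}          # table_id -> expiry time

    def mark_missing(self, table_id):
        self._missing[table_id] = self.clock() + self.ttl

    def is_cached_missing(self, table_id):
        expires = self._missing.get(table_id)
        if expires is None:
            return False
        if self.clock() >= expires:
            # The cached negative answer has expired; forget it.
            del self._missing[table_id]
            return False
        return True


def streaming_insert(cache, existing_tables, table_id):
    """Simulate an insert that consults the negative cache first."""
    if cache.is_cached_missing(table_id):
        # Answered from the cache, without checking whether the
        # table has been created in the meantime.
        return 'NOT_FOUND (cached)'
    if table_id not in existing_tables:
        cache.mark_missing(table_id)
        return 'NOT_FOUND'
    return 'OK'
```

In this model, an insert that arrives before the table exists records a negative entry. Creating the table afterwards does not help until that entry expires, which would be consistent with retries a few seconds apart still failing.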

In general, we recommend that table creation not occur inline with insert requests, because in a system that is issuing thousands of QPS of inserts, a table miss could result in thousands of table creation operations which can be taxing on our system. Instead, if you know the possible set of tables beforehand, we recommend some periodic process that performs table creations in advance of their usage as a streaming destination. If your destination tables are more dynamic in nature, you may need to implement a delay after table creation has been performed.
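A minimal sketch of that advice, using the same tables().list and tables().insert calls as the question. The helper names (list_table_ids, ensure_tables) and the settle_seconds parameter are made up for illustration, and the right delay is an assumption you would have to determine yourself, since no figure is given here. The idea is to run ensure_tables from a periodic job, well before anything streams into those tables.

```python
import time


def list_table_ids(service, project_id, dataset_id):
    """Return the IDs of all tables currently in the dataset."""
    table_ids = set()
    page_token = None
    while True:
        kwargs = {'projectId': project_id, 'datasetId': dataset_id}
        if page_token:
            kwargs['pageToken'] = page_token
        response = service.tables().list(**kwargs).execute()
        for table in response.get('tables', []):
            table_ids.add(table['tableReference']['tableId'])
        page_token = response.get('nextPageToken')
        if not page_token:
            return table_ids


def ensure_tables(service, project_id, dataset_id, table_ids, schema,
                  settle_seconds=0, sleep=time.sleep):
    """Create any missing tables ahead of streaming.

    Returns the list of table IDs that were created. If anything was
    created, waits `settle_seconds` (a hypothetical, caller-chosen
    delay) before returning, so callers do not stream immediately.
    """
    existing = list_table_ids(service, project_id, dataset_id)
    created = []
    for table_id in table_ids:
        if table_id in existing:
            continue
        body = {
            'tableReference': {
                'projectId': project_id,
                'datasetId': dataset_id,
                'tableId': table_id,
            },
            'schema': schema,
        }
        service.tables().insert(projectId=project_id,
                                datasetId=dataset_id,
                                body=body).execute()
        created.append(table_id)
    if created and settle_seconds:
        sleep(settle_seconds)
    return created
```

The point of the design is that the streaming code path never creates tables itself and never sends an insertAll for a table that might not exist yet, so it does not trigger the cached NOT_FOUND in the first place.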

Apologies for the difficulty. We do hope to address this issue, but we don't have any timeframe yet for doing so.
