BigQuery - Check if table already exists


Question


I have a dataset in BigQuery. This dataset contains multiple tables.

I am doing the following steps programmatically using the BigQuery API:

  1. Querying the tables in the dataset - Since my response is too large, I am enabling allowLargeResults parameter and diverting my response to a destination table.

  2. I am then exporting the data from the destination table to a GCS bucket.

Requirements:

  • Suppose my process fails at Step 2, I would like to re-run this step.

  • But before I re-run, I would like to check/verify that the specific destination table named 'xyz' already exists in the dataset.

  • If it exists, I would like to re-run step 2.

  • If it does not exist, I would like to do foo.

How can I do this?

Thanks in advance.

Solution

Here is a Python snippet that will tell whether a table exists:

def doesTableExist(project_id, dataset_id, table_id):
  # Caution: this deletes the table, so afterwards the answer is always False.
  bq.tables().delete(
      projectId=project_id,
      datasetId=dataset_id,
      tableId=table_id).execute()
  return False

Alternatively, if you'd prefer not to delete the table in the process, you could try:

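from googleapiclient.errors import HttpError

def doesTableExist(project_id, dataset_id, table_id):
  try:
    # tables().get raises HttpError with status 404 if the table is missing.
    bq.tables().get(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id).execute()
    return True
  except HttpError as err:
    # Re-raise anything other than "not found" (e.g. permission errors).
    if err.resp.status != 404:
      raise
    return False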

If you want to know where bq came from, you can call build_bq_client from here: http://code.google.com/p/bigquery-e2e/source/browse/samples/ch12/auth.py

In general, if you're using this to test whether you should run a job that will modify the table, it can be a good idea to just do the job anyway, and use WRITE_TRUNCATE as a write disposition.
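As a rough illustration of the WRITE_TRUNCATE suggestion, the request body for step 1 (the query job) might be built as below. The helper name build_query_job_body is made up for this sketch; the field names are the ones I believe the BigQuery v2 jobs.insert request uses, so check them against the API reference before relying on them.

```python
def build_query_job_body(project_id, dataset_id, table_id, sql):
    """Build a jobs.insert body that overwrites the destination table.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Needed when the result is too large for a normal response.
                "allowLargeResults": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Replace the table contents if the table already exists,
                # so re-running the job is safe without an existence check.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }
```

With a client like the bq object above, this body would presumably be submitted with bq.jobs().insert(projectId=project_id, body=body).execute().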

Another approach can be to create a predictable job id, and retry the job with that id. If the job already exists, the job already ran (you might want to double check to make sure the job didn't fail, however).
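A minimal sketch of the predictable job id idea, applied to step 2 (the export to GCS). The helper names and the id scheme (step name plus a short hash) are assumptions made for this example; any scheme works as long as the same inputs always give the same id and the id only uses letters, digits, underscores, and dashes.

```python
import hashlib


def make_job_id(step_name, project_id, dataset_id, table_id):
    """Derive the same job id every time for the same step and table."""
    key = "/".join([step_name, project_id, dataset_id, table_id])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return "{}_{}".format(step_name, digest)


def build_extract_job_body(project_id, dataset_id, table_id, gcs_uri):
    """Build a jobs.insert body for the export, with a predictable job id."""
    return {
        "jobReference": {
            "projectId": project_id,
            "jobId": make_job_id("export", project_id, dataset_id, table_id),
        },
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "destinationUris": [gcs_uri],
            }
        },
    }


def job_succeeded(job):
    """Return True if a job resource is finished and reports no error."""
    status = job.get("status", {})
    return status.get("state") == "DONE" and "errorResult" not in status
```

If inserting this body fails because a job with that id already exists, the existing job can be fetched (for example with bq.jobs().get(projectId=..., jobId=...).execute()) and passed to job_succeeded to decide whether the export needs another attempt. A failed job keeps its id, so a retry after a failure would need a different id, for instance by adding an attempt number to the step name.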
