How to create a view against a table that has record fields?


Problem description

We have a weekly backup process that exports our production Google App Engine Datastore to Google Cloud Storage and then loads it into Google BigQuery. Each week, we create a new dataset named YYYY_MM_DD that contains a copy of the production tables as of that day. Over time we have collected many datasets, such as 2014_05_10, 2014_05_17, etc. I want to create a dataset Latest_Production_Data that contains a view for each of the tables in the most recent YYYY_MM_DD dataset. This would let downstream reports write their query once and always retrieve the most recent data.

To do this, I have code that gets the most recent dataset and the names of all the tables that dataset contains from the BigQuery API. Then, for each of these tables, I fire a tables.insert call to create a view that is a SELECT * from the table I am looking to create a reference to.

This fails for tables that contain a RECORD field, from what looks to be a pretty benign column-naming rule.

For example, I have this table:

For which I issue this API call:

{
  'tableReference': {
    'projectId': 'redacted',
    'tableId': u'AccountDeletionRequest',
    'datasetId': 'Latest_Production_Data'
  },
  'view': {
    'query': u'SELECT * FROM [2014_05_17.AccountDeletionRequest]'
  }
}

This results in the following error:

HttpError: https://www.googleapis.com/bigquery/v2/projects//datasets/Latest_Production_Data/tables?alt=json returned "Invalid field name "__key__.namespace". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.">

When I execute this query in the BigQuery web console, the columns are renamed so that each . becomes an _. I expected the same thing to happen when I issued the create-view API call.
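The naming rule quoted in the error can be checked locally before creating a view. The following is a minimal sketch based only on the wording of that error message; the function name is_valid_field_name is hypothetical and not part of any BigQuery client library.

```python
import re

# Pattern derived from the error text only: letters, numbers, and
# underscores; must start with a letter or underscore; at most 128
# characters in total. This is an assumption, not an official rule set.
_FIELD_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_valid_field_name(name):
    """Return True if `name` satisfies the rule quoted in the error."""
    return _FIELD_NAME_RE.fullmatch(name) is not None


# The dotted record path is rejected; the underscore form is accepted.
print(is_valid_field_name("__key__.namespace"))
print(is_valid_field_name("__key___namespace"))
```

This makes it clear why SELECT * fails here: the output column for a nested field carries its dotted path, and a dot is not among the permitted characters.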

Is there an easy way I can programmatically create a view for each of the tables in my dataset, regardless of their underlying schema? The problem I'm encountering now is for record columns, but another problem I anticipate is for tables that have repeated fields. Is there some magic alternative to SELECT * that will take care of all these intricacies for me?

Another idea I had was doing a table copy, but I would prefer not to duplicate the data if I can at all avoid it.

Solution

Here is the workaround code I wrote to dynamically generate a SELECT statement for each of the tables:

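# table_service and BQ_PROJECT_ID are defined elsewhere in my code;
# table_service is the tables() resource of the BigQuery API client.

def get_leaf_column_selectors(dataset, table):
    """Build the column list of a SELECT covering every leaf field."""
    schema = table_service.get(
            projectId=BQ_PROJECT_ID,
            datasetId=dataset,
            tableId=table
        ).execute()['schema']

    return ",\n".join([
        _get_leaf_selectors("", top_field)
        for top_field in schema["fields"]
    ])


def _get_leaf_selectors(prefix, field):
    """Return "a.b as a_b" selectors for all leaves under one field."""
    # Named `template` to avoid shadowing the built-in format().
    if prefix:
        template = prefix + ".%s"
    else:
        template = "%s"

    if 'fields' not in field:
        # Base case: a leaf column. Alias it so the output name has no dots.
        actual_name = template % field["name"]
        safe_name = actual_name.replace(".", "_")
        return "%s as %s" % (actual_name, safe_name)
    else:
        # Recursive case: a RECORD. Descend with the dotted path as prefix.
        return ",\n".join([
            _get_leaf_selectors(template % field["name"], sub_field)
            for sub_field in field["fields"]
        ])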
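To see what this approach produces without calling the API, here is a self-contained sketch that applies the same flattening to an in-memory schema and assembles a request body shaped like the one in the question. The names leaf_selectors, build_view_body, SAMPLE_SCHEMA, the project ID "my-project", and the fields other than __key__.namespace are hypothetical; the real schema comes from the tables.get response. Repeated fields are not treated specially here.

```python
def leaf_selectors(field, prefix=""):
    """Return a list of 'a.b AS a_b' selectors for every leaf under `field`."""
    name = "%s.%s" % (prefix, field["name"]) if prefix else field["name"]
    if "fields" not in field:
        # Leaf column: alias the dotted path to an underscore-only name.
        return ["%s AS %s" % (name, name.replace(".", "_"))]
    selectors = []
    for sub_field in field["fields"]:
        # RECORD column: recurse, carrying the dotted path as the prefix.
        selectors.extend(leaf_selectors(sub_field, name))
    return selectors


def build_view_body(project_id, view_dataset, source_dataset, table, schema):
    """Assemble a tables.insert body for a view over source_dataset.table."""
    selectors = []
    for top_field in schema["fields"]:
        selectors.extend(leaf_selectors(top_field))
    # Legacy SQL bracket syntax, matching the query in the question.
    query = "SELECT\n%s\nFROM [%s.%s]" % (
        ",\n".join(selectors), source_dataset, table)
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": view_dataset,
            "tableId": table,
        },
        "view": {"query": query},
    }


# Hypothetical schema for illustration only.
SAMPLE_SCHEMA = {
    "fields": [
        {"name": "__key__", "type": "RECORD", "fields": [
            {"name": "namespace", "type": "STRING"},
            {"name": "id", "type": "INTEGER"},
        ]},
        {"name": "reason", "type": "STRING"},
    ]
}

body = build_view_body("my-project", "Latest_Production_Data",
                       "2014_05_17", "AccountDeletionRequest", SAMPLE_SCHEMA)
print(body["view"]["query"])
```

The resulting body could then be passed to the same tables.insert call used in the question in place of the SELECT * version. Returning a list from the recursive helper and joining once at the end keeps the individual selectors available for inspection.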
