Error: TABLE_QUERY expressions cannot query BigQuery tables


Question

This is a follow-up question regarding Jordan's answer here: Weird error in BigQuery

I had been querying a reference table within TABLE_QUERY for quite some time. Now, following the recent changes Jordan is referring to, many of our queries are broken. I would like to ask the community for advice on an alternative to what we are doing.

I have tables containing events ("MyTable_YYYYMMDD"). I want to query my data for the period of a specific campaign (or several campaigns). The period of each campaign is stored in a table with all campaign data (ID, StartCampaignDate, EndCampaignDate). In order to query only the relevant tables, we use TABLE_QUERY(), and within it we construct a list of all relevant table names based on the campaign data. This query runs in various forms many times with different parameters. The reason for using a table wildcard function (rather than querying the entire dataset) is performance, execution cost, and maintenance cost. So having it query all tables and then filter the results is not an option, as it drives execution costs too high.

A sample query looks like this:

SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet], 'table_id IN
  (SELECT CONCAT("MyTable_",STRING(Year*100+Month)) TBL_NAME  
    FROM DWH.Dim_Periods P  
    CROSS JOIN DWH.Campaigns AS LC  
    WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") 
    AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate))')

This is now broken. My question: the information about which tables to query is stored in a reference table, so how would you query only the relevant tables (partitions) now that TABLE_QUERY is no longer allowed to query reference tables?

Many thanks

Recommended answer

The "simple" way I see is to split it into two steps.

Step 1 - build the list that will be used to filter table_id values

SELECT GROUP_CONCAT_UNQUOTED(
                   CONCAT('"',"MyTable_",STRING(Year*100+Month),'"')
       ) TBL_NAME_LIST 
FROM DWH.Dim_Periods P  
CROSS JOIN DWH.Campaigns AS LC  
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") 
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate)

Note the change to your query: it now transforms the result into a single list that you will use in step 2.
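To make the shape of that list concrete, here is a minimal Python sketch of the string the step 1 query is expected to return, assuming GROUP_CONCAT_UNQUOTED joins values with its default comma separator. The function name `quoted_table_list` and the sample year-month values are hypothetical and only serve as an illustration.

```python
def quoted_table_list(year_months, prefix="MyTable_"):
    """Mimic the step 1 result: each table name wrapped in double quotes,
    all names joined by commas into one string."""
    names = ['"{}{}"'.format(prefix, ym) for ym in year_months]
    return ",".join(names)


# Hypothetical campaign spanning two months (values of Year*100+Month).
print(quoted_table_list([201501, 201502]))
```

A string of this form can be placed directly inside `table_id IN (...)`, because the double-quoted names do not clash with the single quotes that delimit the TABLE_QUERY expression.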

Step 2 - final query

SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet], 
              'table_id IN (<paste list (TBL_NAME_LIST) built in first query>)')

The steps above are easy to implement in any client you might be using.
If you run this from within the BigQuery Web UI, it requires a few extra manual steps that you might not be happy about.
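As a rough illustration of how a client could chain the two steps, the sketch below takes a `run_query` callable, runs the step 1 query, and pastes the resulting list into the step 2 query. The names `run_query`, `build_final_query` and `run_two_step` are hypothetical. `run_query` stands in for whatever client call executes legacy SQL and returns rows as sequences; it is an assumption, not part of any BigQuery API.

```python
from typing import Callable, Optional, Sequence

# Step 1 query from the answer: returns one row with one column (TBL_NAME_LIST).
STEP1_SQL = """
SELECT GROUP_CONCAT_UNQUOTED(
         CONCAT('"', "MyTable_", STRING(Year*100+Month), '"')
       ) TBL_NAME_LIST
FROM DWH.Dim_Periods P
CROSS JOIN DWH.Campaigns AS LC
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6")
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate)
"""


def build_final_query(table_list: Optional[str]) -> str:
    """Paste the list built in step 1 into the TABLE_QUERY filter."""
    if not table_list:
        # An empty or NULL list means no table matched the campaign period.
        raise ValueError("step 1 returned no table names")
    return (
        "SELECT * FROM TABLE_QUERY([MyProject:MyDataSet], "
        "'table_id IN ({})')".format(table_list)
    )


def run_two_step(run_query: Callable[[str], Sequence[Sequence]]) -> Sequence[Sequence]:
    """Run step 1, then run step 2 with the list that step 1 produced."""
    rows = run_query(STEP1_SQL)
    table_list = rows[0][0] if rows else None
    return run_query(build_final_query(table_list))
```

With the google-cloud-bigquery Python client, `run_query` could plausibly wrap `Client.query` with a job config that sets `use_legacy_sql=True`, since TABLE_QUERY is a legacy SQL function. That wiring is left out here so that the sketch stays independent of any particular client.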

My answer is obvious and you most likely already have this as an option, but I wanted to mention it.
