How do I use the TABLE_QUERY() function in BigQuery?
Problem description
A couple of questions about the TABLE_QUERY function:
Besides table_id, are there other fields available in the query string?
How does TABLE_QUERY() work?
The TABLE_QUERY() function allows you to write a SQL WHERE clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples dataset that are older than 7 days:
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
"MSEC_TO_TIMESTAMP(creation_time) < "
+ "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")
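Because creation_time is stored as milliseconds since the Unix epoch, the condition above is just a timestamp comparison against a point seven days in the past. A minimal Python sketch of the same test, useful for checking which tables you expect to match; the helper is_older_than is hypothetical, and it assumes creation_time arrives as an integer number of milliseconds:

```python
from datetime import datetime, timedelta, timezone


def is_older_than(creation_time_ms: int, days: int, now: datetime) -> bool:
    """Mirror MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(now, -days, 'DAY')."""
    # Convert epoch milliseconds to a timezone-aware UTC datetime.
    created = datetime.fromtimestamp(creation_time_ms / 1000, tz=timezone.utc)
    # Move the reference time back by the requested number of days.
    cutoff = now - timedelta(days=days)
    return created < cutoff


# Fixed reference time so the result does not depend on the clock.
now = datetime(2014, 6, 15, tzinfo=timezone.utc)
print(is_older_than(1_400_000_000_000, 7, now))  # 2014-05-13 -> True
print(is_older_than(1_402_700_000_000, 7, now))  # 2014-06-13 -> False
```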
Or you can run this to query over all tables that have 'git' in the name (which are the github_timeline and github_nested sample tables) and find the most common URLs:
SELECT url, COUNT(*) AS cnt
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY cnt DESC
LIMIT 100
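Conceptually, the query first keeps only the tables whose table_id contains 'git', then groups the url values from those tables and ranks them by count. The Python sketch below mimics both steps on made-up, in-memory data; it treats CONTAINS as a plain substring test, which is an assumption about its exact semantics, and the table contents are purely illustrative.

```python
from collections import Counter

# Made-up stand-in data: table name -> url values stored in that table.
tables = {
    "github_timeline": ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
    "github_nested": ["https://example.com/a", "https://example.com/c"],
    "other_table": ["https://example.com/z"],
}

# Step 1: keep the tables whose name contains 'git' (plain substring test).
selected = [name for name in tables if "git" in name]

# Step 2: GROUP BY url, then order by the count descending and keep the top 100.
counts = Counter(url for name in selected for url in tables[name])
top = counts.most_common(100)

print(selected)  # ['github_timeline', 'github_nested']
print(top[0])    # ('https://example.com/a', 3)
```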
Despite being very powerful, TABLE_QUERY() can be difficult to use. The WHERE clause must be specified as a string, which can be a little bit awkward. Moreover, it can be difficult to debug, since when there is a problem, you only get the error "Error evaluating subsidiary query", which isn't always helpful.
How it works:
TABLE_QUERY() essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>), BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query> to get the list of table IDs to run the query on, then it executes your actual query over those tables.
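To make the two-step evaluation concrete, here is a small Python simulation using the standard-library sqlite3 module. It is only an analogy, not how BigQuery is implemented: the meta-table contents, the table names and row counts, and the helper table_query_count are all hypothetical; SQLite's LIKE stands in for CONTAINS, and UNION ALL stands in for querying over several tables at once.

```python
import sqlite3


def make_demo_dataset() -> sqlite3.Connection:
    """Build a tiny in-memory 'dataset' with a hypothetical summary meta-table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE __TABLES_SUMMARY__ "
        "(table_id TEXT, creation_time INTEGER, type INTEGER)"
    )
    # Made-up tables: (name, creation_time in ms, number of rows).
    for name, created, n_rows in [
        ("github_timeline", 1000, 3),
        ("github_nested", 2000, 2),
        ("other_table", 3000, 5),
    ]:
        conn.execute("INSERT INTO __TABLES_SUMMARY__ VALUES (?, ?, 1)", (name, created))
        conn.execute(f"CREATE TABLE {name} (x INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n_rows)])
    return conn


def table_query_count(conn: sqlite3.Connection, table_query: str) -> tuple[list[str], int]:
    """Emulate SELECT COUNT(*) FROM TABLE_QUERY(dataset, table_query)."""
    # Query 1: evaluate the condition against the meta-table to get table IDs.
    ids = [row[0] for row in conn.execute(
        f"SELECT table_id FROM __TABLES_SUMMARY__ WHERE {table_query}")]
    if not ids:
        # This sketch simply refuses an empty table list.
        raise ValueError("no tables matched the condition")
    # Query 2: run the actual query over the matched tables combined.
    union = " UNION ALL ".join(f"SELECT x FROM {t}" for t in ids)
    total = conn.execute(f"SELECT COUNT(*) FROM ({union})").fetchone()[0]
    return ids, total


conn = make_demo_dataset()
ids, total = table_query_count(conn, "table_id LIKE '%git%'")
print(sorted(ids), total)  # ['github_nested', 'github_timeline'] 5
```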
The __TABLES_SUMMARY__ portion of that query may look unfamiliar. __TABLES_SUMMARY__ is a meta-table containing information about the tables in a dataset, and you can query it yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ will return metadata about the tables in the publicdata:samples dataset.
Available Fields:
The fields of the __TABLES_SUMMARY__ meta-table (all of which are available in the TABLE_QUERY condition) include:
table_id: name of the table.
creation_time: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as the creation_time field on the table.
type: whether it is a view (2) or a regular table (1).
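A client-side representation of a __TABLES_SUMMARY__ row can help when inspecting the meta-table output. The sketch below is a hypothetical Python dataclass, not part of any BigQuery client library; it only models the three fields listed above and decodes them as described (milliseconds since the epoch, and 1 or 2 for a regular table or a view). The example row is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TableSummary:
    """One row of __TABLES_SUMMARY__, limited to the fields listed above."""
    table_id: str
    creation_time: int  # milliseconds since 1970-01-01 UTC
    type: int           # 1 = regular table, 2 = view

    def created_at(self) -> datetime:
        # Roughly what MSEC_TO_TIMESTAMP does, done client-side.
        return datetime.fromtimestamp(self.creation_time / 1000, tz=timezone.utc)

    def is_view(self) -> bool:
        return self.type == 2


row = TableSummary("my_table", 1_400_000_000_000, 1)
print(row.created_at().isoformat())  # 2014-05-13T16:53:20+00:00
print(row.is_view())                 # False
```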
The following fields are not available in TABLE_QUERY(), since they are members of __TABLES__ but not of __TABLES_SUMMARY__. They're kept here for historical interest and to partially document the __TABLES__ meta-table:
last_modified_time: time, in milliseconds since 1/1/1970 UTC, that the table was last updated (either its metadata or its contents). Note that if you use tabledata.insertAll() to stream records into your table, this value might be a few minutes out of date.
row_count: number of rows in the table.
size_bytes: total size of the table in bytes.
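Since a bad field reference only surfaces as the generic "Error evaluating subsidiary query", a quick client-side check can catch the __TABLES__-only fields before you submit a query. A rough sketch: unavailable_fields is a hypothetical helper, and it is a plain text match that can be fooled, for example, by a field name that appears inside a string literal.

```python
import re

# Fields documented above as members of __TABLES__ but not __TABLES_SUMMARY__.
UNAVAILABLE_IN_TABLE_QUERY = ("last_modified_time", "row_count", "size_bytes")


def unavailable_fields(table_query: str) -> list[str]:
    """Return the __TABLES__-only fields that a TABLE_QUERY condition mentions."""
    # Whole-word, case-insensitive match to be conservative about spelling.
    return [
        field
        for field in UNAVAILABLE_IN_TABLE_QUERY
        if re.search(rf"\b{field}\b", table_query, flags=re.IGNORECASE)
    ]


print(unavailable_fields("row_count > 0 AND table_id CONTAINS 'git'"))  # ['row_count']
```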
How to debug
To debug your TABLE_QUERY() queries, you can do the same thing that BigQuery does: run the meta-table query yourself. For example:
SELECT * FROM publicdata:samples.__TABLES_SUMMARY__
WHERE MSEC_TO_TIMESTAMP(creation_time) <
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')
This lets you not only debug your condition but also see which tables would be returned when you run the TABLE_QUERY function. Once you have debugged the inner query, you can combine it with your full query over those tables.
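One way to keep the debug query and the real query in sync is to generate both from a single condition string. A minimal Python sketch under stated assumptions: build_queries is a hypothetical helper, the condition is embedded in a double-quoted string literal as in the examples above, and rather than guess at legacy SQL's escaping rules, the sketch rejects conditions that contain double quotes.

```python
def build_queries(dataset: str, condition: str, outer_select: str) -> tuple[str, str]:
    """Return (debug_query, full_query) built from the same TABLE_QUERY condition."""
    # The condition is passed to TABLE_QUERY as a double-quoted string literal;
    # this sketch avoids escaping questions by refusing embedded double quotes.
    if '"' in condition:
        raise ValueError("condition must not contain double quotes")
    debug_query = f"SELECT * FROM {dataset}.__TABLES_SUMMARY__ WHERE {condition}"
    full_query = f'{outer_select} FROM TABLE_QUERY({dataset}, "{condition}")'
    return debug_query, full_query


debug_q, full_q = build_queries(
    "publicdata:samples",
    "MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')",
    "SELECT count(*)",
)
print(debug_q)
print(full_q)
```

Run debug_q first; if it lists the tables you expect, submit full_q.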