How do I use the TABLE_QUERY() function in BigQuery?

Question

A couple of questions about the TABLE_QUERY function:

  • The examples show table_id being used in the query string. Are there other fields available?
  • It seems difficult to debug. I'm getting "error evaluating subsidiary query" when I try to use it.
  • How does TABLE_QUERY() work?

Solution

The TABLE_QUERY() function allows you to write a SQL WHERE clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples dataset that are older than 7 days:

SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
    "MSEC_TO_TIMESTAMP(creation_time) < "
    + "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")

Or you can run this query over all tables that have 'git' in the name (the github_timeline and github_nested sample tables) to find the most common URLs:

SELECT url, COUNT(*) AS url_count
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY url_count DESC
LIMIT 100

Despite being very powerful, TABLE_QUERY() can be difficult to use. The WHERE clause must be specified as a string, which can be awkward. It can also be difficult to debug: when there is a problem, you only get the error "Error evaluating subsidiary query", which isn't always helpful.

How it works:

TABLE_QUERY() essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>), BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query> to get the list of table IDs to run the query on, then it executes your actual query over those tables.

The __TABLES_SUMMARY__ portion of that query may look unfamiliar. __TABLES_SUMMARY__ is a meta-table containing information about the tables in a dataset. You can use this meta-table yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ will return metadata about the tables in the publicdata:samples dataset.
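
The table-selection step described above can be sketched as plain string composition. This is only an illustration of the description in this answer, not BigQuery's actual implementation; the helper name metatable_query is hypothetical.

```python
def metatable_query(dataset: str, table_query: str) -> str:
    """Compose the table-selection query that TABLE_QUERY() is described as running.

    Hypothetical helper: it only mirrors the description in the text and
    builds a string; it does not contact BigQuery.
    """
    return (
        f"SELECT table_id FROM {dataset}.__TABLES_SUMMARY__ "
        f"WHERE {table_query}"
    )


# The filter from the 'git' example, expanded into the first of the two queries.
print(metatable_query("publicdata:samples", "table_id CONTAINS 'git'"))
```

Seeing the filter spliced into a WHERE clause makes it clearer why the second argument must be a valid boolean expression over the meta-table's fields.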

Available Fields:

The fields of the __TABLES_SUMMARY__ meta-table (that are all available in the TABLE_QUERY query) include:

  • table_id: name of the table.
  • creation_time: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as the creation_time field on the table.
  • type: whether it is a view (2) or regular table (1).
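
Because creation_time is a count of milliseconds since the epoch, it helps to see the "older than 7 days" comparison from the first example written out locally. The sketch below uses only the Python standard library; the function names are hypothetical, and it approximates what MSEC_TO_TIMESTAMP and DATE_ADD do in that filter rather than reproducing BigQuery's behaviour exactly.

```python
from datetime import datetime, timedelta, timezone


def creation_time_to_datetime(creation_time_ms: int) -> datetime:
    """Convert milliseconds since 1970-01-01 UTC into an aware datetime."""
    return datetime.fromtimestamp(creation_time_ms / 1000, tz=timezone.utc)


def is_older_than(creation_time_ms: int, days: int, now: datetime) -> bool:
    """Return True if the table was created more than `days` days before `now`."""
    return creation_time_to_datetime(creation_time_ms) < now - timedelta(days=days)


# 1,000,000,000,000 ms after the epoch falls in September 2001.
print(creation_time_to_datetime(1_000_000_000_000))
print(is_older_than(1_000_000_000_000, 7, datetime.now(timezone.utc)))
```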

The following fields are not available in TABLE_QUERY() since they are members of __TABLES__ but not __TABLES_SUMMARY__. They're kept here for historical interest and to partially document the __TABLES__ metatable:

  • last_modified_time: time, in milliseconds since 1/1/1970 UTC, that the table was last updated (either metadata or table contents). Note that if you use tabledata.insertAll() to stream records to your table, this might be a few minutes out of date.
  • row_count: number of rows in the table.
  • size_bytes: total size in bytes of the table.
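
Since a filter that mentions one of the __TABLES__-only fields cannot work inside TABLE_QUERY(), a quick local check can catch the mistake before the query is submitted. The following is a rough sketch under that assumption; unavailable_fields is a hypothetical helper, and its simple identifier scan would also flag a field name that merely appears inside a quoted string literal.

```python
import re
from typing import List

# Field names taken from the second list above (members of __TABLES__ only).
TABLES_ONLY_FIELDS = {"last_modified_time", "row_count", "size_bytes"}


def unavailable_fields(table_query: str) -> List[str]:
    """Return the __TABLES__-only fields referenced in a TABLE_QUERY filter."""
    # Collect every identifier-like token, ignoring case.
    identifiers = set(re.findall(r"[a-z_][a-z0-9_]*", table_query.lower()))
    return sorted(identifiers & TABLES_ONLY_FIELDS)


print(unavailable_fields("row_count > 0 AND table_id CONTAINS 'git'"))
print(unavailable_fields("table_id CONTAINS 'git'"))
```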

How to debug:

To debug your TABLE_QUERY() queries, you can do the same thing that BigQuery does; that is, you can run the metatable query yourself. For example:

SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ 
WHERE MSEC_TO_TIMESTAMP(creation_time)  < 
   DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')

Running this query lets you not only debug the filter but also see which tables would be returned when you run the TABLE_QUERY function. Once you have debugged the inner query, you can put it together with your full query over those tables.
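
One way to keep the debugged filter and the final query in sync is to hold the filter in a single string and derive both forms from it. This is a minimal sketch with hypothetical helper names. It refuses filters containing a double quote rather than guessing at escaping rules, which are not covered here.

```python
def debug_query(dataset: str, table_query: str) -> str:
    """Build the metatable query to run by hand while debugging the filter."""
    return f"SELECT * FROM {dataset}.__TABLES_SUMMARY__ WHERE {table_query}"


def table_query_clause(dataset: str, table_query: str) -> str:
    """Wrap the same filter as the string argument of TABLE_QUERY()."""
    if '"' in table_query:
        # Assumption: avoid double quotes instead of relying on an escape syntax.
        raise ValueError("use single quotes inside the filter")
    return f'TABLE_QUERY({dataset}, "{table_query}")'


table_filter = "table_id CONTAINS 'git'"
print(debug_query("publicdata:samples", table_filter))
print("SELECT COUNT(*) FROM " + table_query_clause("publicdata:samples", table_filter))
```

Run the first printed query on its own until it returns the expected tables, then use the second.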
