如何在 BigQuery 中使用 TABLE_QUERY() 函数? [英] How do I use the TABLE_QUERY() function in BigQuery?

查看:14
本文介绍了如何在 BigQuery 中使用 TABLE_QUERY() 函数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于 TABLE_QUERY 函数的几个问题:

A couple of questions about the TABLE_QUERY function:

  • 示例显示在查询字符串中使用 table_id,是否还有其他字段可用?
  • 似乎很难调试.当我尝试使用它时,我收到错误评估附属查询".
  • TABLE_QUERY() 如何工作?
  • The examples show using table_id in the query string, are there other fields available?
  • It seems difficult to debug. I'm getting "error evaluating subsidiary query" when I try to use it.
  • How does TABLE_QUERY() work?

推荐答案

TABLE_QUERY() 函数允许您编写一个 SQL WHERE 子句,该子句被评估以找出哪个运行查询的表.例如,您可以运行以下查询来计算 publicdata:samples 数据集中所有表中超过 7 天的行数:

The TABLE_QUERY() function allows you to write a SQL WHERE clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples dataset that are older than 7 days:

SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
    "MSEC_TO_TIMESTAMP(creation_time) < "
    + "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")

或者您可以运行它来查询名称中包含 'git' 的所有表(它们是 github_timelinegithub_nested 示例表)并找到最多的常用网址:

Or you can run this to query over all tables that have ‘git’ in the name (which are the github_timeline and the github_nested sample tables) and find the most common urls:

SELECT url, COUNT(*)
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY url DESC
LIMIT 100

尽管非常强大,TABLE_QUERY() 可能难以使用.WHERE 子句必须指定为字符串,这可能有点尴尬.此外,调试可能很困难,因为当出现问题时,您只会收到错误评估附属查询时出错",这并不总是有帮助.

Despite being very powerful, TABLE_QUERY() can be difficult to use. The WHERE clause must be specified as a string, which can be a little bit awkward. Moreover, it can be difficult to debug, since when there is a problem, you only get the error "Error evaluating subsidiary query", which isn’t always helpful.

工作原理:

TABLE_QUERY() 本质上执行两个查询.当您运行 TABLE_QUERY(, ) 时,BigQuery 执行 SELECT table_id FROM .__TABLES_SUMMARY__ WHERE 以获取列表运行查询的表 ID,然后它会在这些表上执行您的实际查询.

TABLE_QUERY() essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>), BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query> to get the list of table IDs to run the query on, then it executes your actual query over those tables.

该查询的 __TABLES__ 部分可能看起来不熟悉.__TABLES_SUMMARY__ 是一个元表,包含有关数据集中表的信息.你可以自己使用这个元表.例如,查询 SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ 将返回有关 publicdata:samples 数据集中表的元数据.

The __TABLES__ portion of that query may look unfamiliar. __TABLES_SUMMARY__ is a meta-table containing information about tables in a dataset. You can use this meta-table yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ will return metadata about the tables in the publicdata:samples dataset.

可用字段:

__TABLES_SUMMARY__ 元表的字段(在 TABLE_QUERY 查询中都可用)包括:

The fields of the __TABLES_SUMMARY__ meta-table (that are all available in the TABLE_QUERY query) include:

  • table_id:表名.
  • creation_time:创建表的时间,以 UTC 时间 1/1/1970 以来的毫秒数为单位.这与表中的 creation_time 字段相同.
  • type:是视图(2)还是普通表(1).
  • table_id: name of the table.
  • creation_time: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as the creation_time field on the table.
  • type: whether it is a view (2) or regular table (1).

以下字段在 TABLE_QUERY()可用,因为它们是 __TABLES__ 的成员,但不是 __TABLES_SUMMARY__.它们被保留在这里是为了历史兴趣并部分记录 __TABLES__ 元表:

The following fields are not available in TABLE_QUERY() since they are members of __TABLES__ but not __TABLES_SUMMARY__. They're kept here for historical interest and to partially document the __TABLES__ metatable:

  • last_modified_time:时间,自 1970 年 1 月 1 日 UTC 起更新表(元数据或表内容)的毫秒数.请注意,如果您使用 tabledata.insertAll() 将记录流式传输到您的表,这可能会过时几分钟.
  • row_count:表中的行数.
  • size_bytes:表的总大小(以字节为单位).
  • last_modified_time: time, in milliseconds since 1/1/1970 UTC, that the table was updated (either metadata or table contents). Note that if you use the tabledata.insertAll() to stream records to your table, this might be a few minutes out of date.
  • row_count: number of rows in the table.
  • size_bytes: total size in bytes of the table.

如何调试

为了调试您的 TABLE_QUERY() 查询,您可以执行与 BigQuery 相同的操作;也就是说,您可以自己运行元表查询.例如:

In order to debug your TABLE_QUERY() queries, you can do the same thing that BigQuery does; that is, you can run the the metatable query yourself. For example:

SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ 
WHERE MSEC_TO_TIMESTAMP(creation_time)  < 
   DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')

让您不仅可以调试查询,还可以查看运行 TABLE_QUERY 函数时将返回哪些表.调试完内部查询后,您可以将其与对这些表的完整查询放在一起.

lets you not only debug your query but also see what tables would be returned when you run the TABLE_QUERY function. Once you have debugged the inner query, you can put it together with your full query over those tables.

这篇关于如何在 BigQuery 中使用 TABLE_QUERY() 函数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆