Is there a difference in `BigQueryIO` when you use `fromTable` vs `fromQuery("SELECT * ...")` in Dataflow?

Question

When you need to read all the data from one or more BigQuery tables in a Dataflow job, there are, I would say, two approaches. The first is to use BigQueryIO with from, which reads the table in question; the second is to use fromQuery, where you specify a query that reads all the data from the same table. So my question is:

  • Is there any cost or performance advantage to using one over the other?

I haven't found anything in the docs about this, but I would really like to know. I imagine that from might be faster, since you don't need to run a query that scans the data, which would make it more similar to the preview feature in the BigQuery UI. If that is true, it might also be much cheaper, although it would also make sense if they both cost the same.

So in short, what is the difference between:

BigQueryIO.read(...).from(tableName)

and

BigQueryIO.read(...).fromQuery("SELECT * FROM " + tableName)

Recommended answer

from is cheaper and faster than fromQuery(SELECT * FROM ...).

  • from exports the table directly, and exporting data from BigQuery is free.
  • fromQuery(SELECT * FROM ...) first runs a query that scans the entire table (billed at $5/TB under on-demand pricing) and then exports the result.
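To put rough numbers on the two bullets above, here is a back-of-the-envelope sketch in Java. It assumes the $5/TB on-demand rate quoted in the answer, treats 1 TB as 10^12 bytes, and uses a hypothetical 2.5 TB table; it ignores billing minimums, rounding, caching and Dataflow worker costs. The class and method names are made up for illustration and are not part of BigQueryIO. Pricing may have changed since this answer was written, so check the current BigQuery pricing page before relying on the figures.

```java
import java.util.Locale;

// Rough cost model for reading a whole BigQuery table from a Dataflow job.
// Assumptions (illustrative only, not an official pricing calculator):
//  - on-demand query pricing of $5 per TB scanned, as quoted in the answer;
//  - 1 TB is treated as 10^12 bytes;
//  - exporting (extracting) a table is free;
//  - billing minimums, rounding, caching and Dataflow worker costs are ignored.
class FullTableReadCost {
    static final double USD_PER_TB_SCANNED = 5.0;
    static final double BYTES_PER_TB = 1e12;

    // On-demand query cost for a given number of bytes scanned.
    static double queryCostUsd(long bytesScanned) {
        return bytesScanned / BYTES_PER_TB * USD_PER_TB_SCANNED;
    }

    // from(table): no query runs; the table is exported directly, so no bytes are billed.
    static double fromTableCostUsd(long tableBytes) {
        return queryCostUsd(0L);
    }

    // fromQuery("SELECT * FROM table"): the query scans every byte of the table first.
    static double fromQueryCostUsd(long tableBytes) {
        return queryCostUsd(tableBytes);
    }

    public static void main(String[] args) {
        long tableBytes = 2_500_000_000_000L; // hypothetical 2.5 TB table
        System.out.printf(Locale.ROOT, "from(table):         $%.2f%n", fromTableCostUsd(tableBytes));
        System.out.printf(Locale.ROOT, "fromQuery(SELECT *): $%.2f%n", fromQueryCostUsd(tableBytes));
    }
}
```

Running main prints $0.00 for from(table) and $12.50 for fromQuery(SELECT *): under these assumptions the query path costs the full scan price on every run, while the direct export adds nothing on the BigQuery side.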
