Is it possible to refer to queries defined in a previous %%sql module in a later module?


Question

I just started working with the new Google Cloud Datalab and IPython last week (though I've been using BigQuery for a few months). The tutorials and samples on GitHub are very helpful, but as my scripts and queries become more complex, I'm wondering about a few things. The first is this: can I refer to queries defined in one %%sql module from a later %%sql module? The other, somewhat related question: can I somehow store the results from one %%sql module and then use them in something like an IN clause in a subsequent %%sql module?

Answer

Here are some things to try; see if they meet your needs. If they don't, I welcome you to file issues on GitHub, as I think both of your scenarios are things we want to make sure work well.

For the first, it requires a combination of SQL cells and code cells [for now]:

Cell 1

%%sql --module m1
DEFINE QUERY q1 
SELECT ...

Cell 2

%%sql --module m2
DEFINE QUERY q2
SELECT ... FROM $src ...

Cell 3

import gcp.bigquery as bq

# Bind the $src placeholder in m2.q2 to the query defined in m1.q1.
compositequery = bq.Query(m2.q2, src=m1.q1)

Essentially, %%sql modules are turned into auto-imported Python modules behind the scenes, which is why m1.q1 and m2.q2 can be referenced directly from the code cell.
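To picture what binding $src to another query amounts to, here is a minimal sketch in plain Python. It is not how Datalab implements composition; it only assumes that the referenced query ends up as a parenthesized subquery in place of the placeholder. The `compose` helper, the table name `words`, and the column names are all hypothetical.

```python
from string import Template

def compose(outer_sql, **sources):
    """Substitute each $name placeholder with the named query as a subquery.

    Illustration only: a hypothetical helper, not part of gcp.bigquery.
    """
    wrapped = {name: "(" + sql.strip() + ")" for name, sql in sources.items()}
    return Template(outer_sql).substitute(wrapped)

# Stand-ins for the SQL text behind m1.q1 and m2.q2 (made-up table and columns).
q1 = "SELECT word, COUNT(*) AS n FROM words GROUP BY word"
q2 = "SELECT word FROM $src WHERE n > 10"

composed = compose(q2, src=q1)
print(composed)
```

In the notebook itself you would keep using bq.Query(m2.q2, src=m1.q1) as in Cell 3; the sketch is only meant to show the shape of the resulting SQL.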

I used to put each query in its own %%sql cell, but since modules were introduced I also, depending on the scenario, define multiple queries within a single module, in which case you don't need any Python code to stitch them together. Which approach is better depends on your scenario.

For your second question, again, if the queries are split across cells, you'll need some Python glue in the middle: execute one query, get its result, and use that as a parameter for the next query. This works for general scalar values, but for IN clauses and tuples/lists of values, we have this issue we need to address: https://github.com/GoogleCloudPlatform/datalab/issues/615
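Until that issue is addressed, one possible workaround is to build the IN list yourself in the Python glue and splice it into the SQL text. The sketch below is an assumption-laden illustration, not a Datalab API: `sql_literal` and `in_list` are hypothetical helpers, the rows are a stand-in for the results of the first query, and the table and column names are made up. It also assumes the values come from a trusted source, since it builds SQL by string concatenation.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical helper)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Treat everything else as a string: escape backslashes, then quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def in_list(values):
    """Render a sequence of values as a parenthesized list for an IN clause."""
    items = [sql_literal(v) for v in values]
    if not items:
        raise ValueError("an IN list needs at least one value")
    return "(" + ", ".join(items) + ")"

# Stand-in for rows returned by the first query (made-up column name).
first_query_rows = [{"user": "alice"}, {"user": "bob"}]
users = [row["user"] for row in first_query_rows]

# Made-up table name; the resulting text is what the second query would run.
second_sql = "SELECT user, COUNT(*) AS hits FROM logs WHERE user IN " + in_list(users) + " GROUP BY user"
print(second_sql)
```

For a single scalar value, passing it as a parameter to the next query, as described above, avoids the string building entirely.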

For more ideas on how you can use JOINs in BigQuery to produce scalar results in one query and consume them in the next, see the query under Step 3 of the BigQuery tutorial notebook titled "SQL Query Composition".

Hope that helps.

As mentioned, if you hit specific issues where something doesn't work as you expected, please do file an issue, and we can see if it makes sense to address. Possibly you or someone else might even step up to make a contribution. :)
