为什么在调用 take() 方法时 Slick 会生成子查询 [英] Why does Slick generate a subquery when take() method is called

查看:45
本文介绍了为什么在调用 take() 方法时 Slick 会生成子查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用 Slick 1.0.0-RC1.我有这个表对象的定义:

object ProductTable extends Table[(Int, String, String, String, Double, java.sql.Date, Int, Option[Int], Int, Boolean)]("products") {def id = column[Int]("productId", O.PrimaryKey, O.AutoInc)def title = column[String]("title")def description = column[String]("描述")def shortDescription = column[String]("shortDescription")def price = column[Double]("价格")def addedDate = column[java.sql.Date](" addedDate")def BrandId = column[Int]("brandId")def defaultImageId = column[Option[Int]]("defaultImageId")def visitCounter = column[Int]("visitCounter")def 归档 = column[Boolean]("归档")def * = id ~ title ~ description ~ shortDescription ~ price ~ addedDate ~brandId ~ defaultImageId ~visitCounter ~ archived}

我需要一个从数据库中选择 8 行的简单查询:

ProductTable.filter(_.title === "something").sortBy(_.visitCounter).map(_.title).take(8).select 语句

输出为:

选择 x2.x3(从`products` x4中选择x4.`title`作为x3其中 x4.`title` = 'something'按 x4 排序.`visitCounter` 限制 8) x2

如果我去掉 take() 方法:

ProductTable.filter(_.title === "something").sortBy(_.visitCounter).map(_.title).select 语句

那么输出是:

select x2.`title` from `products` x2其中 x2.`title` = 'something'按 x2 排序.`visitCounter`

所以我的问题是:为什么 Slick 在使用 take() 方法构造查询对象时会生成子查询?

附言如果可以相关,我将所有这些都使用 MySql 驱动程序

解决方案

有一个简短的答案和一个长的答案.简短的一个是:子查询在那里,因为到目前为止没有人费心去删除它.

较长的答案与交换map"和take"没有任何区别的事实有关,这是因为查询的编译非常简单和直接.

在查询编译器的早期,有一个forceOuterBinds"阶段,它引入(可能)许多额外的 Bind(又名 flatMap)操作,这些操作在语义上是等效的和冗余的.这个想法是取一些具有集合类型的 x 并将其转换为 Bind(s, x, Pure(s)) 其中 s是一个新鲜的符号.如果 x 已经是 Pure(y) 的形状,我们把它变成 Bind(s, Pure(ProductNode()), Pure(y)) 代替.在 Scala 代码中,可以将其视为将 x: List[T] 转换为 x.flatMap(s => List(s))List(y)List(()).flatMap(s => List(y)).这种转换的目的是通过为我们提供一个地方来修改我们可能想要修改的所有位置的投影(它是作为恒等映射创建的),从而使后面的编译器阶段中的树重写更容易.

稍后,在convertToComprehensions"中,所有来自 monadic 形式的节点(Bind、Pure、Filter、Take、Drop 等)都单独转换为 Comprehension 节点(代表一个 SQL select 语句).结果仍然不是合法的 SQL:SQL 推导式不是 monad 推导式.它们有非常有限的范围规则,不允许 from 子句引用由前一个 from 子句引入的变量(在同一个推导式或封闭式推导式中).>

这就是我们需要下一阶段fuseComprehensions"的原因,它乍一看可能纯粹是一种优化,但实际上需要生成正确的代码.这个阶段尽量融合个人理解,避免这些非法引用.我们在可以融合的方面取得了一些进展,但无法 100% 解决范围问题(事实上,我很确定这是不可能解决的).

重申一下,这一阶段的进展主要是由对正确性的需求驱动的,而不仅仅是生成更好的代码.那么我们可以删除那个额外的子查询吗?是的,当然,但还没有人实施它.

如果您想尝试实施此类优化,请考虑以下注意事项:

  • 您是否满足于删除纯粹的混叠投影(即,select 槽具有 Some(ProductNode(ch)) 形式的 Comprehension,其中 ch 的每个元素都是一个路径)?
  • 或者你可能认为 select x+1 from (... limit ...)) 也应该融合.你可以允许什么样的表达?例如.RowNum 可以吗?
  • 子查询需要什么样的形状?例如,它可能包含 groupByorderBy 子句吗?

(还有一些让我思考的问题:与解释为什么它不存在相比,实现该优化需要多长时间?)

I use Slick 1.0.0-RC1. I have this definition for table object:

object ProductTable extends Table[(Int, String, String, String, Double, java.sql.Date, Int, Option[Int], Int, Boolean)]("products") {
  def id = column[Int]("productId", O.PrimaryKey, O.AutoInc)
  def title = column[String]("title")
  def description = column[String]("description")
  def shortDescription = column[String]("shortDescription")
  def price = column[Double]("price")
  def addedDate = column[java.sql.Date]("addedDate")
  def brandId = column[Int]("brandId")
  def defaultImageId = column[Option[Int]]("defaultImageId")
  def visitCounter = column[Int]("visitCounter")
  def archived = column[Boolean]("archived")
  def * = id ~ title ~ description ~ shortDescription ~ price ~ addedDate ~ brandId ~ defaultImageId ~ visitCounter ~ archived
}

I need a simple query which selects 8 rows from database:

ProductTable.filter(_.title === "something")
  .sortBy(_.visitCounter)
  .map(_.title)
  .take(8)
  .selectStatement

And the output is:

select x2.x3 from 
   (select x4.`title` as x3 from `products` x4 
     where x4.`title` = 'something' 
     order by x4.`visitCounter` limit 8) x2

If I get rid of take() method:

ProductTable.filter(_.title === "something")
 .sortBy(_.visitCounter)
 .map(_.title)
 .selectStatement

then the output is:

select x2.`title` from `products` x2 
where x2.`title` = 'something' 
order by x2.`visitCounter`

So my question is: Why does Slick generate a subquery when its query object is constructed with take() method?

P.S. If it can be related, I use MySql driver with all of these

解决方案

There's a short answer and a long one. The short one is: The subquery is there because so far nobody bothered to remove it.

The longer answer is related to that fact that swapping "map" and "take" does not make any difference, which is due to the compilation of queries being all but simple and straight-forward.

Relatively early in the query compiler there's a "forceOuterBinds" phase which introduces (potentially) lots of extra Bind (a.k.a. flatMap) operations that are semantically equivalent and redundant. The idea is to take some x which has a collection type and turn it into Bind(s, x, Pure(s)) where s is a fresh Symbol. If x is already of the shape Pure(y), we turn it into Bind(s, Pure(ProductNode()), Pure(y)) instead. In Scala code, think of this as turning an x: List[T] into x.flatMap(s => List(s)) or a List(y) into List(()).flatMap(s => List(y)). The purpose of this transformation is to make tree rewriting in the later compiler phases easier by giving us a place to modify the projection (which was created as an identity mapping) in all the places where we might want to do that.

Later on, in "convertToComprehensions", all nodes from the monadic form (Bind, Pure, Filter, Take, Drop, etc.) are converted individually to Comprehension nodes (representing a SQL select statement). The result is still not legal SQL though: SQL comprehensions are not monad comprehensions. They have very limiting scope rules that do not allow a from clause to reference a variable introduced by a previous from clause (in the same comprehension or an enclosing comprehension).

That's why we need the next phase, "fuseComprehensions", which may look purely like an optimization at first glance but is actually required to generate correct code. This phase tries to fuse the individual comprehensions as much as possible to avoid these illegal references. We've made some progress with what we can fuse but a 100% solution to the scope problem is not in sight (and in fact, I am pretty sure that it is impossible to solve).

To reiterate that, progress in this phase was mostly driven by the need for correctness, not just generating nicer code. So could we remove that extra subquery? Yes, certainly, but nobody has implemented it yet.

If you want to try to implement such an optimization, here are some caveats to think about:

  • Are you content with removing purely aliasing projections (i.e. a Comprehension where the select slot has the form Some(ProductNode(ch)) where each element of ch is a Path)?
  • Or maybe you think that select x+1 from (... limit ...)) should also be fused. What kinds of expressions can you allow? E.g. would a RowNum be OK?
  • What kind of shape does the subquery need to have? For example, may it contain groupBy or orderBy clauses?

(And something for me to think about: How long would it take to implement that optimization compared to explaining why it doesn't yet exist?)

这篇关于为什么在调用 take() 方法时 Slick 会生成子查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆