Top per group: Take(1) works but FirstOrDefault() doesn't?

Problem description

I'm using EF 4.3.1... just upgraded to 4.4 (problem remains) with database-first POCO entities generated by the EF 4.x DbContext Generator. I have the following database named 'Wiki' (SQL script to create tables and data is here):

When a wiki article is edited, its record is not updated; instead, the new revision is inserted as a new record with the revision counter incremented. In my database there is one author, "John Doe", who has two articles, "Article A" and "Article B". Article A has two revisions (1 and 2), but article B has only one.

I have both lazy loading and proxy creation disabled (here is the sample solution I'm using with LINQPad). I want to get the latest revisions of articles created by people whose name starts with "John", so I do the following query:

Authors.Where(au => au.Name.StartsWith("John"))
       .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                .Select(g => g.OrderByDescending(ar => ar.Revision)
                                              .FirstOrDefault()))

This produces the wrong result, and retrieves only the first article:

Making a small change in the query, by replacing .FirstOrDefault() with .Take(1), results in the following query:

Authors.Where(au => au.Name.StartsWith("John"))
       .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                .Select(g => g.OrderByDescending(ar => ar.Revision)
                                              .Take(1)))

Surprisingly, this query produces correct results (albeit with more nesting):

I assumed EF was generating slightly different SQL queries, one returning only the latest revision of a single article, the other returning the latest revision of all articles. The ugly SQL generated by the two queries differs only slightly (compare: SQL for .FirstOrDefault() vs SQL for .Take(1), https://gist.github.com/3488289), but they both return the correct result:

.FirstOrDefault()

.Take(1) (column order rearranged for easy comparison)

The culprit therefore is not the generated SQL, but EF's interpretation of the result. Why is EF interpreting the first result into a single Article instance while it interprets the second result as two Article instances? Why does the first query return incorrect results?
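
As a baseline for what "correct" means here, the two query shapes can be run in memory with LINQ to Objects. This is only a minimal sketch: the `Author` and `Article` classes and the sample data below are assumptions reconstructed from the description above, not the generated POCO entities, and no EF context is involved. In memory, both shapes yield the latest revision of every article; the Take(1) version just needs one extra level of flattening.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the generated entities.
public class Article
{
    public string Title { get; set; }
    public int Revision { get; set; }
}

public class Author
{
    public string Name { get; set; }
    public List<Article> Articles { get; set; }
}

public static class Baseline
{
    // Sample data mirroring the description: A has revisions 1 and 2, B has revision 1.
    public static List<Author> SampleAuthors()
    {
        return new List<Author>
        {
            new Author
            {
                Name = "John Doe",
                Articles = new List<Article>
                {
                    new Article { Title = "Article A", Revision = 1 },
                    new Article { Title = "Article A", Revision = 2 },
                    new Article { Title = "Article B", Revision = 1 },
                }
            }
        };
    }

    // FirstOrDefault(): one Article per title group.
    public static List<string> WithFirstOrDefault(IEnumerable<Author> authors)
    {
        return authors.Where(au => au.Name.StartsWith("John"))
                      .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                               .Select(g => g.OrderByDescending(ar => ar.Revision)
                                                             .FirstOrDefault()))
                      .SelectMany(perAuthor => perAuthor)
                      .Select(ar => ar.Title + " r" + ar.Revision)
                      .ToList();
    }

    // Take(1): a one-element sequence per title group, hence one more level of nesting.
    public static List<string> WithTake1(IEnumerable<Author> authors)
    {
        return authors.Where(au => au.Name.StartsWith("John"))
                      .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                               .Select(g => g.OrderByDescending(ar => ar.Revision)
                                                             .Take(1)))
                      .SelectMany(perAuthor => perAuthor)
                      .SelectMany(perGroup => perGroup)
                      .Select(ar => ar.Title + " r" + ar.Revision)
                      .ToList();
    }

    public static void Main()
    {
        var authors = SampleAuthors();
        Console.WriteLine(string.Join(", ", WithFirstOrDefault(authors))); // Article A r2, Article B r1
        Console.WriteLine(string.Join(", ", WithTake1(authors)));          // Article A r2, Article B r1
    }
}
```

Any difference between the two shapes under EF therefore comes from how EF translates or materializes the query, not from the LINQ semantics themselves.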

Edit: I have opened a bug report on Connect. Please upvote it if you think it is important to fix this issue.

Recommended answer

Looking at:
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.firstordefault
http://msdn.microsoft.com/en-us/library/bb503062.aspx
there's a very nice explanation of how Take works (lazy, early breaking) but none for FirstOrDefault. What's more, seeing the explanation of Take, I'd 'guesstimate' that the queries with Take may cut the number of rows due to an attempt to emulate lazy evaluation in SQL, and your case indicates it's the other way round! I do not understand why you are observing such an effect.

It's probably just implementation-specific. To me, both Take(1) and FirstOrDefault might look like TOP 1; however, from a functional point of view, there may be a slight difference in their 'laziness': one function may evaluate all elements and return the first, while the other may evaluate the first, return it and stop evaluating. This is only a "hint" at what might have happened. To me it makes no sense, because I see no docs on this subject, and in general I'm sure that both Take and FirstOrDefault are lazy and should evaluate only the first N elements.

In the first part of your query, the group.Select + OrderBy + TOP 1 is a "clear indication" that you are interested in the single row with the highest 'value' in a column per group. In fact, there is no simple way to declare that in SQL (see http://sqlblog.com/blogs/adam_machanic/archive/2008/02/08/who-s-on-first-solving-the-top-per-group-problem-part-1-technique.aspx), so the indication is not that clear at all, neither for the SQL engine nor for the EF engine.

As for me, the behaviour you present could indicate that the FirstOrDefault was 'propagated' by the EF translator one layer of inner queries too far upwards, as if it applied to Articles.GroupBy() (are you sure you have not misplaced the parens after the OrderBy? :) ), and that would be a bug.

As the difference must be somewhere in the meaning and/or order of execution, let's see what EF can guess about the meaning of your query. How does the Author entity get its Articles? How does EF know which Article to bind to your author? Through the nav property, of course. But how does it happen that only some of the articles are preloaded? It seems simple: the query returns some results with some columns, the columns describe whole Authors and whole Articles, so let's map them to authors and articles and match them to each other via the nav keys. OK. But what if you add complex filtering to that?

With a simple filter such as by-date, there is a single subquery for all articles, rows are truncated by date, and all rows are consumed. But what about a complex query that uses several intermediate orderings and produces several subsets of articles? Which subset should be bound to the resulting Author? The union of all of them? That would nullify all top-level where-like clauses. The first of them? Nonsense, the first subqueries tend to be intermediary helpers. So, when a query is seen as a set of similarly structured subqueries that could each serve as the data source for partially loading a nav property, most probably only the last subquery is taken as the actual result.

This is all abstract thinking, but it made me notice that Take() versus FirstOrDefault(), with their overall JOIN versus LEFT JOIN meaning, could in fact change the order in which the result set is scanned. Perhaps Take() was optimized and done in one scan over the whole result, visiting all of the author's articles at once, whereas FirstOrDefault() was executed as a direct scan (for each author, for each title group, select the top one, check the count and substitute null). That would repeatedly produce small one-item collections of articles for each author, leaving a single result that comes only from the last title group visited.

This is the only explanation I can think of, apart from the obvious shout of "BUG!". As a LINQ user, I still consider it a bug. Either such an optimization should not have taken place at all, or it should cover FirstOrDefault too, as that is the same as Take(1).DefaultIfEmpty(). Heh, by the way, have you tried that? As I said, Take(1) is not the same as FirstOrDefault due to the JOIN/LEFT JOIN meaning, but Take(1).DefaultIfEmpty() is actually semantically the same. It could be fun to see what SQL it produces and what results come out of the EF layer.
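
The JOIN versus LEFT JOIN distinction can be illustrated in memory. The following is a minimal LINQ to Objects sketch, not a statement about EF's translation; the helper names (`LatestOrNull`, `LatestAsSequence`, `LatestPadded`) are hypothetical and exist only for this illustration. For an empty input, Take(1) contributes no element at all (inner-join-like), FirstOrDefault() contributes a null (left-join-like), and Take(1).DefaultIfEmpty() contributes a one-element sequence holding null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeVersusFirstOrDefault
{
    // Collapses the sequence to a single value; null when the sequence is empty.
    public static string LatestOrNull(IEnumerable<string> revisions)
    {
        return revisions.FirstOrDefault();
    }

    // Keeps a sequence: one element, or none at all when the input is empty.
    public static List<string> LatestAsSequence(IEnumerable<string> revisions)
    {
        return revisions.Take(1).ToList();
    }

    // Always exactly one element; that element is null when the input is empty.
    public static List<string> LatestPadded(IEnumerable<string> revisions)
    {
        return revisions.Take(1).DefaultIfEmpty().ToList();
    }

    public static void Main()
    {
        var revisions = new List<string> { "rev 2", "rev 1" };
        var empty = new List<string>();

        Console.WriteLine(LatestOrNull(revisions));          // rev 2
        Console.WriteLine(LatestOrNull(empty) == null);      // True
        Console.WriteLine(LatestAsSequence(revisions).Count); // 1
        Console.WriteLine(LatestAsSequence(empty).Count);     // 0
        Console.WriteLine(LatestPadded(empty).Count);         // 1
    }
}
```

Since GroupBy never produces an empty group, the two forms should agree for the query in the question; the difference only matters to how a translator decides to shape the SQL.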

I have to admit that the selection of related entities in partial loading was never clear to me, and I have not used partial loading for a long time, as I always state my queries so that the results and groupings are explicitly defined (*). Hence, I could simply have forgotten some key aspect, rule or definition of its inner workings; maybe, for instance, it actually selects every related record from the result set (not just the last subcollection, as I described above). If I have forgotten something, everything I just described would obviously be wrong.

(*) In your case, I'd make Article.AuthorID a nav property too (public Author Author { get; set; }), and then rewrite the query to be more flat/pipelined, like:

// Untested against EF and SQL Server; assumes Article has an Author nav property.
var aths = db.Articles
             .GroupBy(ar => new { ar.Author, ar.Title })
             .Take(10)
             .Select(grp => new { grp.Key.Author, Arts = grp.OrderByDescending(ar => ar.Revision).Take(1) });

and then fill the View with pairs of Author and Arts separately, instead of trying to partially fill the author and use the author only. By the way, I've not tested it against EF and SQL Server; it is just an example of 'flipping the query upside down' and 'flattening' the subqueries in the case of JOINs, and it is unusable for LEFT JOINs. So if you'd also like to view the authors without articles, the query has to start from Authors, like your original query.
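
To show the shape of the flattened query, here is a minimal in-memory sketch using LINQ to Objects. The `Author` and `Article` classes, the `Author` nav property and the sample data are assumptions made for illustration, and the behaviour under EF 4.x may well differ, since this has not been run against EF or SQL Server. It uses First() inside each group, which is safe in memory because a group is never empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities; Article.Author plays the role of the suggested nav property.
public class Author
{
    public string Name { get; set; }
}

public class Article
{
    public Author Author { get; set; }
    public string Title { get; set; }
    public int Revision { get; set; }
}

public static class FlatQuery
{
    // Groups articles by (author, title) and keeps the highest revision of each group.
    public static List<string> LatestPerAuthorAndTitle(IEnumerable<Article> articles)
    {
        return articles
            .GroupBy(ar => new { ar.Author, ar.Title })
            .Select(grp => new
            {
                grp.Key.Author,
                Latest = grp.OrderByDescending(ar => ar.Revision).First()
            })
            .Select(x => x.Author.Name + " / " + x.Latest.Title + " r" + x.Latest.Revision)
            .ToList();
    }

    public static List<Article> SampleArticles()
    {
        var john = new Author { Name = "John Doe" };
        return new List<Article>
        {
            new Article { Author = john, Title = "Article A", Revision = 1 },
            new Article { Author = john, Title = "Article A", Revision = 2 },
            new Article { Author = john, Title = "Article B", Revision = 1 },
        };
    }

    public static void Main()
    {
        foreach (var line in LatestPerAuthorAndTitle(SampleArticles()))
        {
            Console.WriteLine(line);
        }
        // John Doe / Article A r2
        // John Doe / Article B r1
    }
}
```

Because the query starts from articles, an author with no articles never appears in the output, which is the JOIN-only limitation mentioned above.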

I hope these loose thoughts will help a bit in finding 'why'..
