Top per group: Take(1) works but FirstOrDefault() doesn't?


Problem description

I'm using EF 4.3.1... just upgraded to 4.4 (problem remains) with database-first POCO entities generated by the EF 4.x DbContext Generator. I have the following database named 'Wiki' (SQL script to create tables and data is here):

When a wiki article is edited, instead of its record being updated, the new revision is inserted as a new record with the revision counter incremented. In my database there is one author, "John Doe", who has two articles, "Article A" and "Article B", where article A has two versions (1 and 2) but article B has only one version.
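To pin down what the "correct" result is, the data described above can be modeled in memory and the same query shape run through LINQ to Objects, where EF's SQL generation and result materialization play no part. This is only a sketch: named tuples stand in for the real POCO `Author` and `Article` entities, and only the columns the question mentions are modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the Wiki data (tuples, not the real POCO entities):
// one author whose "Article A" has revisions 1 and 2 and "Article B" has revision 1.
var authors = new[]
{
    (Name: "John Doe", Articles: new[]
    {
        (Title: "Article A", Revision: 1),
        (Title: "Article A", Revision: 2),
        (Title: "Article B", Revision: 1),
    }),
};

// Same shape as the FirstOrDefault() query, evaluated by LINQ to Objects.
var latest = authors
    .Where(au => au.Name.StartsWith("John"))
    .Select(au => au.Articles.GroupBy(ar => ar.Title)
                             .Select(g => g.OrderByDescending(ar => ar.Revision)
                                           .FirstOrDefault())
                             .ToList())
    .ToList();

foreach (var ar in latest[0])
    Console.WriteLine($"{ar.Title} r{ar.Revision}");
// Article A r2
// Article B r1
```

In memory, the FirstOrDefault() version yields both articles (the latest revision of each title), which is the result the question expects from EF.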

I have both lazy loading and proxy creation disabled (here is the sample solution I'm using with LINQPad). I want to get the latest revisions of articles created by people whose name starts with "John", so I do the following query:

Authors.Where(au => au.Name.StartsWith("John"))
       .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                .Select(g => g.OrderByDescending(ar => ar.Revision)
                                              .FirstOrDefault()))

This produces the wrong result, and retrieves only the first article:

Making a small change in the query, by replacing .FirstOrDefault() with .Take(1) results in the following query:

Authors.Where(au => au.Name.StartsWith("John"))
       .Select(au => au.Articles.GroupBy(ar => ar.Title)
                                .Select(g => g.OrderByDescending(ar => ar.Revision)
                                              .Take(1)))

Surprisingly, this query produces correct results (albeit with more nesting):

I assumed EF was generating slightly different SQL queries, one returning only the latest revision of a single article, the other returning the latest revision of all articles. The ugly SQL generated by the two queries differs only slightly (compare: SQL for .FirstOrDefault() vs SQL for .Take(1)), but they both return the correct result:

.FirstOrDefault()

.Take(1) (column order rearranged for easy comparison)

The culprit therefore is not the generated SQL, but EF's interpretation of the result. Why is EF interpreting the first result into a single Article instance while it interprets the second result as two Article instances? Why does the first query return incorrect results?

EDIT: I have opened a bug report on Connect. Please upvote it if you think it is important to fix this issue.

Solution

Looking at:
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.firstordefault
http://msdn.microsoft.com/en-us/library/bb503062.aspx
there is a very nice explanation of how Take works (lazy, early breaking), but none of FirstOrDefault. What's more, seeing the explanation of Take, I'd 'guesstimate' that queries with Take may cut the number of rows due to an attempt to emulate lazy evaluation in SQL, yet your case indicates it's the other way around! I do not understand why you are observing such an effect.

It's probably just implementation-specific. To me, both Take(1) and FirstOrDefault might look like TOP 1; however, from a functional point of view, there may be a slight difference in their 'laziness': one function may evaluate all elements and return the first, while the other may evaluate the first, return it and stop the evaluation. This is only a "hint" at what might have happened. To me it is nonsense, because I see no docs on this subject, and in general I'm sure that both Take and FirstOrDefault are lazy and should evaluate only the first N elements.
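For in-memory LINQ to Objects, the laziness question can be checked directly with a source that counts how many elements are pulled from it. This is a sketch with a hypothetical counting iterator (`Revisions`, `pulled`); it says nothing about how EF translates either operator to SQL, only about the LINQ semantics the reasoning above relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0; // how many elements the source has produced so far

// Hypothetical source that counts how many elements are actually pulled from it.
IEnumerable<int> Revisions()
{
    foreach (var r in new[] { 2, 1 })
    {
        pulled++;
        yield return r;
    }
}

// Take(1) is deferred: building the query pulls nothing yet.
IEnumerable<int> top = Revisions().Take(1);
int pulledAfterTake = pulled;

// Enumerating it pulls exactly one element and then stops.
List<int> topList = top.ToList();
int pulledAfterEnumerate = pulled;

// FirstOrDefault executes immediately and also stops after one element,
// but returns the element itself rather than a zero-or-one element sequence.
pulled = 0;
int first = Revisions().FirstOrDefault();
int pulledAfterFirst = pulled;

Console.WriteLine($"{pulledAfterTake} {pulledAfterEnumerate} {pulledAfterFirst}");
// 0 1 1
```

Both operators short-circuit after one element here; the visible difference is deferred execution and the shape of the result (a sequence versus a single value).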

In the first part of your query, group.Select + OrderBy + TOP 1 is a "clear indication" that you are interested in the single row with the highest value in a column per group. In fact, though, there is no simple way to declare that in SQL, so the indication is not that clear at all, neither for the SQL engine nor for the EF engine.

As for me, the behaviour you present could indicate that the EF translator 'propagated' FirstOrDefault one layer of inner queries too far upwards, as if it were applied to Articles.GroupBy() (are you sure you have not misplaced the parens after the OrderBy? :) ), and that would be a bug.

But -

As the difference must be somewhere in the meaning and/or order of execution, let's see what EF can guess about the meaning of your query. How does the Author entity get its Articles? How does EF know which Articles to bind to your author? Through the nav property, of course. But how does it happen that only some of the articles are preloaded? It seems simple: the query returns results with some columns, the columns describe whole Authors and whole Articles, so let's map them to authors and articles and match them to each other via the nav keys. OK. But add complex filtering to that..?

With a simple filter such as by-date, there is a single subquery for all articles: rows are truncated by date, and all rows are consumed. But what about a complex query that uses several intermediate orderings and produces several subsets of articles? Which subset should be bound to the resulting Author? The union of all of them? That would nullify all top-level where-like clauses. The first of them? Nonsense; the first subqueries tend to be intermediary helpers. So, probably, when a query is seen as a set of subqueries with a similar structure, any of which could be taken as the data source for partially loading a nav property, then most probably only the last subquery is taken as the actual result.

This is all abstract thinking, but it made me notice that Take() versus FirstOrDefault, and their overall JOIN versus LEFT JOIN meaning, could in fact change the order in which the result set is scanned. Somehow, Take() was optimized and done in one scan over the whole result, thus visiting all of the author's articles at once, while FirstOrDefault was executed as a direct scan for each author * for each title group * select the top one, check the count and substitute null. That would have produced many small one-item collections of articles per author, and thus a single result, coming only from the last title group visited.

This is the only explanation I can think of, apart from the obvious "BUG!" shout. As a LINQ user, I still consider it a bug. Either such an optimization should not have taken place at all, or it should cover FirstOrDefault too, as that is the same as Take(1).DefaultIfEmpty(). Heh, by the way, have you tried that? As I said, Take(1) is not the same as FirstOrDefault because of the JOIN/LEFT JOIN meaning, but Take(1).DefaultIfEmpty() is semantically the same. It could be fun to see what SQL it produces and what results come out of the EF layer.
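The JOIN versus LEFT JOIN distinction can be illustrated in memory: for an empty group, Take(1) yields nothing (the group disappears, like an inner join), while FirstOrDefault and Take(1).DefaultIfEmpty() both keep a null placeholder (like a left join). This is a LINQ to Objects sketch with strings standing in for Article entities; what EF generates for `g.OrderByDescending(ar => ar.Revision).Take(1).DefaultIfEmpty()` is untested here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Strings stand in for Article entities; like entities, their default is null.
string[] revisions = { "r2", "r1" };
string[] none = Array.Empty<string>();

// FirstOrDefault: a single value, or null when the sequence is empty.
var a1 = revisions.FirstOrDefault();
var a2 = none.FirstOrDefault();

// Take(1): a sequence of zero or one elements; an empty group simply vanishes.
int t1 = revisions.Take(1).Count();
int t2 = none.Take(1).Count();

// Take(1).DefaultIfEmpty(): always exactly one element, null when empty,
// i.e. the sequence-shaped counterpart of FirstOrDefault.
var b1 = revisions.Take(1).DefaultIfEmpty().Single();
var b2 = none.Take(1).DefaultIfEmpty().Single();

Console.WriteLine($"{a1} {a2 ?? "null"} | {t1} {t2} | {b1} {b2 ?? "null"}");
// r2 null | 1 0 | r2 null
```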

I have to admit that the selection of related entities in partial loading was never clear to me, and I have actually not used partial loading for a looong time, because I always stated my queries so that the results and groupings are explicitly defined (*). Hence, I could simply have forgotten some key aspect/rule/definition of its inner workings; maybe it actually selects every related record from the result set (not just the last sub-collection, as I described above). If I have forgotten something, everything I just described is obviously wrong.

(*) In your case, I'd make Article.AuthorID a nav property too (public Author Author { get; set; }), and then rewrite the query to be flatter and more pipelined, like:

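var aths = db.Articles
              // group every revision by (author, title)
              .GroupBy(ar => new {ar.Author, ar.Title})
              .Take(10)
              // keep the author plus the single highest revision of each title
              .Select(grp => new {grp.Key.Author, Arts = grp.OrderByDescending(ar => ar.Revision).Take(1)});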

and then fill the View with pairs of Author and Arts separately, instead of trying to partially fill the author and use only the author. By the way, I have not tested it against EF and SQL Server; it is just an example of 'flipping the query upside down' and 'flattening' the subqueries in the case of JOINs. It does not work for LEFT JOINs, so if you'd also like to see authors without articles, the query has to start from Authors, like your original one.
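The trade-off between the flattened query and one that starts from Authors can be seen in memory. In this sketch, tuples and strings stand in for the entities, and "Jane Roe" is a hypothetical author with no articles, added only to show which shape keeps her; it runs under LINQ to Objects, not against EF or SQL Server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins; "Jane Roe" is a hypothetical author with no articles.
var authors = new[] { "John Doe", "Jane Roe" };
var articles = new[]
{
    (Author: "John Doe", Title: "Article A", Revision: 1),
    (Author: "John Doe", Title: "Article A", Revision: 2),
    (Author: "John Doe", Title: "Article B", Revision: 1),
};

// Flattened shape: group by (Author, Title), keep the top revision of each group.
var pairs = articles
    .GroupBy(ar => new { ar.Author, ar.Title })
    .Take(10)
    .Select(grp => new
    {
        grp.Key.Author,
        Arts = grp.OrderByDescending(ar => ar.Revision).Take(1).ToList()
    })
    .ToList();

foreach (var p in pairs)
    Console.WriteLine($"{p.Author}: {p.Arts[0].Title} r{p.Arts[0].Revision}");
// John Doe: Article A r2
// John Doe: Article B r1

// Starting from Articles (inner-join shape), an author without articles never appears.
bool janeListed = pairs.Any(x => x.Author == "Jane Roe");

// Starting from Authors (left-join shape) keeps every author, possibly with no Arts.
var perAuthor = authors
    .Select(au => new
    {
        Author = au,
        Arts = articles.Where(ar => ar.Author == au)
                       .GroupBy(ar => ar.Title)
                       .Select(g => g.OrderByDescending(ar => ar.Revision).First())
                       .ToList()
    })
    .ToList();
```

Here `janeListed` is false, while `perAuthor` has an entry for Jane Roe with an empty `Arts` list.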

I hope these loose thoughts will help a bit in finding 'why'..
