Entity Framework Include OrderBy 随机生成重复数据 [英] Entity Framework Include OrderBy random generates duplicate data

查看:23
本文介绍了Entity Framework Include OrderBy 随机生成重复数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我从包括一些子项的数据库中检索项目列表(通过 .Include)并随机排序时,EF 给了我一个意想不到的结果..我创建/克隆了附加项目..

When I retrieve a list of items from a database including some children (via .Include), and order the randomly, EF gives me an unexpected result.. I creates/clones addition items..

为了更好地解释自己,我创建了一个小而简单的 EF CodeFirst 项目来重现该问题.首先我给你这个项目的代码.

To explain myself better, I've created a small and simple EF CodeFirst project to reproduce the problem. First i shall give you the code for this project.

创建一个基本的MVC3项目并通过Nuget添加EntityFramework.SqlServerCompact包.
这会添加以下软件包的最新版本:

Create a basic MVC3 project and add the EntityFramework.SqlServerCompact package via Nuget.
That adds the latest versions of the following packages:

  • EntityFramework v4.3.0
  • SqlServerCompact v4.0.8482.1
  • EntityFramework.SqlServerCompact v4.1.8482.2
  • WebActivator v1.5

模型和 DbContext

using System.Collections.Generic;
using System.Data.Entity;

namespace RandomWithInclude.Models
{
    public class PeopleContext : DbContext
    {
        public DbSet<Person> Persons { get; set; }
        public DbSet<Address> Addresses { get; set; }
    }

    public class Person
    {
        public int ID { get; set; }
        public string Name { get; set; }

        public virtual ICollection<Address> Addresses { get; set; }
    }

    public class Address
    {
        public int ID { get; set; }
        public string AdressLine { get; set; }

        public virtual Person Person { get; set; }
    }
}

数据库设置和种子数据:EF.SqlServerCompact.cs

using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using RandomWithInclude.Models;

[assembly: WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF), "Start")]

namespace RandomWithInclude.App_Start
{
    public static class EF
    {
        public static void Start()
        {
            Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
            Database.SetInitializer(new DbInitializer());
        }
    }
    public class DbInitializer : DropCreateDatabaseAlways<PeopleContext>
    {
        protected override void Seed(PeopleContext context)
        {
            var address1 = new Address {AdressLine = "Street 1, City 1"};
            var address2 = new Address {AdressLine = "Street 2, City 2"};
            var address3 = new Address {AdressLine = "Street 3, City 3"};
            var address4 = new Address {AdressLine = "Street 4, City 4"};
            var address5 = new Address {AdressLine = "Street 5, City 5"};
            context.Addresses.Add(address1);
            context.Addresses.Add(address2);
            context.Addresses.Add(address3);
            context.Addresses.Add(address4);
            context.Addresses.Add(address5);
            var person1 = new Person {Name = "Person 1", Addresses = new List<Address> {address1, address2}};
            var person2 = new Person {Name = "Person 2", Addresses = new List<Address> {address3}};
            var person3 = new Person {Name = "Person 3", Addresses = new List<Address> {address4, address5}};
            context.Persons.Add(person1);
            context.Persons.Add(person2);
            context.Persons.Add(person3);
        }
    }
}

控制器:HomeController.cs

using System;
using System.Data.Entity;
using System.Linq;
using System.Web.Mvc;
using RandomWithInclude.Models;

namespace RandomWithInclude.Controllers
{
    public class HomeController : Controller
    {
        public ActionResult Index()
        {
            var db = new PeopleContext();
            var persons = db.Persons
                                .Include(p => p.Addresses)
                                .OrderBy(p => Guid.NewGuid());

            return View(persons.ToList());
        }
    }
}

视图:Index.cshtml

@using RandomWithInclude.Models
@model IList<Person>

<ul>
    @foreach (var person in Model)
    {
        <li>
            @person.Name
        </li>
    }
</ul>

这应该是全部,你的应用程序应该编译:)

this should be all, and you application should compile :)

如您所见,我们有两个简单的模型(Person 和 Address),Person 可以有多个地址.
我们为生成的数据库播种3 个人和 5 个地址.
如果我们从数据库中获取所有的人,包括地址,然后将结果随机化,然后打印出这些人的姓名,这就是问题所在.

As you can see, we have 2 straightforward models (Person and Address) and Person can have multiple Addresses.
We seed the generated database 3 persons and 5 addresses.
If we get all the persons from the database, including the addresses and randomize the results and just print out the names of those persons, that's where it all goes wrong.

因此,我有时会得到 4 个人,有时是 5 人,有时是 3 人,而我希望是 3.总是.
例如:

As a result, i sometimes get 4 persons, sometimes 5 and sometimes 3, and i expect 3. Always.
e.g.:

  • 第一人
  • 第三人
  • 第一人
  • 第三人
  • 第二个人

所以..它正在复制/克隆数据!这并不酷..
似乎 EF 没有跟踪哪些地址是哪个人的孩子..

So.. it's copying/cloning data! And that's not cool..
It just seems that EF looses track of what addresses are a child of which person..

生成的 SQL 查询是这样的:

The generated SQL query is this:

SELECT 
    [Project1].[ID] AS [ID], 
    [Project1].[Name] AS [Name], 
    [Project1].[C2] AS [C1], 
    [Project1].[ID1] AS [ID1], 
    [Project1].[AdressLine] AS [AdressLine], 
    [Project1].[Person_ID] AS [Person_ID]
FROM ( SELECT 
    NEWID() AS [C1], 
    [Extent1].[ID] AS [ID], 
    [Extent1].[Name] AS [Name], 
    [Extent2].[ID] AS [ID1], 
    [Extent2].[AdressLine] AS [AdressLine], 
    [Extent2].[Person_ID] AS [Person_ID], 
    CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM  [People] AS [Extent1]
    LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Person_ID]
)  AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[ID] ASC, [Project1].[C2] ASC

解决方法

  1. 如果我从查询中删除 .Include(p =>p.Addresses),一切都会正常.但当然不会加载地址,并且每次访问该集合都会对数据库进行新调用.
  2. 我可以先从数据库中获取数据,然后通过在 .OrderBy.. 之前添加一个 .ToList() 进行随机化,如下所示:var people = db.Persons.Include(p => p.地址).ToList().OrderBy(p => Guid.NewGuid());
  1. If i remove the .Include(p =>p.Addresses) from the query, everything goes fine. but of course the addresses aren't loaded and accessing that collection will make a new call to the database every time.
  2. I can first get the data from the database and randomize later by just adding a .ToList() before the .OrderBy.. like this: var persons = db.Persons.Include(p => p.Addresses).ToList().OrderBy(p => Guid.NewGuid());

有人知道为什么会这样吗?
这可能是 SQL 生成中的错误吗?

Does anybody have any idea of why it is happening like this?
Might this be a bug in the SQL generation?

推荐答案

阅读AakashM 答案Nicolae Dascalu 的回答,看来 Linq OrderBy 需要稳定的排名函数,其中 NewID/Guid.NewGuid 不是.

As one can sort it out by reading AakashM answer and Nicolae Dascalu answer, it strongly seems Linq OrderBy requires a stable ranking function, which NewID/Guid.NewGuid is not.

所以我们必须使用另一个在单个查询中稳定的随机生成器.

So we have to use another random generator that would be stable inside a single query.

为此,在每次查询之前,使用 .Net Random 生成器来获取随机数.然后将此随机数与实体的唯一属性结合起来进行随机排序.并将结果随机化"一点,校验和.(checksum 是一个计算哈希的 SQL Server 函数;基于 这个博客.)

To achieve this, before each querying, use a .Net Random generator to get a random number. Then combine this random number with a unique property of the entity to get randomly sorted. And to 'randomize' a bit the result, checksum it. (checksum is a SQL Server function that compute a hash; original idea founded on this blog.)

假设 Person Id 是一个 int,你可以这样写你的查询:

Assuming Person Id is an int, you could write your query this way :

var rnd = (new Random()).NextDouble();
var persons = db.Persons
    .Include(p => p.Addresses)
    .OrderBy(p => SqlFunctions.Checksum(p.Id * rnd));

NewGuid hack 一样,这很可能不是具有良好分布等的良好随机生成器.但它不会导致实体在结果中重复.

Like the NewGuid hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.

当心:
如果您的查询排序不能保证您的实体排名的唯一性,您必须对其进行补充以保证它.例如,如果您使用实体的非唯一属性进行校验和调用,则在 OrderBy.ThenBy(p => p.Id) 的内容>.
如果您查询的根实体的排名不是唯一的,则其包含的子实体可能会与具有相同排名的其他实体的子实体混合.然后 bug 就会留在这里.

Beware:
If your query ordering does not guarantees uniqueness of your entities ranking, you must complement it for guarantying it. By example, if you use a non-unique property of your entities for the checksum call, then add something like .ThenBy(p => p.Id) after the OrderBy.
If your ranking is not unique for your queried root entity, its included children may get mixed with children of other entities having the same ranking. And then the bug will stay here.

注意:
我更喜欢使用 .Next() 方法来获取 int 然后通过 xor (^) 将它组合到实体 int 唯一属性,而不是使用 double 并将其相乘.但是 SqlFunctions.Checksum 不幸的是没有为 int 数据类型提供重载,尽管 SQL 服务器函数应该支持它.您可以使用强制转换来克服这个问题,但为了保持简单,我最终选择了乘法.

Note:
I would prefer use .Next() method to get an int then combine it through a xor (^) to an entity int unique property, rather than using a double and multiply it. But SqlFunctions.Checksum unfortunately does not provide an overload for int data type, though the SQL server function is supposed to support it. You may use a cast to overcome this, but for keeping it simple I finally had chosen to go with the multiply.

这篇关于Entity Framework Include OrderBy 随机生成重复数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆