实体框架包含OrderBy随机生成重复数据 [英] Entity Framework Include OrderBy random generates duplicate data

查看:112
本文介绍了实体框架包含OrderBy随机生成重复数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我从包含一些孩子(通过.Include)的数据库中检索到一个项目列表,并随机订购时,EF给我一个意想不到的结果..我创建/克隆添加项目..



为了更好地解释自己,我创建了一个简单的EF CodeFirst项目来重现问题。
首先我会给你这个项目的代码。



项目



创建一个基本MVC3项目,并通过Nuget添加EntityFramework.SqlServerCompact包。

添加以下软件包的最新版本:




  • EntityFramework v4.3.0

  • SqlServerCompact v4.0.8482.1

  • EntityFramework.SqlServerCompact v4.1.8482.2

  • WebActivator v1.5



模型和DbContext

  using System.Collections.Generic; 
使用System.Data.Entity;

命名空间RandomWithInclude.Models
{
public class PeopleContext:DbContext
{
public DbSet< Person>人{get;组; }
public DbSet< Address>地址{get;组; }
}

public class Person
{
public int ID {get;组; }
public string Name {get;组; }

public virtual ICollection< Address>地址{get;组;
}

public class Address
{
public int ID {get;组; }
public string AdressLine {get;组; }

public virtual Person Person {get;组; }
}
}

数据库设置和种子数据: EF.SqlServerCompact.cs

  using System.Collections.Generic; 
使用System.Data.Entity;
使用System.Data.Entity.Infrastructure;
使用RandomWithInclude.Models;

[assembly:WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF),Start)]

命名空间RandomWithInclude.App_Start
{
public static class EF
{
public static void Start()
{
Database.DefaultConnectionFactory = new SqlCeConnectionFactory(System.Data.SqlServerCe.4.0);
Database.SetInitializer(new DbInitializer());
}
}
public class DbInitializer:DropCreateDatabaseAlways< PeopleContext>
{
protected override void Seed(PeopleContext context)
{
var address1 = new Address {AdressLine =Street 1,City 1};
var address2 = new Address {AdressLine =Street 2,City 2};
var address3 = new Address {AdressLine =Street 3,City 3};
var address4 = new Address {AdressLine =Street 4,City 4};
var address5 = new Address {AdressLine =Street 5,City 5};
context.Addresses.Add(address1);
context.Addresses.Add(address2);
context.Addresses.Add(address3);
context.Addresses.Add(address4);
context.Addresses.Add(address5);
var person1 = new Person {Name =Person 1,Addresses = new List< Address> {address1,address2}};
var person2 = new Person {Name =Person 2,Addresses = new List< Address> {地址3}};
var person3 = new Person {Name =Person 3,Addresses = new List< Address> {address4,address5}};
context.Persons.Add(person1);
context.Persons.Add(person2);
context.Persons.Add(person3);
}
}
}

控制器: HomeController.cs

  using System; 
使用System.Data.Entity;
使用System.Linq;
使用System.Web.Mvc;
使用RandomWithInclude.Models;

命名空间RandomWithInclude.Controllers
{
public class HomeController:Controller
{
public ActionResult Index()
{
var db = new PeopleContext();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => Guid.NewGuid());

return View(persons.ToList());
}
}
}

视图: Index.cshtml

  @using RandomWithInclude.Models 
@model IList< ;人>

< ul>
@foreach(模型中的var person)
{
< li>
@ person.Name
< / li>
}
< / ul>

这应该是全部,应用程序应该编译:)






问题



如您所见,我们有2个简单的模型(个人和地址)和人员可以有多个地址。

我们将生成的数据库 3个人和5个地址进行种子。

如果我们从数据库中获取所有人员,包括解决结果并随机抽取结果,并打印出这些人的姓名,这就是错误的地方。



因此,我有时候得到4个人,有时5个,有时3个,我期望3.总是。

例如:




  • 个人1个人3

  • 个人1

  • 个人3

  • 人2



所以..它是复制/克隆数据!这不是很酷..

只是看起来EF忽略了哪个地址是哪个人的孩子。



生成的SQL查询这是:

  SELECT 
[Project1]。[ID] AS [ID] ,
[Project1]。[Name] AS [Name],
[Project1]。[C2] AS [C1],
[Project1]。[ID1] AS [ID1],
[Project1]。[AdressLine] AS [AdressLine],
[Project1]。[Person_ID] AS [Person_ID]
FROM(SELECT
NEWID()AS [C1],
[Extent1]。[ID] AS [ID],
[Extent1]。[Name] AS [Name],
[Extent2]。[ID] AS [ID1],
[Extent2]。[AdressLine] AS [AdressLine],
[Extent2]。[Person_ID] AS [Person_ID],
CASE WHEN([Extent2]。[ID] IS NULL)THEN CAST AS int)ELSE 1 END AS [C2]
FROM [People] AS [Extent1]
LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1]。[ID] = [Extent2] Person_ID]
)AS [Project1]
ORDER BY [Project1]。[ C1] ASC,[Project1]。[ID] ASC,[Project1]。[C2] ASC



解决方法




  1. 如果我删除 .Include(p => p.Addresses)
  2. 我可以先从数据库获取数据,然后随机添加一个.ToList()之前的.OrderBy ..像这样: var Persons = db.Persons.Include(p => p.Addresses).ToList()。OrderBy(p => Guid。 NewGuid());

有没有人知道为什么会这样发生? >
可能这是SQL生成中的错误?

解决方案

可以通过阅读 AakashM答案 Nicolae Dascalu回答< a>,似乎Linq OrderBy 需要一个稳定的排名功能,其中 NewID / Guid.NewGuid 不是。 / p>

所以我们必须使用另一个在单个查询中稳定的随机生成器。



为了实现这一点,在每次查询之前,使用.Net随机生成器来获取一个随机数。然后将该随机数与实体的唯一属性相结合,以便随机排序。而且'随机化'结果,校验和它。 (校验和是一个计算哈希的SQL Server函数;原始思想基于这个博客



假设 Person code> id 是一个 int ,你可以这样写你的查询:

  var rnd =(new Random())。NextDouble(); 
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => SqlFunctions.Checksum(p.Id * rnd))
//必须确保排序顺序的唯一性。
.ThenBy(p => p.Id);

NewGuid hack一样,这非常可能不是很好的随机发生器,具有良好的分布等等。但是它并不会导致实体在结果中重复。



小心:

如果您的查询排序不保证您的实体排名的唯一性,您必须补充以保证它,因此我添加了 ThenBy
如果您的查询根实体的排名不是唯一的,则其包含的子项可能与具有相同排名的其他实体的子项混合。然后这个bug会留在这里。



注意:

我更喜欢使用。 Next()方法来获取一个 int 然后通过一个xor( ^ )组合它到实体 int 独特属性,而不是使用 double 并乘以它。但是,对于 int 数据类型,不幸的是没有提供重载,但SQL服务器功能应该支持 SqlFunctions.Checksum 它。您可以使用演员来克服这一点,但为了保持简单,我终于选择了乘以乘法。


When I retrieve a list of items from a database including some children (via .Include), and order the randomly, EF gives me an unexpected result.. I creates/clones addition items..

To explain myself better, I've created a small and simple EF CodeFirst project to reproduce the problem. First i shall give you the code for this project.

The project

Create a basic MVC3 project and add the EntityFramework.SqlServerCompact package via Nuget.
That adds the latest versions of the following packages:

  • EntityFramework v4.3.0
  • SqlServerCompact v4.0.8482.1
  • EntityFramework.SqlServerCompact v4.1.8482.2
  • WebActivator v1.5

The Models and DbContext

using System.Collections.Generic;
using System.Data.Entity;

namespace RandomWithInclude.Models
{
    public class PeopleContext : DbContext
    {
        public DbSet<Person> Persons { get; set; }
        public DbSet<Address> Addresses { get; set; }
    }

    public class Person
    {
        public int ID { get; set; }
        public string Name { get; set; }

        public virtual ICollection<Address> Addresses { get; set; }
    }

    public class Address
    {
        public int ID { get; set; }
        public string AdressLine { get; set; }

        public virtual Person Person { get; set; }
    }
}

The DB Setup and Seed data: EF.SqlServerCompact.cs

using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using RandomWithInclude.Models;

[assembly: WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF), "Start")]

namespace RandomWithInclude.App_Start
{
    public static class EF
    {
        public static void Start()
        {
            Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
            Database.SetInitializer(new DbInitializer());
        }
    }
    public class DbInitializer : DropCreateDatabaseAlways<PeopleContext>
    {
        protected override void Seed(PeopleContext context)
        {
            var address1 = new Address {AdressLine = "Street 1, City 1"};
            var address2 = new Address {AdressLine = "Street 2, City 2"};
            var address3 = new Address {AdressLine = "Street 3, City 3"};
            var address4 = new Address {AdressLine = "Street 4, City 4"};
            var address5 = new Address {AdressLine = "Street 5, City 5"};
            context.Addresses.Add(address1);
            context.Addresses.Add(address2);
            context.Addresses.Add(address3);
            context.Addresses.Add(address4);
            context.Addresses.Add(address5);
            var person1 = new Person {Name = "Person 1", Addresses = new List<Address> {address1, address2}};
            var person2 = new Person {Name = "Person 2", Addresses = new List<Address> {address3}};
            var person3 = new Person {Name = "Person 3", Addresses = new List<Address> {address4, address5}};
            context.Persons.Add(person1);
            context.Persons.Add(person2);
            context.Persons.Add(person3);
        }
    }
}

The controller: HomeController.cs

using System;
using System.Data.Entity;
using System.Linq;
using System.Web.Mvc;
using RandomWithInclude.Models;

namespace RandomWithInclude.Controllers
{
    public class HomeController : Controller
    {
        public ActionResult Index()
        {
            var db = new PeopleContext();
            var persons = db.Persons
                                .Include(p => p.Addresses)
                                .OrderBy(p => Guid.NewGuid());

            return View(persons.ToList());
        }
    }
}

The View: Index.cshtml

@using RandomWithInclude.Models
@model IList<Person>

<ul>
    @foreach (var person in Model)
    {
        <li>
            @person.Name
        </li>
    }
</ul>

this should be all, and you application should compile :)


The problem

As you can see, we have 2 straightforward models (Person and Address) and Person can have multiple Addresses.
We seed the generated database 3 persons and 5 addresses.
If we get all the persons from the database, including the addresses and randomize the results and just print out the names of those persons, that's where it all goes wrong.

As a result, i sometimes get 4 persons, sometimes 5 and sometimes 3, and i expect 3. Always.
e.g.:

  • Person 1
  • Person 3
  • Person 1
  • Person 3
  • Person 2

So.. it's copying/cloning data! And that's not cool..
It just seems that EF looses track of what addresses are a child of which person..

The generated SQL query is this:

SELECT 
    [Project1].[ID] AS [ID], 
    [Project1].[Name] AS [Name], 
    [Project1].[C2] AS [C1], 
    [Project1].[ID1] AS [ID1], 
    [Project1].[AdressLine] AS [AdressLine], 
    [Project1].[Person_ID] AS [Person_ID]
FROM ( SELECT 
    NEWID() AS [C1], 
    [Extent1].[ID] AS [ID], 
    [Extent1].[Name] AS [Name], 
    [Extent2].[ID] AS [ID1], 
    [Extent2].[AdressLine] AS [AdressLine], 
    [Extent2].[Person_ID] AS [Person_ID], 
    CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM  [People] AS [Extent1]
    LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Person_ID]
)  AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[ID] ASC, [Project1].[C2] ASC

Workarounds

  1. If i remove the .Include(p =>p.Addresses) from the query, everything goes fine. but of course the addresses aren't loaded and accessing that collection will make a new call to the database every time.
  2. I can first get the data from the database and randomize later by just adding a .ToList() before the .OrderBy.. like this: var persons = db.Persons.Include(p => p.Addresses).ToList().OrderBy(p => Guid.NewGuid());

Does anybody have any idea of why it is happening like this?
Might this be a bug in the SQL generation?

解决方案

As one can sort it out by reading AakashM answer and Nicolae Dascalu answer, it strongly seems Linq OrderBy requires a stable ranking function, which NewID/Guid.NewGuid is not.

So we have to use another random generator that would be stable inside a single query.

To achieve this, before each querying, use a .Net Random generator to get a random number. Then combine this random number with a unique property of the entity to get randomly sorted. And to 'randomize' a bit the result, checksum it. (checksum is a SQL Server function that compute a hash; original idea founded on this blog.)

Assuming Person Id is an int, you could write your query this way :

var rnd = (new Random()).NextDouble();
var persons = db.Persons
    .Include(p => p.Addresses)
    .OrderBy(p => SqlFunctions.Checksum(p.Id * rnd))
    // Uniqueness of ordering ranking must be ensured.
    .ThenBy(p => p.Id);

Like the NewGuid hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.

Beware:
If your query ordering does not guarantees uniqueness of your entities ranking, you must complement it for guarantying it, thus the ThenBy I have added. If your ranking is not unique for your queried root entity, its included children may get mixed with children of other entities having the same ranking. And then the bug will stay here.

Note:
I would prefer use .Next() method to get an int then combine it through a xor (^) to an entity int unique property, rather than using a double and multiply it. But SqlFunctions.Checksum unfortunately does not provide an overload for int data type, though the SQL server function is supposed to support it. You may use a cast to overcome this, but for keeping it simple I finally had chosen to go with the multiply.

这篇关于实体框架包含OrderBy随机生成重复数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆