Table cleanup with LINQ and EF


Problem description

Hello all,

I'm currently working on a cleanup routine for a database. I decided to use EF because I have no background in stored procedures or complex SQL statements, so I think using entities and LINQ would be the way to go.

The table I must clean contains historic information stored every minute. Now I must clean it up to keep a record only when the stored values change OR when a new day starts. Today there are more than 8 million records in the database.

To summarize, my table columns are:

ID (primary key), DATE (string), TIME(string), VAL1(int), VAL2(float), etc.

So, within a day (DATE) I must delete all duplicated records. Duplicated means that VAL1, VAL2, etc. are exactly the same as in the previous record. For instance, today I might have:

Row-> 1 | 5/5/2010 | 0000 | 23 | 2.4

Row-> 2 | 5/5/2010 | 0001 | 23 | 2.4

Row-> 3 | 5/5/2010 | 0002 | 23 | 3.0

Row-> 4 | 5/5/2010 | 0000 | 23 | 3.0

Row-> 5 | 5/6/2010 | 0000 | 23 | 3.0

After cleanup, I will have:

Row-> 1 | 5/5/2010 | 0000 | 23 | 2.4

Row-> 3 | 5/5/2010 | 0002 | 23 | 3.0

Row-> 5 | 5/6/2010 | 0000 | 23 | 3.0

How can I use LINQ and EF to perform this operation without having to iterate over all rows on the table?

Thanks in advance,

Igor.


Software Developer and AI Enthusiast. www.twitter.com/ikondrasovas

Solution

Hi Igor,

 

Here is one workaround for your reference. First, we query all the ID values that qualify for deletion. Then we create new entities whose primary key ID equals each ID in that query result, and attach them to the context. After calling the .DeleteObject API on these entities, we call .SaveChanges() to delete them one by one. Here is the sample code:

=====================================================================

using (TestDBEntities context = new TestDBEntities())
{
    // Self-join: pair each row (d2) with the row whose ID is one lower (d1)
    // and select d2's ID when the date and all values match.
    var query = from d1 in context.DeleteTableTests
                from d2 in context.DeleteTableTests
                where d1.ID == d2.ID - 1 &&
                      d1.DATE == d2.DATE &&
                      d1.VAL1 == d2.VAL1 &&
                      d1.VAL2 == d2.VAL2
                select d2.ID;

    foreach (var id in query)
    {
        // Attach a stub entity that carries only the key, then mark it for deletion.
        var delete = new DeleteTableTest { ID = id };
        context.DeleteTableTests.Attach(delete);
        context.DeleteObject(delete);
    }

    // Issues one DELETE command per marked entity.
    context.SaveChanges();
}

=====================================================================

Here the table is named DeleteTableTest, and I am using EF4 in VS2010. The code should be similar if you are using VS2008 and EFv1; the only difference would be the Attach method. In EFv1 we need to use the ObjectContext.Attach API (see http://msdn.microsoft.com/en-us/library/system.data.objects.objectcontext.attach.aspx ). Additional information about attaching and detaching in Entity Framework is at http://msdn.microsoft.com/en-us/library/bb896271.aspx
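To see which rows the self-join predicate picks, it can be run against the sample rows from the question with plain LINQ to Objects. The following is only a sketch under assumptions: `Row`, `DuplicateCheck`, `SampleRows` and `FindDuplicateIds` are hypothetical names introduced for illustration, and the rows live in memory rather than in an EF context. Note that the `ID - 1` comparison assumes the IDs are consecutive; a gap in the IDs means the two rows on either side of the gap are never compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one table row (not an EF entity).
public sealed class Row
{
    public int ID { get; set; }
    public string DATE { get; set; }
    public string TIME { get; set; }
    public int VAL1 { get; set; }
    public double VAL2 { get; set; }
}

public static class DuplicateCheck
{
    // The five sample rows from the question.
    public static List<Row> SampleRows()
    {
        return new List<Row>
        {
            new Row { ID = 1, DATE = "5/5/2010", TIME = "0000", VAL1 = 23, VAL2 = 2.4 },
            new Row { ID = 2, DATE = "5/5/2010", TIME = "0001", VAL1 = 23, VAL2 = 2.4 },
            new Row { ID = 3, DATE = "5/5/2010", TIME = "0002", VAL1 = 23, VAL2 = 3.0 },
            new Row { ID = 4, DATE = "5/5/2010", TIME = "0000", VAL1 = 23, VAL2 = 3.0 },
            new Row { ID = 5, DATE = "5/6/2010", TIME = "0000", VAL1 = 23, VAL2 = 3.0 },
        };
    }

    // Same predicate as the EF query: a row is a duplicate when the row
    // with ID - 1 has the same DATE, VAL1 and VAL2.
    public static List<int> FindDuplicateIds(IEnumerable<Row> rows)
    {
        var list = rows.ToList();
        var query = from d1 in list
                    from d2 in list
                    where d1.ID == d2.ID - 1 &&
                          d1.DATE == d2.DATE &&
                          d1.VAL1 == d2.VAL1 &&
                          d1.VAL2 == d2.VAL2
                    select d2.ID;
        return query.ToList();
    }

    public static void Main()
    {
        // Prints "2, 4": rows 1, 3 and 5 remain, as in the question.
        Console.WriteLine(string.Join(", ", FindDuplicateIds(SampleRows())));
    }
}
```

Rows 2 and 4 are selected for deletion, which leaves rows 1, 3 and 5, matching the expected result in the question.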

 

Note: since you have more than 8 million records in the database, I would recommend running the above code over one ID range at a time, e.g. first handle the records whose ID is smaller than 10000, then 10000 to 20000, and so on. Also, one drawback here is that EF deletes the records one by one (each record uses its own DELETE command), so it may increase the traffic between the client and the database server.
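The range-by-range idea can be sketched with a small helper that produces the ID ranges. This is an illustration under assumptions, not part of the original answer: `IdBatches` and `DescendingRanges` are hypothetical names, and the batch size is arbitrary. One thing to watch, as far as I can tell: if the ranges are processed from low IDs to high IDs, a row deleted at the end of one range is no longer there to serve as the "previous" row (`ID - 1`) for the first row of the next range, so a run of duplicates that crosses a range boundary may be only partly removed. Walking the ranges from the highest IDs downward avoids that, which is why the sketch yields them in descending order.

```csharp
using System;
using System.Collections.Generic;

public static class IdBatches
{
    // Yields half-open ID ranges [start, end) that together cover
    // minId..maxId, highest range first.
    public static IEnumerable<Tuple<long, long>> DescendingRanges(
        long minId, long maxId, long batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException("batchSize");

        for (long end = maxId + 1; end > minId; end -= batchSize)
        {
            long start = Math.Max(end - batchSize, minId);
            yield return Tuple.Create(start, end);
        }
    }

    public static void Main()
    {
        // Prints [16, 26), [6, 16), [1, 6) on separate lines.
        foreach (var range in DescendingRanges(1, 25, 10))
            Console.WriteLine("[{0}, {1})", range.Item1, range.Item2);
    }
}
```

For each range, the EF query would get one extra condition on the row being deleted only, along the lines of `d2.ID >= start && d2.ID < end`, followed by its own SaveChanges() call. The `d1` side should stay unrestricted so that a pair straddling two ranges is still compared.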

 

 

Another workaround would be to use SQL statements or stored procedures directly, which can be done in one database call. The SQL statement can be something like:

=====================================================================

DELETE FROM DeleteTableTest WHERE ID IN (
    SELECT [Extent2].[ID] AS [ID]
    FROM [dbo].[DeleteTableTest] AS [Extent1]
    INNER JOIN [dbo].[DeleteTableTest] AS [Extent2]
        ON ([Extent1].[DATE] = [Extent2].[DATE])
        AND ([Extent1].[VAL1] = [Extent2].[VAL1])
        AND ([Extent1].[VAL2] = [Extent2].[VAL2])
    WHERE [Extent1].[ID] = ([Extent2].[ID] - 1))

=====================================================================
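If the single statement turns out to be too heavy for 8 million rows, the same DELETE can be limited to one ID range per call. The sketch below only builds and prints such a statement; `CleanupSql` and `DeleteDuplicatesInRange` are hypothetical names, and the two extra range conditions are my addition rather than part of the statement above. In EF4, a statement with `{0}`-style placeholders like this could presumably be handed to `ObjectContext.ExecuteStoreCommand` with the two bounds as parameters, but treat that as an assumption to verify against the documentation.

```csharp
using System;

public static class CleanupSql
{
    // Range-limited variant of the DELETE statement. {0} is the inclusive
    // lower ID bound and {1} the exclusive upper ID bound of the rows
    // that may be deleted; the joined "previous" row is not restricted.
    public const string DeleteDuplicatesInRange = @"
DELETE FROM DeleteTableTest WHERE ID IN (
    SELECT [Extent2].[ID]
    FROM [dbo].[DeleteTableTest] AS [Extent1]
    INNER JOIN [dbo].[DeleteTableTest] AS [Extent2]
        ON [Extent1].[DATE] = [Extent2].[DATE]
       AND [Extent1].[VAL1] = [Extent2].[VAL1]
       AND [Extent1].[VAL2] = [Extent2].[VAL2]
    WHERE [Extent1].[ID] = [Extent2].[ID] - 1
      AND [Extent2].[ID] >= {0} AND [Extent2].[ID] < {1})";

    public static void Main()
    {
        // Preview only: substitutes the bounds as text so the statement
        // can be read; a real call should pass them as parameters.
        Console.WriteLine(string.Format(DeleteDuplicatesInRange, 10000, 20000));
    }
}
```

Each call then removes the duplicates of one range in a single round trip, instead of one DELETE command per row.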

 

If you have any questions, please feel free to let me know.

Have a great day!

Best Regards,
Lingzhi Sun

MSDN Subscriber Support in Forum

If you have any feedback on our support, please contact msdnmg@microsoft.com

