Read huge table with LINQ to SQL: Running out of memory vs slow paging


Problem description

I have a huge table which I need to read through in a certain order and compute some aggregate statistics. The table already has a clustered index in the correct order, so fetching the records themselves is pretty fast. I'm trying to use LINQ to SQL to simplify the code that I need to write. The problem is that I don't want to load all the objects into memory, since the DataContext seems to keep them around -- yet trying to page them results in horrible performance problems.

Here's the breakdown. Original attempt was this:

var logs = 
    (from record in dataContext.someTable 
     where [index is appropriate]
     select record);

foreach( linqEntity l in logs )
{
    // Do stuff with data from l
}

This is pretty fast, and streams at a good rate, but the problem is that the memory use of the application keeps going up and never stops. My guess is that the LINQ to SQL entities are being kept around in memory and not being released properly. So after reading Out of memory when creating a lot of objects C#, I tried the following approach. This seems to be the common Skip/Take paradigm that many people use, with the added benefit of saving memory.
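For context (an editorial note, not from the original question): with object tracking enabled (the default), the DataContext keeps an identity map of every entity it materializes, so repeated queries for the same row return the same object instance -- and nothing in that map can be garbage collected while the context is alive. A minimal sketch of that behavior, using a hypothetical `Id` key column on `someTable`:

```csharp
using (var ctx = new MyDataContext(_conn))
{
    // The identity map guarantees one object per primary key per context,
    // which is why memory grows for the lifetime of the DataContext.
    var first = ctx.someTable.First(r => r.Id == 42);
    var second = ctx.someTable.First(r => r.Id == 42);
    bool sameInstance = ReferenceEquals(first, second);  // true while tracking is on
}
```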

Note that _conn is created beforehand, and a temporary data context is created for each query, resulting in the associated entities being garbage collected.

int skipAmount = 0;
bool finished = false;

while (!finished)
{
    // Trick to allow for automatic garbage collection while iterating through the DB
    using (var tempDataContext = new MyDataContext(_conn) {CommandTimeout = 600})
    {               
        var query =
            (from record in tempDataContext.someTable
             where [index is appropriate]
             select record);

        List<linqEntity> logs = query.Skip(skipAmount).Take(BatchSize).ToList();
        if (logs.Count == 0)
        {
            finished = true;
            continue;
        }

        foreach( linqEntity l in logs )
        {
            // Do stuff with data from l
        }

        skipAmount += logs.Count;
    }
}

Now I have the desired behavior that memory usage doesn't increase at all as I stream through the data. Yet, I have a far worse problem: each Skip causes the data to load more and more slowly, because the underlying query seems to make the server read through all the data for all previous pages. Each page takes longer to load than the one before it, and I can tell that this is turning into a quadratic operation. The same problem has come up in other posts as well.

I can't seem to find a way to do this with LINQ that limits memory use by paging the data, yet still loads each page in constant time. Is there a way to do this properly? My hunch is that there might be some way to tell the DataContext to explicitly forget about objects in the first approach above, but I can't figure out how to do that.
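One technique that addresses exactly this slowdown (an editorial sketch, not from the original post) is keyset ("seek") pagination: instead of Skip(), resume each page from the clustered-index key of the last row already processed, so every page is a single index seek rather than a scan over all prior rows. The sketch below reuses `someTable`, `_conn`, and `BatchSize` from the snippets above; the single ever-increasing key column `Id` is an assumption about the schema:

```csharp
// Keyset ("seek") pagination sketch -- assumes the clustered index is on a
// single ever-increasing key column, called Id here (an assumption).
int lastId = 0;          // assumes all keys are > 0
bool finished = false;

while (!finished)
{
    using (var tempDataContext = new MyDataContext(_conn) { CommandTimeout = 600 })
    {
        // The WHERE on the key replaces Skip(): SQL Server can seek straight
        // to the resume point instead of re-reading every earlier page.
        var logs =
            (from record in tempDataContext.someTable
             where record.Id > lastId
             orderby record.Id
             select record).Take(BatchSize).ToList();

        if (logs.Count == 0)
        {
            finished = true;
            continue;
        }

        foreach (var l in logs)
        {
            // Do stuff with data from l
        }

        lastId = logs[logs.Count - 1].Id;  // resume point for the next page
    }
}
```

Because every iteration is a seek plus a bounded scan of BatchSize rows, total work is linear in the table size instead of quadratic.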

Solution

After madly grasping at some straws, I found that the DataContext's ObjectTrackingEnabled = false could be just what the doctor ordered. It is, not surprisingly, specifically designed for a read-only case like this.

using (var readOnlyDataContext = 
    new MyDataContext(_conn) {CommandTimeout = really_long, ObjectTrackingEnabled = false})
{                                                 
    var logs =
        (from record in readOnlyDataContext.someTable
         where [index is appropriate]
         select record);

    foreach( linqEntity l in logs )
    {
        // Do stuff with data from l   
    }                
}

The above approach does not accumulate memory while streaming through the objects. When writing data, I can use a different DataContext that has object tracking enabled, and that seems to work okay. However, this approach does have the problem of a single SQL query that can take an hour or more to stream and complete, so if there's a way to do the paging as above without the performance hit, I'm open to other alternatives.

A warning about turning object tracking off: I found that when you try to do multiple concurrent reads with the same DataContext, you don't get the usual error ("There is already an open DataReader associated with this Command which must be closed first"); instead, the application goes into an infinite loop with 100% CPU usage. I'm not sure if this is a C# bug or a feature.
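If concurrent reads are genuinely needed with tracking off, one way to sidestep that hang (an editorial suggestion, not something the original author tested) is to give every reader its own DataContext, so each query owns its own connection and DataReader. The key-range partitioning and the `Id` column below are illustrative assumptions:

```csharp
using System.Threading.Tasks;

// Partition the key space so each worker reads a disjoint range (assumed ranges).
var keyRanges = new[] { Tuple.Create(0, 1000000), Tuple.Create(1000000, 2000000) };

Parallel.ForEach(keyRanges, range =>
{
    // One context (and therefore one connection/DataReader) per reader,
    // so concurrent queries never share a single Command.
    using (var ctx = new MyDataContext(_conn) { ObjectTrackingEnabled = false })
    {
        var rows = ctx.someTable.Where(r => r.Id >= range.Item1 && r.Id < range.Item2);
        foreach (var row in rows)
        {
            // Do stuff with data from row
        }
    }
});
```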

