How to aggregate millions of rows using EF Core


Problem description

I'm trying to aggregate approximately two million rows by user. Each user has several Transactions, and each Transaction has a Platform and a TransactionType. I aggregate the Platform and TransactionType columns as JSON and save the result as a single row per user.

But my code is slow. How can I improve the performance?

    public static void AggregateTransactions()
    {
        using (var db = new ApplicationDbContext())
        {
            db.ChangeTracker.AutoDetectChangesEnabled = false;

            // Get a list of users who have transactions
            var users = db.Transactions
                .Select(x => x.User)
                .Distinct();

            foreach (var user in users.ToList())
            {
                // Get all transactions for a particular user
                var _transactions = db.Transactions
                    .Include(x => x.Platform)
                    .Include(x => x.TransactionType)
                    .Where(x => x.User == user)
                    .ToList();

                // Aggregate Platforms from all transactions for user
                Dictionary<string, int> platforms = new Dictionary<string, int>();

                foreach (var item in _transactions.Select(x => x.Platform).GroupBy(x => x.Name).ToList())
                {
                    platforms.Add(item.Key, item.Count());
                }

                // Aggregate TransactionTypes from all transactions for user
                Dictionary<string, int> transactionTypes = new Dictionary<string, int>();

                foreach (var item in _transactions.Select(x => x.TransactionType).GroupBy(x => x.Name).ToList())
                {
                    transactionTypes.Add(item.Key, item.Count());
                }

                db.Add<TransactionByDay>(new TransactionByDay
                {
                    User = user,
                    Platforms = platforms,               // The dictionary is stored as JSON in the table
                    TransactionTypes = transactionTypes  // The dictionary is stored as JSON in the table
                });

                db.SaveChanges();
            }
        }
    }

Update

A basic view of the data looks like the following:

Transaction data:

Id: b11c6b67-6c74-4bbe-f712-08d609af20cf, UserId: 1, PlatformId: 3, TransactionTypeId: 1

Id: 4782803f-2f6b-4d99-f717-08d609af20cf, UserId: 1, PlatformId: 3, TransactionTypeId: 4

Data aggregated as TransactionPerDay:

Id: 9df41ef2-2fc8-441b-4a2f-08d609e21559, UserId: 1, Platforms: {"p3":2}, TransactionTypes: {"t1":1,"t4":1}

In this case, two transactions are aggregated into one row. The platforms and transaction types are each aggregated into a JSON object that maps a name to its count.

Recommended answer

You probably should not call db.SaveChanges() inside the loop. Moving it outside the loop, so that all changes are persisted in a single call, may help.
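As a rough illustration of persisting once, the aggregated rows can be collected first and saved together afterwards. The sketch below uses plain in-memory records in place of the EF entities so that it runs on its own; `Tx`, `UserSummary`, and `Aggregator` are hypothetical names, and the grouping in a single pass is an assumption about how the data might be arranged, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat shape of one transaction row (not the question's entity).
public record Tx(int UserId, string Platform, string TransactionType);

// Hypothetical aggregated row, loosely mirroring TransactionByDay.
public record UserSummary(
    int UserId,
    Dictionary<string, int> Platforms,
    Dictionary<string, int> TransactionTypes);

public static class Aggregator
{
    // Builds one summary per user without touching the database.
    public static List<UserSummary> Aggregate(IEnumerable<Tx> transactions) =>
        transactions
            .GroupBy(t => t.UserId)
            .Select(g => new UserSummary(
                g.Key,
                g.GroupBy(t => t.Platform)
                 .ToDictionary(p => p.Key, p => p.Count()),
                g.GroupBy(t => t.TransactionType)
                 .ToDictionary(p => p.Key, p => p.Count())))
            .ToList();
}

// With EF Core, the collected rows would then be mapped to entities,
// added in one go, and saved with a single call, for example:
//     db.AddRange(entities);
//     db.SaveChanges();   // once, after the loop
```

Whether this is fast enough for two million rows depends on how the transactions are loaded in the first place; the sketch only shows the "build everything, then save once" shape.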

That said, when dealing with large volumes of data where performance is key, I've found that ADO.NET is probably a better choice. This does not mean you have to stop using Entity Framework, but for this particular method you could use ADO.NET. If you go down this path, you could either:

  1. Create a stored procedure to return the data you need to work on, populate a DataTable, manipulate the data, and then persist everything in bulk using SqlBulkCopy.

  2. Use a stored procedure to perform the entire operation. This avoids the need to shuttle the data to your application, and all of the processing can happen within the database itself.
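For the first option, the in-memory step might look something like the sketch below: aggregated rows are written into a DataTable with the dictionaries serialized as JSON strings. The column names, the `BulkRows` helper, and the tuple shape are assumptions for illustration; the real target table may differ. The SqlBulkCopy call itself is shown only in a comment, since it needs a SQL Server connection and the Microsoft.Data.SqlClient package.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text.Json;

public static class BulkRows
{
    // Builds a DataTable whose columns are ASSUMED to match the target table.
    public static DataTable ToDataTable(
        IEnumerable<(int UserId,
                     Dictionary<string, int> Platforms,
                     Dictionary<string, int> TransactionTypes)> rows)
    {
        var table = new DataTable("TransactionByDay");
        table.Columns.Add("Id", typeof(Guid));
        table.Columns.Add("UserId", typeof(int));
        table.Columns.Add("Platforms", typeof(string));
        table.Columns.Add("TransactionTypes", typeof(string));

        foreach (var row in rows)
        {
            // Each dictionary becomes a compact JSON object string.
            table.Rows.Add(
                Guid.NewGuid(),
                row.UserId,
                JsonSerializer.Serialize(row.Platforms),
                JsonSerializer.Serialize(row.TransactionTypes));
        }

        return table;
    }
}

// The table could then be handed to SqlBulkCopy, roughly like this
// (connectionString is a placeholder):
//     using var bulk = new SqlBulkCopy(connectionString);
//     bulk.DestinationTableName = "TransactionByDay";
//     bulk.WriteToServer(table);
```

The second option would move all of this into SQL on the server, so no application-side table would be needed.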
