C#, Entity Framework Core & PostgreSQL: inserting a single row takes 20+ seconds


Problem description

I am using Entity Framework Core and the nuget package Npgsql.EntityFrameworkCore.PostgreSQL.

I have read all of the other answers regarding slow inserts to Entity Framework Core, but none have helped.

using (var db = getNewContext())
{
     db.Table1.Add(table1Object);
     db.SaveChanges();
}

This single insert takes around 20 to 30 seconds. There are fewer than 100 rows in the table. I have put a stopwatch start and stop inside the using block to make sure the time is not due to context initialisation.
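
For reference, the timing was taken roughly like this (a sketch only; table1Object, getNewContext and the output call are placeholders rather than the exact code):

// Requires System.Diagnostics for Stopwatch
var table1Object = new Table1Object { /* ... populate properties ... */ };
using (var db = getNewContext())
{
    var sw = Stopwatch.StartNew();   // started inside the using, so context creation is excluded
    db.Table1.Add(table1Object);
    db.SaveChanges();
    sw.Stop();
    Console.WriteLine($"SaveChanges took {sw.ElapsedMilliseconds} ms");   // ~20,000-30,000 ms for Table1
}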

Here is my class for the table object (relevant property names changed):

public partial class Table1Object
{
    public long Id { get; set; }
    public Guid SessionId { get; set; }
    public DateTime Timestamp { get; set; }
    public long MyNumber1 { get; set; }
    public double MyNumber2 { get; set; }
    public double MyNumber3 { get; set; }
    public double MyNumber4 { get; set; }
    public long? ParentId { get; set; }
    public bool MyBool { get; set; }
}

SessionId is used to link to another table (Session table), but I have not explicitly defined a foreign key or any other constraints for this anywhere. ParentId is also used to link back to another row in the same table, but I have not explicitly defined a constraint for this.
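
For comparison, explicitly declaring those relationships would look roughly like the sketch below; the SessionObject entity type name is assumed, since the actual model relies on conventions only:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Table1Object>(entity =>
    {
        // Hypothetical explicit mapping of SessionId to the Session table
        entity.HasOne<SessionObject>()
              .WithMany()
              .HasForeignKey(e => e.SessionId);

        // Hypothetical self-reference via ParentId
        entity.HasOne<Table1Object>()
              .WithMany()
              .HasForeignKey(e => e.ParentId);
    });
}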

Running the equivalent code on a different table takes less than a second to insert a single row. Table2 has fewer columns, but I wouldn't have thought the row sizes differ enough to have such a drastic effect:

public partial class Table2Object
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
}

Using Serilog and Entity Framework Core logging, you can see that the delay is in the "Committing transaction" step, which takes about 26 seconds; the insert itself takes only 6 ms (some parts of the log statements are trimmed for brevity):

2021-04-08 11:20:36.874 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table1'.
2021-04-08 11:20:36.879 [DBG] Context 'DataContext' started tracking 'Table1' entity.
2021-04-08 11:20:36.880 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.881 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.905 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.906 [DBG] Opening connection to database
2021-04-08 11:20:36.907 [DBG] Opened connection to database
2021-04-08 11:20:36.908 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.909 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.912 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.913 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.914 [DBG] Executing DbCommand [Parameters= ...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.920 [INF] Executed DbCommand (6ms) Parameters=[...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.925 [DBG] The foreign key property 'Table1.Id' was detected as changed.
2021-04-08 11:20:36.930 [DBG] A data reader was disposed.
2021-04-08 11:20:36.931 [DBG] Committing transaction.
2021-04-08 11:21:02.729 [DBG] Committed transaction.
2021-04-08 11:21:02.730 [DBG] Closing connection to database

Here is the equivalent log when inserting into Table2. The insert takes 3 ms and the commit takes 75 ms. This is how fast it should be:

2021-04-08 11:20:36.459 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table2'.
2021-04-08 11:20:36.460 [DBG] Context 'DataContext' started tracking 'Table2' entity.
2021-04-08 11:20:36.461 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.462 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.463 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.464 [DBG] Opening connection to database
2021-04-08 11:20:36.465 [DBG] Opened connection to database
2021-04-08 11:20:36.466 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.467 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.468 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.469 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.470 [DBG] Executing DbCommand [Parameters=...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.472 [INF] Executed DbCommand (3ms) [Parameters=[...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.474 [DBG] The foreign key property 'Table2.Id' was detected as changed.
2021-04-08 11:20:36.475 [DBG] A data reader was disposed.
2021-04-08 11:20:36.476 [DBG] Committing transaction.
2021-04-08 11:20:36.551 [DBG] Committed transaction.
2021-04-08 11:20:36.552 [DBG] Closing connection to database
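
For context, the EF Core debug output above was captured through Serilog; one way to wire that up looks roughly like this (a sketch, not necessarily the exact configuration used; optionsBuilder and connectionString are placeholders):

// Requires Serilog.Extensions.Logging; EnableSensitiveDataLogging exposes parameter values as in the logs above
var loggerFactory = LoggerFactory.Create(builder =>
    builder.AddSerilog().SetMinimumLevel(LogLevel.Debug));

optionsBuilder
    .UseNpgsql(connectionString)
    .UseLoggerFactory(loggerFactory)
    .EnableSensitiveDataLogging();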

I am at a loss for what is different between the tables other than the slightly bigger row size. I've dropped and recreated the table in case there were any constraints, foreign keys, triggers, etc. that I was unaware of.
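
One way to double-check for leftover triggers from code is to query the PostgreSQL catalogs directly, for example (a sketch):

using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
{
    conn.Open();
    // List non-internal triggers defined on Table1 (an empty result means no triggers)
    using (var cmd = new NpgsqlCommand(
        "SELECT tgname FROM pg_trigger WHERE tgrelid = '\"Table1\"'::regclass AND NOT tgisinternal;", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine($"Trigger on Table1: {reader.GetString(0)}");
    }
}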

解释"插入生成计划:

"Insert on ""Table1""  (cost=0.00..0.01 rows=1 width=81)"
"  ->  Result  (cost=0.00..0.01 rows=1 width=81)"

Enabling "Show query log" for PostgreSQL gives about the same amount of information as the Entity Framework logging does:

2021-04-09 12:05:06.559 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.560 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table1" (...)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    RETURNING "Id"
2021-04-09 12:05:06.560 BST [1979] user1@database DETAIL:  parameters: $1 = '0.580484961751977', $2 = 'f', $3 = '0.205387434417341', $4 = '18', $5 = '148', $6 = '93c71fb5-836a-486a-8d82-e073743b41cd', $7 = '2021-04-09 11:04:58.123773', $8 = '1.15474773024298'
2021-04-09 12:05:06.565 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:47.352 BST [1443] postgres@database LOG:  statement: /*pga4dash*/
    SELECT 'session_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'tps_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'ti_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'to_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'bio_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    
2021-04-09 12:05:51.148 BST [1979] user1@database LOG:  statement: DISCARD ALL

You can see that after the COMMIT statement, about 41 seconds passed before the next statement, which does some internal chart logging. 41 seconds just to commit a single-row insert!

Compare this to an insert into Table2, where the commit takes only about 100 milliseconds:

2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table2" ("Name", "Timestamp", "Value")
    VALUES ($1, $2, $3)
    RETURNING "Id"
2021-04-09 12:05:06.097 BST [1979] user1@database DETAIL:  parameters: $1 = 'Test', $2 = '2021-04-09 11:05:06.096182', $3 = '98'
2021-04-09 12:05:06.098 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:06.189 BST [1979] user1@database LOG:  statement: DISCARD ALL

I ran the following statement directly in PGAdmin, and it told me it took 323ms:

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO "Table1" ("MyColumn1", "MyColumn2", "MyColumn3", "MyColumn4", "ParentId", "SessionId", "Timestamp", "MyColumn5")
    VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')
    RETURNING "Id";
COMMIT;

I also tried running the statement directly using Npgsql with the following C# code:

            _logger.Debug("Using connection");
            using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
            {
                _logger.Debug("connection.open");
                conn.Open();
                _logger.Debug("Using command");
                // Insert some data
                using (var cmd = new NpgsqlCommand(
                    " BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;" +
                    " INSERT INTO "Table1" ("MyColumn1", "MyColumn2", "MyColumn3", "MyColumn4", "ParentId", "SessionId", "Timestamp", "MyColumn5")" +
                    " VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')" +
                    " RETURNING "Id";" +
                    "COMMIT;"
                    , conn))
                {
                    _logger.Debug("command execute");
                    cmd.ExecuteNonQuery();
                }
            }
            _logger.Debug("Done");

The log statements in that code told me that the whole thing took less than a second:

[21:59:41 DBG] Using connection
[21:59:41 DBG] connection.open
[21:59:42 DBG] Using command
[21:59:42 DBG] command execute
[21:59:42 DBG] Done

I've also dropped the database, deleted all of the migrations from Entity Framework, and created a new initial migration, so everything is running from scratch. It still takes around 20 seconds to insert into Table1, but less than a second to insert into Table2.

Putting Enlist=false in the connection string doesn't help.
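
For completeness, that change is just a connection-string setting (host and credentials below are placeholders):

// Enlist=false stops Npgsql from auto-enlisting in an ambient TransactionScope
var connectionString =
    "Host=localhost;Database=mydb;Username=user1;Password=secret;Enlist=false";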

I would agree with @Mark G's comment that "The findings ... would suggest that the issue either lies upstream in EF Core or in the provider", but I'm unsure how to diagnose the problem further.

I've since changed the code to use Npgsql to insert rows into this table via raw SQL, and that's pretty fast: less than 100 ms per insert. So the most likely candidate seems to be a bug in Entity Framework Core, but since I don't know what the issue is specifically, it is difficult to raise a bug report with their team.
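
The raw-SQL replacement might look roughly like this parameterized Npgsql insert (a sketch; it uses the Table1Object property names from above rather than the renamed columns in the earlier statements, and table1Object is a placeholder):

using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
{
    conn.Open();
    using (var cmd = new NpgsqlCommand(
        "INSERT INTO \"Table1\" (\"SessionId\", \"Timestamp\", \"MyNumber1\", \"MyNumber2\", \"MyNumber3\", \"MyNumber4\", \"ParentId\", \"MyBool\")" +
        " VALUES (@sessionId, @timestamp, @n1, @n2, @n3, @n4, @parentId, @myBool)" +
        " RETURNING \"Id\";", conn))
    {
        cmd.Parameters.AddWithValue("sessionId", table1Object.SessionId);
        cmd.Parameters.AddWithValue("timestamp", table1Object.Timestamp);
        cmd.Parameters.AddWithValue("n1", table1Object.MyNumber1);
        cmd.Parameters.AddWithValue("n2", table1Object.MyNumber2);
        cmd.Parameters.AddWithValue("n3", table1Object.MyNumber3);
        cmd.Parameters.AddWithValue("n4", table1Object.MyNumber4);
        cmd.Parameters.AddWithValue("parentId", (object)table1Object.ParentId ?? DBNull.Value);
        cmd.Parameters.AddWithValue("myBool", table1Object.MyBool);
        var newId = (long)cmd.ExecuteScalar();   // RETURNING "Id" comes back as the scalar result
    }
}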

Answer

After a lot of testing I eventually found that the problem was not in Entity Framework or Npgsql at all; the delay I was seeing was caused by write caching. I was always writing a 30 MB file before inserting a row into Table1, and I believed that the file write was finished once File.WriteAllBytes returned, so it wouldn't affect any later timings. However, at the OS layer the data had not really been written to disk by the time the insert statement ran, so the insert was artificially delayed.

I proved this with the following code:

Stopwatch sw1 = new Stopwatch();
sw1.Start();
File.WriteAllBytes(filePath, myBytes);   // WriteAllBytes takes (path, bytes); filePath is the destination of the 30 MB file
sw1.Stop();

Thread.Sleep(1000);

Stopwatch sw2 = new Stopwatch();
sw2.Start();
MethodThatInsertsIntoTable1();
sw2.Stop();

Stopwatch 1 showed that File.WriteAllBytes always took about 500 ms, while stopwatch 2 then timed about 20 to 30 seconds.

If I change MethodThatInsertsIntoTable1 to insert into a different table, it still takes 20 to 30 seconds regardless of the table.

If I increase Thread.Sleep(1000) to Thread.Sleep(30000), then stopwatch 2 records that the insert takes less than 10 milliseconds.

This shows that even after File.WriteAllBytes returns control to the program, the file has not actually finished being written to disk.
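
One way to make sure the file write has actually reached the disk before the next timing starts is to flush through the OS cache explicitly, for example (a sketch; filePath and myBytes are placeholders):

// FileStream.Flush(true) asks the OS to flush its buffers to the physical device,
// so the call does not return until the data is really on disk.
using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None))
{
    fs.Write(myBytes, 0, myBytes.Length);
    fs.Flush(flushToDisk: true);
}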

The environment I was running on was Linux on a Raspberry Pi. A write speed test confirmed that my write speed to the SD card is just over 1 MB/s, which aligns with the results I'm seeing: a 30 MB file takes 20-30 seconds to write, so it couldn't possibly have completed in the 500 ms that stopwatch 1 reported.

Another user appears to have run into the same behaviour in "File.WriteAllBytes does not block".

After adding an external USB SSD to the Raspberry Pi and changing the code to save the file there instead, saving the file takes only 0.5 seconds and the problem goes away.
