C#, Entity Framework Core & PostgreSQL: inserting a single row takes 20+ seconds

Question

I am using Entity Framework Core and the NuGet package Npgsql.EntityFrameworkCore.PostgreSQL.

I have read all of the other answers regarding slow inserts to Entity Framework Core, but none have helped.

using (var db = getNewContext())
{
     db.Table1.Add(Table1Object);
     db.SaveChanges();
}

This single insert takes around 20 to 30 seconds. There are fewer than 100 rows in the table. I have put a stopwatch start and stop inside the using block to make sure the time is not due to context initialisation.

Here is my class for the table object (relevant property names changed):

public partial class Table1Object
{
    public long Id { get; set; }
    public Guid SessionId { get; set; }
    public DateTime Timestamp { get; set; }
    public long MyNumber1 { get; set; }
    public double MyNumber2 { get; set; }
    public double MyNumber3 { get; set; }
    public double MyNumber4 { get; set; }
    public long? ParentId { get; set; }
    public bool MyBool { get; set; }
}

SessionId is used to link to another table (Session table), but I have not explicitly defined a foreign key or any other constraints for this anywhere. ParentId is also used to link back to another row in the same table, but I have not explicitly defined a constraint for this.

Running the equivalent code on a different table takes less than a second to insert a single row. Table2 has fewer columns, but I wouldn't have thought the row size was different enough to have such a drastic effect:

public partial class Table2Object
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
}

Using Serilog and Entity Framework Core logging, you can see that the delay is in the "Committing transaction" step, which takes about 26 seconds; the insert itself takes only 6ms (some parts of the log statements are trimmed for brevity):

2021-04-08 11:20:36.874 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table1'.
2021-04-08 11:20:36.879 [DBG] Context 'DataContext' started tracking 'Table1' entity.
2021-04-08 11:20:36.880 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.881 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.905 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.906 [DBG] Opening connection to database
2021-04-08 11:20:36.907 [DBG] Opened connection to database
2021-04-08 11:20:36.908 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.909 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.912 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.913 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.914 [DBG] Executing DbCommand [Parameters= ...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.920 [INF] Executed DbCommand (6ms) Parameters=[...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.925 [DBG] The foreign key property 'Table1.Id' was detected as changed.
2021-04-08 11:20:36.930 [DBG] A data reader was disposed.
2021-04-08 11:20:36.931 [DBG] Committing transaction.
2021-04-08 11:21:02.729 [DBG] Committed transaction.
2021-04-08 11:21:02.730 [DBG] Closing connection to database

Here is the equivalent log when inserting to Table2. The insert takes 3ms and the commit takes 75ms. This is how fast it should be:

2021-04-08 11:20:36.459 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table2'.
2021-04-08 11:20:36.460 [DBG] Context 'DataContext' started tracking 'Table2' entity.
2021-04-08 11:20:36.461 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.462 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.463 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.464 [DBG] Opening connection to database
2021-04-08 11:20:36.465 [DBG] Opened connection to database
2021-04-08 11:20:36.466 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.467 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.468 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.469 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.470 [DBG] Executing DbCommand [Parameters=...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.472 [INF] Executed DbCommand (3ms) [Parameters=[...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.474 [DBG] The foreign key property 'Table2.Id' was detected as changed.
2021-04-08 11:20:36.475 [DBG] A data reader was disposed.
2021-04-08 11:20:36.476 [DBG] Committing transaction.
2021-04-08 11:20:36.551 [DBG] Committed transaction.
2021-04-08 11:20:36.552 [DBG] Closing connection to database

I am at a loss for what is different between the tables other than the slightly bigger row size. I've dropped and recreated the table in case there were any constraints, foreign keys, triggers, etc. that I was unaware of.

An "Explain" plan for the insert generates:

"Insert on ""Table1""  (cost=0.00..0.01 rows=1 width=81)"
"  ->  Result  (cost=0.00..0.01 rows=1 width=81)"

Enabling "Show query log" for PostgreSQL gives about the same amount of information as the Entity Framework logging does:

2021-04-09 12:05:06.559 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.560 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table1" (...)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    RETURNING "Id"
2021-04-09 12:05:06.560 BST [1979] user1@database DETAIL:  parameters: $1 = '0.580484961751977', $2 = 'f', $3 = '0.205387434417341', $4 = '18', $5 = '148', $6 = '93c71fb5-836a-486a-8d82-e073743b41cd', $7 = '2021-04-09 11:04:58.123773', $8 = '1.15474773024298'
2021-04-09 12:05:06.565 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:47.352 BST [1443] postgres@database LOG:  statement: /*pga4dash*/
    SELECT 'session_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'tps_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'ti_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'to_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'bio_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    
2021-04-09 12:05:51.148 BST [1979] user1@database LOG:  statement: DISCARD ALL

You can see that after the COMMIT statement, about 41 seconds passed before the next statement, which does some internal chart logging. 41 seconds just to commit a single row insert!

Compare this to an insert into Table2, where the commit takes only about 100 milliseconds:

2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table2" ("Name", "Timestamp", "Value")
    VALUES ($1, $2, $3)
    RETURNING "Id"
2021-04-09 12:05:06.097 BST [1979] user1@database DETAIL:  parameters: $1 = 'Test', $2 = '2021-04-09 11:05:06.096182', $3 = '98'
2021-04-09 12:05:06.098 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:06.189 BST [1979] user1@database LOG:  statement: DISCARD ALL

I ran the following statement directly in pgAdmin, and it told me it took 323ms:

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO "Table1" ("MyColumn1", "MyColumn2", "MyColumn3", "MyColumn4", "ParentId", "SessionId", "Timestamp", "MyColumn5")
    VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')
    RETURNING "Id";
COMMIT;

I also tried running the statement directly using Npgsql with the following C# code:

            _logger.Debug("Using connection");
            using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
            {
                _logger.Debug("connection.open");
                conn.Open();
                _logger.Debug("Using command");
                // Insert some data
                using (var cmd = new NpgsqlCommand(
                    " BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;" +
                    " INSERT INTO \"Table1\" (\"MyColumn1\", \"MyColumn2\", \"MyColumn3\", \"MyColumn4\", \"ParentId\", \"SessionId\", \"Timestamp\", \"MyColumn5\")" +
                    " VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')" +
                    " RETURNING \"Id\";" +
                    "COMMIT;"
                    , conn))
                {
                    _logger.Debug("command execute");
                    cmd.ExecuteNonQuery();
                }
            }
            _logger.Debug("Done");

The log statements in that code told me that the whole thing took less than a second:

[21:59:41 DBG] Using connection
[21:59:41 DBG] connection.open
[21:59:42 DBG] Using command
[21:59:42 DBG] command execute
[21:59:42 DBG] Done

I've also dropped the database, deleted all the migrations from Entity Framework, and created a new initial migration, so everything is running from scratch. It still takes around 20 seconds to insert into Table1, but less than a second to insert into Table2.

Putting Enlist=false in the connection string doesn't help.

I would agree with @Mark G's comment that "The findings ... would suggest that the issue either lies upstream in EF Core or in the provider", but I'm unsure how to diagnose the problem further.

I've since changed the code to use Npgsql to insert rows into this table via raw SQL, and that's pretty fast: less than 100ms per insert. So the most likely candidate seems to be a bug in Entity Framework Core, but since I don't know specifically what the issue is, raising a bug report with their team is difficult.

Recommended answer

After a lot of testing I eventually found that the problem was not in Entity Framework or Npgsql at all: the delay I was seeing was caused by write caching. I was always writing a 30MB file before inserting a row into Table1, and I believed that the file write was complete once File.WriteAllBytes returned, so it wouldn't affect any later timing. However, at the OS layer the data had not actually finished being written to disk by the time the insert statement ran, so the insert statement was artificially delayed.

I proved this with the following code:

Stopwatch sw1 = new Stopwatch();
sw1.Start();
File.WriteAllBytes(myFilePath, myBytes); // myFilePath: destination path, omitted in the original snippet
sw1.Stop();

Thread.Sleep(1000);

Stopwatch sw2 = new Stopwatch();
sw2.Start();
MethodThatInsertsIntoTable1();
sw2.Stop();

Stopwatch 1 showed that File.WriteAllBytes always took about 500ms; stopwatch 2 then measured about 20 to 30 seconds.

If I change MethodThatInsertsIntoTable1 to insert into a different table then it still takes 20 to 30 seconds regardless of table.

If I increase Thread.Sleep(1000) to Thread.Sleep(30000) then stopwatch 2 records that the insert takes less than 10 milliseconds.

This shows that even after File.WriteAllBytes returns control to the program, it's not actually truly done writing the file to the disk.
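
One way to make the timing reflect the real cost of the write is to ask the OS to flush the file to the device before stopping the stopwatch. The following is only a sketch of that idea, not what the original code did: the WriteAndFlush helper, the temp-file path, and the zero-filled 30MB payload are all assumptions for illustration. It relies on FileStream.Flush(true), which requests that buffered data for the file be written through to disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical stand-ins for the real 30MB payload and its destination.
byte[] payload = new byte[30 * 1024 * 1024];
string path = Path.Combine(Path.GetTempPath(), "write-cache-test.bin");

// May return as soon as the data sits in the OS write cache.
var cached = Stopwatch.StartNew();
File.WriteAllBytes(path, payload);
cached.Stop();

// Returns only after the OS has been asked to flush the file to the device.
var flushed = Stopwatch.StartNew();
WriteAndFlush(path, payload);
flushed.Stop();

Console.WriteLine($"WriteAllBytes:  {cached.ElapsedMilliseconds} ms");
Console.WriteLine($"Write + flush:  {flushed.ElapsedMilliseconds} ms");
File.Delete(path);

// Hypothetical helper: writes the bytes, then flushes OS buffers for this file.
static void WriteAndFlush(string filePath, byte[] data)
{
    using var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None);
    stream.Write(data, 0, data.Length);
    stream.Flush(flushToDisk: true);
}
```

On slow storage the second figure should be much closer to the true write time, which keeps the cost of the file write from being attributed to whatever disk-bound operation runs next, such as a database commit.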

The environment I was running on was Linux on a Raspberry Pi. A write speed test confirms that my write speed to the SD card is just over 1MB/s, which aligns with the results I'm seeing: 20 to 30 seconds to write a 30MB file. The write couldn't possibly have finished in the 500ms that stopwatch 1 reported.
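
The arithmetic behind that conclusion can be spelled out. This is a rough sketch that rounds the measured SD card speed down to 1MB/s (the answer says "just over"); the helper names are hypothetical.

```csharp
using System;

const double fileSizeMb = 30.0;       // size of the file written before each insert
const double cardSpeedMbPerSec = 1.0; // assumed: "just over 1MB/s", rounded down
const double reportedSeconds = 0.5;   // what stopwatch 1 measured

double expected = ExpectedSeconds(fileSizeMb, cardSpeedMbPerSec);
double implied = ImpliedThroughput(fileSizeMb, reportedSeconds);

Console.WriteLine($"Expected write time at card speed: {expected:F0} s");
Console.WriteLine($"Throughput implied by a 500 ms write: {implied:F0} MB/s");
// prints 30 s and 60 MB/s

// Time needed to write sizeMb at a sustained speed.
static double ExpectedSeconds(double sizeMb, double speedMbPerSec) => sizeMb / speedMbPerSec;

// Speed the storage would need for the write to really finish in the given time.
static double ImpliedThroughput(double sizeMb, double seconds) => sizeMb / seconds;
```

A 500ms write would need roughly 60 times the card's measured speed, so the data must still have been in the cache when File.WriteAllBytes returned.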

Related: File.WriteAllBytes does not block

After adding an external USB SSD to the Raspberry Pi and saving the file there instead, saving the file takes only 0.5 seconds and the problem goes away.
