将大量记录(批量插入)写入 .NET/C# 中的 Access [英] Writing large number of records (bulk insert) to Access in .NET/C#

查看:29
本文介绍了将大量记录(批量插入)写入 .NET/C# 中的 Access的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

从 .NET 向 MS Access 数据库执行批量插入的最佳方法是什么?使用 ADO.NET,写出一个大型数据集需要一个多小时.

What is the best way to perform bulk inserts into an MS Access database from .NET? Using ADO.NET, it is taking way over an hour to write out a large dataset.

请注意,在我重构"它之前,我的原始帖子在问题部分既有问题又有答案.我接受了 Igor Turman 的建议,并将其分为两部分重新编写 - 上面的问题和我的回答.

推荐答案

我发现以特定方式使用 DAO 大约比使用 ADO.NET 快 30 倍.我正在分享这个答案中的代码和结果.作为背景,下面的测试是写出一个 20 列的表的 100 000 条记录.

I found that using DAO in a specific manner is roughly 30 times faster than using ADO.NET. I am sharing the code and results in this answer. As background, in the below, the test is to write out 100 000 records of a table with 20 columns.

技术和时间的总结 - 从最好到最坏:

A summary of the technique and times - from best to worse:

  1. 02.8 秒:使用DAO,使用DAO.Field的来引用表列
  2. 02.8 秒: 写入文本文件,使用自动化将文本导入 Access
  3. 11.0 秒: 使用 DAO,使用列索引来引用表列.
  4. 17.0 秒:使用 DAO,按名称引用列
  5. 79.0 秒:使用 ADO.NET,为每一行生成 INSERT 语句
  6. 86.0 秒:使用 ADO.NET,使用 DataTable 到 DataAdapter 进行批量"插入
  1. 02.8 seconds: Use DAO, use DAO.Field's to refer to the table columns
  2. 02.8 seconds: Write out to a text file, use Automation to import the text into Access
  3. 11.0 seconds: Use DAO, use the column index to refer to the table columns.
  4. 17.0 seconds: Use DAO, refer to the column by name
  5. 79.0 seconds: Use ADO.NET, generate INSERT statements for each row
  6. 86.0 seconds: Use ADO.NET, use DataTable to an DataAdapter for "batch" insert

作为背景,我偶尔需要对相当大量的数据进行分析,我发现 Access 是最好的平台.分析涉及很多查询,通常还有很多 VBA 代码.

As background, occasionally I need to perform analysis of reasonably large amounts of data, and I find that Access is the best platform. The analysis involves many queries, and often a lot of VBA code.

出于各种原因,我想使用 C# 而不是 VBA.典型的方式是使用OleDB连接Access.我使用了 OleDbDataReader 来抓取数百万条记录,而且效果很好.但是当把结果输出到表格中时,却花费了很长很长的时间.一个多小时.

For various reasons, I wanted to use C# instead of VBA. The typical way is to use OleDB to connect to Access. I used an OleDbDataReader to grab millions of records, and it worked quite well. But when outputting results to a table, it took a long, long time. Over an hour.

首先,让我们讨论从 C# 向 Access 写入记录的两种典型方法.这两种方式都涉及 OleDB 和 ADO.NET.第一种是一次生成一个 INSERT 语句并执行它们,100 000 条记录需要 79 秒.代码是:

First, let's discuss the two typical ways to write records to Access from C#. Both ways involve OleDB and ADO.NET. The first is to generate INSERT statements one at time, and execute them, taking 79 seconds for the 100 000 records. The code is:

public static double TestADONET_Insert_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  using (OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB))
  {
    conn.Open();
    OleDbCommand cmd = new OleDbCommand();
    cmd.Connection = conn;

    cmd.CommandText = "DELETE FROM TEMP";
    int numRowsDeleted = cmd.ExecuteNonQuery();
    Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

    for (int i = 0; i < 100000; i++)
    {
      StringBuilder insertSQL = new StringBuilder("INSERT INTO TEMP (")
        .Append(names)
        .Append(") VALUES (");

      for (int k = 0; k < 19; k++)
      {
        insertSQL.Append(i + k).Append(",");
      }
      insertSQL.Append(i + 19).Append(")");
      cmd.CommandText = insertSQL.ToString();
      cmd.ExecuteNonQuery();
    }
    cmd.Dispose();
  }
  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

请注意,我在 Access 中没有发现允许批量插入的方法.

Note that I found no method in Access that allows a bulk insert.

然后我想也许使用带有数据适配器的数据表会被证明是有用的.特别是因为我认为我可以使用数据适配器的 UpdateBatchSize 属性进行批量插入.但是,显然只有 SQL Server 和 Oracle 支持,而 Access 不支持.最长的时间是 86 秒.我使用的代码是:

I had then thought that maybe using a data table with a data adapter would be prove useful. Especially since I thought that I could do batch inserts using the UpdateBatchSize property of a data adapter. However, apparently only SQL Server and Oracle support that, and Access does not. And it took the longest time of 86 seconds. The code I used was:

public static double TestADONET_DataTable_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  StringBuilder values = new StringBuilder();
  DataTable dt = new DataTable("TEMP");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    dt.Columns.Add(fieldName, typeof(int));
    if (k > 0)
    {
      names.Append(",");
      values.Append(",");
    }
    names.Append(fieldName);
    values.Append("@" + fieldName);
  }

  DateTime start = DateTime.Now;
  OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB);
  conn.Open();
  OleDbCommand cmd = new OleDbCommand();
  cmd.Connection = conn;

  cmd.CommandText = "DELETE FROM TEMP";
  int numRowsDeleted = cmd.ExecuteNonQuery();
  Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

  OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM TEMP", conn);

  da.InsertCommand = new OleDbCommand("INSERT INTO TEMP (" + names.ToString() + ") VALUES (" + values.ToString() + ")");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    da.InsertCommand.Parameters.Add("@" + fieldName, OleDbType.Integer, 4, fieldName);
  }
  da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
  da.InsertCommand.Connection = conn;
  //da.UpdateBatchSize = 0;

  for (int i = 0; i < 100000; i++)
  {
    DataRow dr = dt.NewRow();
    for (int k = 0; k < 20; k++)
    {
      dr["Field" + (k + 1).ToString()] = i + k;
    }
    dt.Rows.Add(dr);
  }
  da.Update(dt);
  conn.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

然后我尝试了非标准的方法.首先,我写出一个文本文件,然后使用自动化将其导入.这很快 - 2.8 秒 - 并列第一.但我认为这很脆弱,原因有很多: 输出日期字段很棘手.我必须对它们进行特殊格式化(someDate.ToString("yyyy-MM-dd HH:mm")),然后设置一个特殊的导入规范"以这种格式进行编码.导入规范还必须正确设置引用"分隔符.在下面的示例中,只有整数字段,不需要导入规范.

Then I tried non-standard ways. First, I wrote out to a text file, and then used Automation to import that in. This was fast - 2.8 seconds - and tied for first place. But I consider this fragile for a number of reasons: Outputing date fields is tricky. I had to format them specially (someDate.ToString("yyyy-MM-dd HH:mm")), and then set up a special "import specification" that codes in this format. The import specification also had to have the "quote" delimiter set right. In the example below, with only integer fields, there was no need for an import specification.

文本文件对于国际化"也很脆弱,其中使用逗号作为小数点分隔符、不同的日期格式、可能使用 unicode.

Text files are also fragile for "internationalization" where there is a use of comma's for decimal separators, different date formats, possible the use of unicode.

请注意,第一条记录包含字段名称,因此列顺序不依赖于表,并且我们使用自动化来执行文本文件的实际导入.

Notice that the first record contains the field names so that the column order isn't dependent on the table, and that we used Automation to do the actual import of the text file.

public static double TestTextTransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  StreamWriter sw = new StreamWriter(Properties.Settings.Default.TEMPPathLocation);

  sw.WriteLine(names);
  for (int i = 0; i < 100000; i++)
  {
    for (int k = 0; k < 19; k++)
    {
      sw.Write(i + k);
      sw.Write(",");
    }
    sw.WriteLine(i + 19);
  }
  sw.Close();

  ACCESS.Application accApplication = new ACCESS.Application();
  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  accApplication.OpenCurrentDatabase(databaseName, false, "");
  accApplication.DoCmd.RunSQL("DELETE FROM TEMP");
  accApplication.DoCmd.TransferText(TransferType: ACCESS.AcTextTransferType.acImportDelim,
  TableName: "TEMP",
  FileName: Properties.Settings.Default.TEMPPathLocation,
  HasFieldNames: true);
  accApplication.CloseCurrentDatabase();
  accApplication.Quit();
  accApplication = null;

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

最后,我尝试了 DAO.许多网站都给出了关于使用 DAO 的巨大警告.然而,事实证明它是 Access 和 .NET 之间交互的最佳方式,尤其是当您需要写出大量记录时.此外,它还可以访问表的所有属性.我在某处读到使用 DAO 而不是 ADO.NET 对事务进行编程是最容易的.

Finally, I tried DAO. Lots of sites out there give huge warnings about using DAO. However, it turns out that it is simply the best way to interact between Access and .NET, especially when you need to write out large number of records. Also, it gives access to all the properties of a table. I read somewhere that it's easiest to program transactions using DAO instead of ADO.NET.

请注意,有几行代码被注释.他们很快就会解释.

Notice that there are several lines of code that are commented. They will be explained soon.

public static double TestDAOTransferToAccess()
{

  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  DateTime start = DateTime.Now;
  DAO.DBEngine dbEngine = new DAO.DBEngine();
  DAO.Database db = dbEngine.OpenDatabase(databaseName);

  db.Execute("DELETE FROM TEMP");

  DAO.Recordset rs = db.OpenRecordset("TEMP");

  DAO.Field[] myFields = new DAO.Field[20];
  for (int k = 0; k < 20; k++) myFields[k] = rs.Fields["Field" + (k + 1).ToString()];

  //dbEngine.BeginTrans();
  for (int i = 0; i < 100000; i++)
  {
    rs.AddNew();
    for (int k = 0; k < 20; k++)
    {
      //rs.Fields[k].Value = i + k;
      myFields[k].Value = i + k;
      //rs.Fields["Field" + (k + 1).ToString()].Value = i + k;
    }
    rs.Update();
    //if (0 == i % 5000)
    //{
      //dbEngine.CommitTrans();
      //dbEngine.BeginTrans();
    //}
  }
  //dbEngine.CommitTrans();
  rs.Close();
  db.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

在这段代码中,我们为每一列 (myFields[k]) 创建了 DAO.Field 变量,然后使用它们.耗时 2.8 秒.或者,可以直接访问注释行 rs.Fields["Field" + (k + 1).ToString()].Value = i + k; 中的那些字段,这增加了时间到 17 秒.将代码包装在事务中(请参阅注释行)将其缩短到 14 秒.使用整数索引 rs.Fields[k].Value = i + k; 将其降低到 11 秒.使用 DAO.Field (myFields[k]) 和交易实际上需要更长的时间,将时间增加到 3.1 秒.

In this code, we created DAO.Field variables for each column (myFields[k]) and then used them. It took 2.8 seconds. Alternatively, one could directly access those fields as found in the commented line rs.Fields["Field" + (k + 1).ToString()].Value = i + k; which increased the time to 17 seconds. Wrapping the code in a transaction (see the commented lines) dropped that to 14 seconds. Using an integer index rs.Fields[k].Value = i + k; droppped that to 11 seconds. Using the DAO.Field (myFields[k]) and a transaction actually took longer, increasing the time to 3.1 seconds.

最后,为了完整起见,所有这些代码都在一个简单的静态类中,using 语句是:

Lastly, for completeness, all of this code was in a simple static class, and the using statements are:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ACCESS = Microsoft.Office.Interop.Access; // USED ONLY FOR THE TEXT FILE METHOD
using DAO = Microsoft.Office.Interop.Access.Dao; // USED ONLY FOR THE DAO METHOD
using System.Data; // USED ONLY FOR THE ADO.NET/DataTable METHOD
using System.Data.OleDb; // USED FOR BOTH ADO.NET METHODS
using System.IO;  // USED ONLY FOR THE TEXT FILE METHOD

这篇关于将大量记录(批量插入)写入 .NET/C# 中的 Access的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆