Writing large number of records (bulk insert) to Access in .NET/C#


Question

What is the best way to perform bulk inserts into an MS Access database from .NET? Using ADO.NET, it is taking way over an hour to write out a large dataset.

Note that in my original post, before I refactored it, both the question and the answer were in the question section. I took Igor Turman's suggestion and rewrote it in two parts - the question above, followed by my answer.

Answer

I found that using DAO in a specific manner is roughly 30 times faster than using ADO.NET. I am sharing the code and results in this answer. As background, the test below writes out 100,000 records to a table with 20 columns.

A summary of the techniques and times - from best to worst:

  1. 02.8 seconds: Use DAO, use DAO.Field references to refer to the table columns
  2. 02.8 seconds: Write out to a text file, use Automation to import the text into Access
  3. 11.0 seconds: Use DAO, use the column index to refer to the table columns
  4. 17.0 seconds: Use DAO, refer to the columns by name
  5. 79.0 seconds: Use ADO.NET, generate INSERT statements for each row
  6. 86.0 seconds: Use ADO.NET, use a DataTable with a DataAdapter for "batch" insert

As background, occasionally I need to perform analysis of reasonably large amounts of data, and I find that Access is the best platform. The analysis involves many queries, and often a lot of VBA code.

For various reasons, I wanted to use C# instead of VBA. The typical way is to use OleDB to connect to Access. I used an OleDbDataReader to grab millions of records, and it worked quite well. But when outputting results to a table, it took a long, long time. Over an hour.

First, let's discuss the two typical ways to write records to Access from C#. Both ways involve OleDB and ADO.NET. The first is to generate INSERT statements one at a time and execute them, taking 79 seconds for the 100,000 records. The code is:

public static double TestADONET_Insert_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  using (OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB))
  {
    conn.Open();
    OleDbCommand cmd = new OleDbCommand();
    cmd.Connection = conn;

    cmd.CommandText = "DELETE FROM TEMP";
    int numRowsDeleted = cmd.ExecuteNonQuery();
    Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

    for (int i = 0; i < 100000; i++)
    {
      StringBuilder insertSQL = new StringBuilder("INSERT INTO TEMP (")
        .Append(names)
        .Append(") VALUES (");

      for (int k = 0; k < 19; k++)
      {
        insertSQL.Append(i + k).Append(",");
      }
      insertSQL.Append(i + 19).Append(")");
      cmd.CommandText = insertSQL.ToString();
      cmd.ExecuteNonQuery();
    }
    cmd.Dispose();
  }
  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Note that I found no method in Access that allows a bulk insert.

I then thought that maybe using a DataTable with a DataAdapter would prove useful, especially since I thought I could do batch inserts via the UpdateBatchSize property of the data adapter. However, apparently only SQL Server and Oracle support that, and Access does not. It also turned out to be the slowest approach, at 86 seconds. The code I used was:

public static double TestADONET_DataTable_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  StringBuilder values = new StringBuilder();
  DataTable dt = new DataTable("TEMP");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    dt.Columns.Add(fieldName, typeof(int));
    if (k > 0)
    {
      names.Append(",");
      values.Append(",");
    }
    names.Append(fieldName);
    values.Append("@" + fieldName);
  }

  DateTime start = DateTime.Now;
  OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB);
  conn.Open();
  OleDbCommand cmd = new OleDbCommand();
  cmd.Connection = conn;

  cmd.CommandText = "DELETE FROM TEMP";
  int numRowsDeleted = cmd.ExecuteNonQuery();
  Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

  OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM TEMP", conn);

  da.InsertCommand = new OleDbCommand("INSERT INTO TEMP (" + names.ToString() + ") VALUES (" + values.ToString() + ")");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    da.InsertCommand.Parameters.Add("@" + fieldName, OleDbType.Integer, 4, fieldName);
  }
  da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
  da.InsertCommand.Connection = conn;
  //da.UpdateBatchSize = 0;

  for (int i = 0; i < 100000; i++)
  {
    DataRow dr = dt.NewRow();
    for (int k = 0; k < 20; k++)
    {
      dr["Field" + (k + 1).ToString()] = i + k;
    }
    dt.Rows.Add(dr);
  }
  da.Update(dt);
  conn.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Then I tried non-standard ways. First, I wrote out to a text file and then used Automation to import it. This was fast - 2.8 seconds - and tied for first place. But I consider it fragile for a number of reasons: outputting date fields is tricky. I had to format them specially (someDate.ToString("yyyy-MM-dd HH:mm")) and then set up a special "import specification" that matches this format. The import specification also had to have the "quote" delimiter set correctly. In the example below, with only integer fields, there was no need for an import specification.

Text files are also fragile with respect to internationalization, where commas may be used as decimal separators, date formats differ, and Unicode may be in use.
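
As an illustration of those two points, here is a minimal sketch - not one of the benchmarked methods, and the date/value fields are made up for illustration - of how one might format a date and a decimal value for the text export so that the output does not depend on the machine's regional settings:

// Minimal sketch (hypothetical fields, not part of the benchmarks): format a date in the
// exact layout the Access import specification expects, and force '.' as the decimal
// separator regardless of the current locale.
using System;
using System.Globalization;
using System.IO;

static class TextExportFormattingSketch
{
  public static void WriteRow(StreamWriter sw, int id, DateTime someDate, double someValue)
  {
    // Matches the "yyyy-MM-dd HH:mm" layout referenced by the import specification.
    string dateText = someDate.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);

    // InvariantCulture guarantees a '.' decimal separator and no thousands separators.
    string valueText = someValue.ToString(CultureInfo.InvariantCulture);

    sw.WriteLine(string.Join(",", id.ToString(CultureInfo.InvariantCulture), dateText, valueText));
  }
}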

Notice that the first record contains the field names so that the column order isn't dependent on the table, and that we used Automation to do the actual import of the text file.

public static double TestTextTransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  StreamWriter sw = new StreamWriter(Properties.Settings.Default.TEMPPathLocation);

  sw.WriteLine(names);
  for (int i = 0; i < 100000; i++)
  {
    for (int k = 0; k < 19; k++)
    {
      sw.Write(i + k);
      sw.Write(",");
    }
    sw.WriteLine(i + 19);
  }
  sw.Close();

  ACCESS.Application accApplication = new ACCESS.Application();
  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  accApplication.OpenCurrentDatabase(databaseName, false, "");
  accApplication.DoCmd.RunSQL("DELETE FROM TEMP");
  accApplication.DoCmd.TransferText(TransferType: ACCESS.AcTextTransferType.acImportDelim,
  TableName: "TEMP",
  FileName: Properties.Settings.Default.TEMPPathLocation,
  HasFieldNames: true);
  accApplication.CloseCurrentDatabase();
  accApplication.Quit();
  accApplication = null;

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Finally, I tried DAO. Lots of sites out there give huge warnings about using DAO. However, it turns out that it is simply the best way to interact between Access and .NET, especially when you need to write out a large number of records. It also gives access to all the properties of a table. I read somewhere that it's easiest to program transactions using DAO instead of ADO.NET.

Notice that several lines of code are commented out. They will be explained shortly.

public static double TestDAOTransferToAccess()
{

  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  DateTime start = DateTime.Now;
  DAO.DBEngine dbEngine = new DAO.DBEngine();
  DAO.Database db = dbEngine.OpenDatabase(databaseName);

  db.Execute("DELETE FROM TEMP");

  DAO.Recordset rs = db.OpenRecordset("TEMP");

  DAO.Field[] myFields = new DAO.Field[20];
  for (int k = 0; k < 20; k++) myFields[k] = rs.Fields["Field" + (k + 1).ToString()];

  //dbEngine.BeginTrans();
  for (int i = 0; i < 100000; i++)
  {
    rs.AddNew();
    for (int k = 0; k < 20; k++)
    {
      //rs.Fields[k].Value = i + k;
      myFields[k].Value = i + k;
      //rs.Fields["Field" + (k + 1).ToString()].Value = i + k;
    }
    rs.Update();
    //if (0 == i % 5000)
    //{
      //dbEngine.CommitTrans();
      //dbEngine.BeginTrans();
    //}
  }
  //dbEngine.CommitTrans();
  rs.Close();
  db.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

In this code, we created DAO.Field variables for each column (myFields[k]) and then used them. That took 2.8 seconds. Alternatively, one could access the fields by name, as in the commented line rs.Fields["Field" + (k + 1).ToString()].Value = i + k;, which increased the time to 17 seconds. Wrapping that code in a transaction (see the commented lines) dropped it to 14 seconds. Using an integer index, rs.Fields[k].Value = i + k;, dropped it to 11 seconds. Using the DAO.Field references (myFields[k]) together with a transaction actually took slightly longer, increasing the time to 3.1 seconds.
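
For reference, the "transaction" timings above correspond to enabling the commented lines. A condensed sketch of the inner loop of TestDAOTransferToAccess with those lines enabled (dbEngine, rs, and myFields come from the surrounding method) looks like this:

// Same loop as above, with the commented transaction lines enabled:
// commit every 5,000 rows and immediately start a new transaction.
dbEngine.BeginTrans();
for (int i = 0; i < 100000; i++)
{
  rs.AddNew();
  for (int k = 0; k < 20; k++)
  {
    myFields[k].Value = i + k;
  }
  rs.Update();
  if (0 == i % 5000)
  {
    dbEngine.CommitTrans();
    dbEngine.BeginTrans();
  }
}
dbEngine.CommitTrans();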

Lastly, for completeness, all of this code was in a simple static class, and the using statements are:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ACCESS = Microsoft.Office.Interop.Access; // USED ONLY FOR THE TEXT FILE METHOD
using DAO = Microsoft.Office.Interop.Access.Dao; // USED ONLY FOR THE DAO METHOD
using System.Data; // USED ONLY FOR THE ADO.NET/DataTable METHOD
using System.Data.OleDb; // USED FOR BOTH ADO.NET METHODS
using System.IO;  // USED ONLY FOR THE TEXT FILE METHOD
