Improve large data import performance into SQLite with C#


Question


I am using C# to import a CSV with 6-8 million rows.

My table looks like this:

CREATE TABLE [Data] ([ID] VARCHAR(100)  NULL,[Raw] VARCHAR(200)  NULL)
CREATE INDEX IDLookup ON Data(ID ASC)

I am using System.Data.SQLite to do the import.

Currently, importing 6 million rows takes 2 min 55 sec on Windows 7 32-bit with a Core 2 Duo 2.8 GHz and 4 GB RAM. That's not too bad, but I was just wondering if anyone could see a way of importing it more quickly.

Here is my code:

public class Data
{
  public string IDData { get; set; }
  public string RawData { get; set; }
}   

string connectionString = @"Data Source=" + Path.GetFullPath(AppDomain.CurrentDomain.BaseDirectory + "\\dbimport");
System.Data.SQLite.SQLiteConnection conn = new System.Data.SQLite.SQLiteConnection(connectionString);
conn.Open();

//Dropping and recreating the table seems to be the quickest way to get old data removed
System.Data.SQLite.SQLiteCommand command = new System.Data.SQLite.SQLiteCommand(conn);
command.CommandText = "DROP TABLE Data";
command.ExecuteNonQuery();
command.CommandText = @"CREATE TABLE [Data] ([ID] VARCHAR(100)  NULL,[Raw] VARCHAR(200)  NULL)";
command.ExecuteNonQuery();
command.CommandText = "CREATE INDEX IDLookup ON Data(ID ASC)";
command.ExecuteNonQuery();

string insertText = "INSERT INTO Data (ID,RAW) VALUES(@P0,@P1)";

SQLiteTransaction trans = conn.BeginTransaction();
command.Transaction = trans;

command.CommandText = insertText;
Stopwatch sw = new Stopwatch();
sw.Start();
using (CsvReader csv = new CsvReader(new StreamReader(@"C:\Data.txt"), false))
{
   var f = csv.Select(x => new Data() { IDData = x[27], RawData = String.Join(",", x.Take(24)) });

   foreach (var item in f)
   {
      command.Parameters.AddWithValue("@P0", item.IDData);
      command.Parameters.AddWithValue("@P1", item.RawData);
      command.ExecuteNonQuery();
   }
 }
 trans.Commit();
 sw.Stop();
 Debug.WriteLine(sw.Elapsed.Minutes + "Min(s) " + sw.Elapsed.Seconds + "Sec(s)");
 conn.Close();

Solution

This is quite fast for 6 million records.

It seems that you are doing it the right way. Some time ago I read on sqlite.org that when inserting records you need to wrap the inserts in a transaction; if you don't, your inserts will be limited to only about 60 per second! That is because each insert is treated as a separate transaction, and each transaction must wait for the disk to complete a full rotation. You can read the full explanation here:

http://www.sqlite.org/faq.html#q19

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.

Comparing your time with the average stated above: at 50,000 inserts per second, 6 million rows should take 2 min 0 sec, which is only a little faster than your time.

Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.

By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.
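One practical refinement of that advice, if a single transaction over all 6 million rows feels too large, is to commit in fixed-size batches so each commit is still amortized over many inserts. Below is a minimal sketch in the style of the question's code; the BulkInsert name and the 100,000 batch size are my own assumptions, not something from the question or the FAQ. It also creates the two parameters once and rebinds only their values per row, instead of calling AddWithValue on every iteration:

using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;

// Sketch: group the inserts into fixed-size transactions so the commit
// cost is amortized over many rows. The 100,000 batch size is an
// arbitrary assumption; tune it for your data.
static void BulkInsert(SQLiteConnection conn, IEnumerable<Data> rows)
{
    using (var command = new SQLiteCommand("INSERT INTO Data (ID,RAW) VALUES(@P0,@P1)", conn))
    {
        // Create the parameters once; only their values change per row.
        SQLiteParameter p0 = command.Parameters.Add("@P0", DbType.String);
        SQLiteParameter p1 = command.Parameters.Add("@P1", DbType.String);

        const int batchSize = 100000;
        int pending = 0;
        SQLiteTransaction trans = conn.BeginTransaction();
        command.Transaction = trans;

        foreach (Data row in rows)
        {
            p0.Value = row.IDData;
            p1.Value = row.RawData;
            command.ExecuteNonQuery();

            if (++pending == batchSize)
            {
                trans.Commit();                  // one commit per batch
                trans = conn.BeginTransaction(); // start the next batch
                command.Transaction = trans;
                pending = 0;
            }
        }
        trans.Commit();                          // commit the last partial batch
    }
}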

The next paragraph of the FAQ hints at something else you could try to speed up the inserts:

Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
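If losing the database file on a crash is acceptable, and here it arguably is since the import drops and recreates the table from the CSV anyway, the pragma can be issued once on the open connection before the import starts. A minimal sketch, reusing the conn variable from the question:

// Sketch: relax durability for the duration of the bulk load. If power
// is lost mid-import the database file may be corrupted, which is
// tolerable here only because the import rebuilds the table from the CSV.
using (var pragma = new SQLiteCommand("PRAGMA synchronous=OFF", conn))
{
    pragma.ExecuteNonQuery();
}

I believe System.Data.SQLite also accepts a Synchronous setting in the connection string, but issuing the PRAGMA explicitly keeps the trade-off visible right next to the import code.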

I always thought that SQLite was designed for "simple things"; 6 million records seems to me like a job for a real database server such as MySQL.

Just for your information: counting the records in a SQLite table with so many rows can take a long time. Instead of using SELECT COUNT(*), you can always use SELECT MAX(rowid), which is very fast but not accurate if you have deleted records from that table.
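A sketch of both queries, again reusing the question's conn:

// Exact, but may scan the whole table: slow with millions of rows.
using (var count = new SQLiteCommand("SELECT COUNT(*) FROM Data", conn))
{
    long exact = (long)count.ExecuteScalar();
}

// Fast approximation: the largest rowid equals the row count only if
// no rows have ever been deleted. Returns DBNull for an empty table.
using (var max = new SQLiteCommand("SELECT MAX(rowid) FROM Data", conn))
{
    object result = max.ExecuteScalar();
    long approx = result == DBNull.Value ? 0 : (long)result;
}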

EDIT.

As Mike Woodhouse stated, creating the index after you insert the records should speed up the whole thing. That is common advice for other databases, but I can't say for sure how it works in SQLite.
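Applied to the question's setup code, that reordering would look roughly like this; the statements are the same, with only the CREATE INDEX moved to after the load (untested against SQLite, per the caveat above):

// Sketch: create the bare table, bulk-load it, then build the index once.
// Building the B-tree in a single pass is usually cheaper than updating
// it on every insert.
command.CommandText = "DROP TABLE Data";
command.ExecuteNonQuery();
command.CommandText = @"CREATE TABLE [Data] ([ID] VARCHAR(100) NULL,[Raw] VARCHAR(200) NULL)";
command.ExecuteNonQuery();

// ... run the transactional INSERT loop from the question here ...

command.CommandText = "CREATE INDEX IDLookup ON Data(ID ASC)";
command.ExecuteNonQuery();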
