RSQLite takes hours to write a table to a SQLite database

Problem Description

I have a simple R program that reads a table (1,000,000 rows, 10 columns) from a SQLite database into an R data.table. I then do some operations on the data and try to write it back into a new table of the same SQLite database. Reading the data takes a few seconds, but writing the table back into the SQLite database takes hours. I don't know exactly how long, because it has never finished; the longest I have tried is 8 hours.

Here is a simplified version of the program:

  library(DBI)
  library(RSQLite)
  library(data.table)

  driver <- dbDriver("SQLite")
  con <- dbConnect(driver, dbname = "C:/Database/DB.db")

  # Reading the whole table takes only a few seconds
  DB <- data.table(dbGetQuery(con, "SELECT * from Table1"))

  # Writing it back is the step that never finishes
  dbSendQuery(con, "DROP TABLE IF EXISTS Table2")
  dbWriteTable(con, "Table2", DB)

  dbDisconnect(con)
  dbUnloadDriver(driver)

I'm using R version 2.15.2; the package versions are:

data.table_1.8.6 RSQLite_0.11.2 DBI_0.2-5

I have tried this on multiple systems and on different Windows versions, and in all cases it takes an incredible amount of time to write this table into the SQLite database. Watching the file size of the SQLite database, it grows at about 50 KB per minute.

My question is: does anybody know what causes this slow write speed?

Tim had the answer, but I can't flag it as such because it is in the comments.

As in "ideas to avoid hitting memory limit when using dbWriteTable to save an R data table inside a SQLite database", I wrote the data to the database in chunks:

  # Write the table in 100 chunks instead of one dbWriteTable call
  chunks <- 100

  # Row indices marking the chunk boundaries
  starts.stops <- floor( seq( 1 , nrow( DB ) , length.out = chunks ) )

  system.time({
    for ( i in 2:( length( starts.stops ) ) ){

      # The first chunk starts at row 1; later chunks start one past
      # the previous boundary so no row is written twice
      if ( i == 2 ){
        rows.to.add <- ( starts.stops[ i - 1 ] ):( starts.stops[ i ] )
      } else {
        rows.to.add <- ( starts.stops[ i - 1 ] + 1 ):( starts.stops[ i ] )
      }

      dbWriteTable( con , 'Table2' , DB[ rows.to.add , ] , append = TRUE )
    }
  })

It took:

   user  system elapsed 
   4.49    9.90  214.26 

to finish writing the data to the database. Apparently I was hitting a memory limit without knowing it.

Recommended Answer

Use a single transaction (commit) for all the records. Add a

  dbSendQuery(con, "BEGIN")

before the inserts and a

  dbSendQuery(con, "END")

when done. Much faster.
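
As a self-contained illustration of why a single transaction helps (a toy sketch on throwaway data, not the poster's database; it uses dbExecute() from current DBI, which plays the same role here as the dbSendQuery(con, "BEGIN") / "END" calls above), you can time many single-row INSERTs with and without an enclosing transaction:

  library(DBI)
  library(RSQLite)

  # A throwaway file-backed database; the effect is about per-commit
  # disk writes, so an on-disk file shows it more clearly than ":memory:"
  con <- dbConnect(RSQLite::SQLite(), tempfile(fileext = ".db"))
  dbExecute(con, "CREATE TABLE t (x INTEGER)")

  # Autocommit: every INSERT is its own commit to disk
  system.time({
    for (x in 1:2000) dbExecute(con, sprintf("INSERT INTO t VALUES (%d)", x))
  })

  # One explicit transaction around all the INSERTs: a single commit,
  # which is the fix recommended above
  system.time({
    dbExecute(con, "BEGIN")
    for (x in 1:2000) dbExecute(con, sprintf("INSERT INTO t VALUES (%d)", x))
    dbExecute(con, "END")
  })

  dbDisconnect(con)

The autocommit loop is typically slower by a large factor, which is consistent with the very slow file growth described in the question.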
