R: read in multiple .dat files


Problem description

Hi, I am new here and a beginner in R.

My problem: when I have more than one file (test1.dat, test2.dat, ...) to work with in R, I use this code to read them in:

# `pattern` is a regular expression, so match names ending in ".dat"
filelist <- list.files(pattern = "\\.dat$")

df_list <- lapply(filelist, function(x) {
  read.table(x, header = FALSE, sep = ",", colClasses = "factor",
             comment.char = "", col.names = "raw")
})
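One detail worth illustrating: the `pattern` argument of `list.files` is a regular expression, not a shell wildcard. The sketch below is only an illustration under stated assumptions: it works in a temporary directory, the file names (including the `.bak` file) are made up, and the files have a header row, unlike the data in the question. It shows an anchored regex, the equivalent `glob2rx()` call, and one way to name each list element after its file.

```r
# Work in a throwaway directory so the sketch does not touch real data.
demo_dir <- file.path(tempdir(), "dat_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Hypothetical files: two data files and one backup that should be skipped.
for (f in c("test1.dat", "test2.dat", "test1.dat.bak")) {
  writeLines(c("a,b", "1,x", "2,y"), file.path(demo_dir, f))
}

# `pattern` is a regular expression: escape the dot and anchor the end.
filelist <- list.files(demo_dir, pattern = "\\.dat$", full.names = TRUE)

# glob2rx() converts a shell-style wildcard into an equivalent regex.
same_list <- list.files(demo_dir, pattern = utils::glob2rx("*.dat"),
                        full.names = TRUE)

# Read every file and name each list element after its source file.
df_list <- lapply(filelist, utils::read.table, header = TRUE, sep = ",")
names(df_list) <- basename(filelist)
```

With named elements, a single file's data can then be picked out as `df_list[["test1.dat"]]`.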

Now my problem is that my data is big. I found a way to speed things up using the sqldf package:

library(sqldf)

sql <- file("test2.dat")
df <- sqldf("select * from sql", dbname = tempfile(),
            file.format = list(header = FALSE, row.names = FALSE, colClasses = "factor",
                               comment.char = "", col.names = "raw"))

It works well for one file, but I am not able to change the code to read in multiple files as in the first code snippet. Can someone help me? Thank you! Momo

Solution

This seems to work (but I assume there is a quicker SQL way to do this):

sql.l <- lapply(filelist, file)

df_list2 <- lapply(sql.l, function(i) sqldf("select * from i",
    dbname = tempfile(), file.format = list(header = TRUE, row.names = FALSE)))


A look at speeds, partially taken from mnel's post "Quickly reading very large tables as dataframes in R":

library(data.table)
library(sqldf)

# test data
n=1e6
DT = data.table( a=sample(1:1000,n,replace=TRUE),
                 b=sample(1:1000,n,replace=TRUE),
                 c=rnorm(n),
                 d=sample(c("foo","bar","baz","qux","quux"),n,replace=TRUE),
                 e=rnorm(n),
                 f=sample(1:1000,n,replace=TRUE) )

# write 5 files out
lapply(1:5, function(i) write.table(DT,paste0("test", i, ".dat"), 
                                 sep=",",row.names=FALSE,quote=FALSE))

read: data.table

# `pattern` is a regular expression, so match names ending in ".dat"
filelist <- list.files(pattern = "\\.dat$")

system.time(df_list <- lapply(filelist, fread))

#  user  system elapsed 
# 5.244   0.200   5.457 

read: sqldf

sql.l <- lapply(filelist, file)

system.time(df_list2 <- lapply(sql.l, function(i) sqldf("select * from i",
    dbname = tempfile(), file.format = list(header = TRUE, row.names = FALSE))))

#    user  system elapsed 
#  35.594   1.432  37.357 
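The two timings above follow the same pattern: wrap an `lapply` over the files in `system.time`. A minimal base R sketch of that pattern as a reusable function might look like the following. The helper name `time_reader`, the temporary directory and the small stand-in files are all assumptions made for illustration, and `read.table` stands in for whichever reader is being measured.

```r
# Hypothetical helper: run one reader function over a set of files and
# return the elapsed wall-clock seconds together with the data read.
time_reader <- function(files, reader, ...) {
  timing <- system.time(result <- lapply(files, reader, ...))
  list(elapsed = unname(timing["elapsed"]), result = result)
}

# Small stand-in files in a throwaway directory (not the 1e6-row test data).
bench_dir <- file.path(tempdir(), "bench_demo")
dir.create(bench_dir, showWarnings = FALSE)
small <- data.frame(a = 1:100, b = rnorm(100))
files <- file.path(bench_dir, paste0("test", 1:3, ".dat"))
for (f in files) {
  write.table(small, f, sep = ",", row.names = FALSE, quote = FALSE)
}

# Time the base reader; another reader function could be passed the same way.
base_run <- time_reader(files, utils::read.table, header = TRUE, sep = ",")
base_run$elapsed
```

Keeping the result alongside the timing makes it possible to compare what each reader returned afterwards.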

Check: the results seem OK except for attributes.

all.equal(df_list , df_list2)
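To separate differences in values from differences in attributes, `all.equal` accepts `check.attributes = FALSE`. The sketch below uses two small base R data frames with made-up names (`from_fread`, `from_sqldf`) and a made-up `source` attribute. It assumes both sides are plain data frames; how the comparison behaves when one side is a data.table was not checked here.

```r
# Two hypothetical results holding the same values.
from_fread <- data.frame(a = 1:3, d = c("foo", "bar", "baz"))
from_sqldf <- from_fread

# Simulate a reader that attaches extra metadata to its result.
attr(from_sqldf, "source") <- "sqldf"

# The default comparison also compares attributes, so it reports a difference.
strict <- all.equal(from_fread, from_sqldf)
isTRUE(strict)

# Ignoring attributes compares the column values only.
values_only <- all.equal(from_fread, from_sqldf, check.attributes = FALSE)
isTRUE(values_only)

# The same argument is passed down when comparing whole lists.
list_check <- all.equal(list(from_fread), list(from_sqldf),
                        check.attributes = FALSE)
isTRUE(list_check)
```

Wrapping the call in `isTRUE()` matters because `all.equal` returns a character description of the differences rather than `FALSE` when the objects differ.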
