R convert data frame to input file - improve performance
Problem description
I'm trying to convert a data frame in R to a text file.
The data set is about 1500 x 700, and looping through the data frame takes a while, so I'm wondering if there's any way to speed up the process.
My data frame is like this:
> train2
  score x1 x2 x3 x4 x5 ... x700
      0  0  1  1  1  0 ...    0
      1  0  1  0  0  0 ...    0
      0  1  0  1  1  1 ...    0
      3  0  1  1  1  0 ...    0
      1  0  1  0  1  0 ...    0
      2  1  1  1  1  0 ...    1
      0  0  1  1  0  0 ...    0
    ...  .  .  .  .  . ...    .
In the created file I only include cells that are non-zero.
So the output for rows 1-3 would be:
0 | x2:1 x3:1 x4:1
1 | x2:1
0 | x1:1 x3:1 x4:1 x5:1
My current code runs like this:
pt1 <- paste(train2$score, " | ", sep = "")
collect1 <- c()
for (j in 1:nrow(train2)) {
  word1 <- pt1[j]
  # column 1 is score, so the feature columns start at 2
  for (i in 2:ncol(train2)) {
    if (train2[j, i] != 0) {
      word1 <- paste(word1, colnames(train2)[i], ":", train2[j, i], " ", sep = "")
    }
  }
  collect1 <- c(collect1, word1)
  # progress report every 100 rows
  if (j %% 100 == 0) {
    print(j); flush.console()
    gc()
  }
}
Each run takes about 3-4 minutes. Is there anything obvious that would improve the performance?
EDIT: after the loops are completed, the resulting character vector collect1 is used to create a text file using:
write(collect1, file="outPut1.txt")
Try vectorizing the operation as follows (I put 'score' in a separate variable and removed it from 'train3' so I wouldn't have to subset the data frame in the anonymous function):
score <- train2$score
train3 <- train2[, -1]        # feature columns only
cols <- colnames(train3)
# build one "name:value name:value ..." string per row
res <- apply(train3, 1, function(x) {
  idx <- x != 0               # non-zero cells in this row
  nms <- cols[idx]
  vals <- x[idx]
  paste(nms, vals, sep = ":", collapse = " ")
})
out <- paste(score, "|", as.vector(res))
print(out)
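To see the approach above end to end, here is a minimal sketch on a toy stand-in for train2. It assumes only the first three rows from the question, restricted to x1..x4. The explicit as.matrix() call, the variable name feats, the tempfile() output path, and the use of writeLines() in place of write() are illustrative choices, not part of the original answer.

```r
# Toy stand-in for train2: first three rows from the question,
# restricted to x1..x4 (the real data has about 700 feature columns).
train2 <- data.frame(
  score = c(0, 1, 0),
  x1    = c(0, 0, 1),
  x2    = c(1, 1, 0),
  x3    = c(1, 0, 1),
  x4    = c(1, 0, 1)
)

score <- train2$score
# Convert the features to a matrix once (apply() coerces to a matrix anyway).
feats <- as.matrix(train2[, -1])
cols  <- colnames(feats)

# One string of "name:value" pairs per row, skipping zero cells.
res <- apply(feats, 1, function(x) {
  idx <- x != 0
  paste(cols[idx], x[idx], sep = ":", collapse = " ")
})

out <- paste(score, "|", res)
print(out)
# [1] "0 | x2:1 x3:1 x4:1" "1 | x2:1"           "0 | x1:1 x3:1 x4:1"

# Hypothetical output path; tempfile() keeps the sketch free of side effects.
out_file <- tempfile(fileext = ".txt")
writeLines(out, out_file)   # one element of 'out' per line
```

On the real data you would replace out_file with something like "outPut1.txt". Building the lines first and writing them in one call avoids growing collect1 inside the loop, which is likely one source of the slowdown in the original code, along with the element-by-element indexing of the data frame.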