Extracting a specific column from different files and putting them together in one big file in R

Problem description

I have 100 files, each with 100,000 rows. I want to extract the 4th column (total_volume) from every file and put the results together in one big file, which would then contain 100 columns of 100,000 rows each. I was trying something with the following script:

setwd("/run/media/mydirectory")
library(data.table)
fileNames <- Sys.glob("*.txt.csv")
# read each file in fileNames
for (fileName in fileNames) {
    dataDf <- read.delim(fileName, header = FALSE)
    # remove columns with only example values
    dataDf <- dataDf[, -(7:14)]
    # convert data frame to data table
    dataDt <- data.table(dataDf)
    # set column names
    setnames(dataDt, c("mcs", "cell_type", "cell_number", "total_volume"))
    # keep only total volume
    total_volume <- dataDt$total_volume
    # export file
    write.table(dataDt$total_volume, file = "total_volume20.csv")
}

But what I get is that each column overwrites the previous one, so the resulting .csv file contains the 4th column of only the last file. I would like the columns to sit next to each other instead of being overwritten. How could I do that? Thanks in advance!

P.S. Obviously the overwriting happens because I used a loop that writes to the same file every time. However, I am not sure how else to combine everything, so suggestions are very welcome!

Recommended answer

You haven't given us a reproducible example, so I can't test this properly, but this should give you a table with one column of total volume for each of the files returned by the call to Sys.glob(). The idea is to make a function that does what you want for one file; use lapply() to build a list holding the result of that function for each file in your target directory; then cbind the elements of that list into one big table.

setwd("/run/media/mydirectory")
library(data.table)
fileNames <- Sys.glob("*.txt.csv")

# For the function, I'm reproducing your code. You could do it in fewer lines and without
# data.table if you like, but maybe there's a reason you chose this approach.
extractor <- function(fileName) {
    require(data.table)
    dataDf <- read.delim(fileName, header = FALSE) 
    dataDf <- dataDf[, -(7:14)] 
    dataDt <- data.table(dataDf)
    setnames(dataDt, c("mcs", "cell_type", "cell_number", "total_volume"))
    total_volume <- dataDt$total_volume
    return(total_volume)
}

total.list <- lapply(fileNames, extractor)
total.table <- Reduce(cbind, total.list)
write.table(total.table, file = "total_volume20.csv")

Or do that last bit in one line if you like:

write.table(Reduce(cbind, lapply(Sys.glob("*.txt.csv"), extractor)), file="total_volume20.csv")
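
Here is a variant of the same idea, as a minimal sketch rather than something tested on your data. do.call(cbind, ...) binds all list elements in one call, and naming the list elements first gives each output column the name of its source file. The demo files, the extractVolume helper and the total_volume_all.csv output name are all made up for illustration, and the sketch assumes tab-delimited files without a header row, as read.delim(header = FALSE) in your script implies.

```r
# Self-contained demo: build three small tab-delimited files in a temp directory.
# The file names and values are invented for illustration only.
demoDir <- file.path(tempdir(), "volume_demo")
dir.create(demoDir, showWarnings = FALSE)
for (i in 1:3) {
    demo <- data.frame(mcs = 1:5, cell_type = "a", cell_number = i,
                       total_volume = i * 10 + 1:5)
    write.table(demo, file.path(demoDir, sprintf("run%d.txt.csv", i)),
                sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
}

fileNames <- Sys.glob(file.path(demoDir, "*.txt.csv"))

# Hypothetical helper: read one file and return only its 4th column (total_volume).
extractVolume <- function(fileName) {
    read.delim(fileName, header = FALSE)[[4]]
}

# One list element per file, named after the file it came from.
volumeList <- lapply(fileNames, extractVolume)
names(volumeList) <- basename(fileNames)

# Bind the vectors side by side: one column per file, one row per record.
volumeTable <- do.call(cbind, volumeList)

# Write a comma-separated file; row.names = FALSE avoids an extra index column.
outFile <- file.path(demoDir, "total_volume_all.csv")
write.csv(volumeTable, file = outFile, row.names = FALSE)
```

With your real data you would drop the demo-file block and point Sys.glob() at your own directory. If reading 100 large files turns out to be slow, data.table's fread() has a select argument for reading only chosen columns, which may help.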
