Import all txt files in folder, concatenate into data frame, use file names as variable in R?

Question

I have a folder with 142 tab-delimited text files. Each file has 19 variables, followed by a varying number of rows (usually no more than 30). I want to do several things with these files in R automatically, but I can't get exactly what I want with my code. I am new to loops; I got both sections of code below from previous posts here on Stack Overflow, but I can't figure out how to combine their functionality.

  1. I want to turn the filename into a variable when reading the files into R, so that each row has the identifying file name

  2. Concatenate all files (with the filename variable and no header) into one data frame with dimensions Yx19, where Y is the total number of resulting rows.

I am able to create a list of the 142 dataframes using this code:

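myFiles <- list.files(path="~/Documents/ForR/", pattern="\\.txt$")
# list.files() returns bare file names here, so read.table() assumes the working directory is ~/Documents/ForR/
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles
for (i in myFiles)
    data[[i]]$Source <- i
combined <- do.call(rbind, data)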
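The list-based snippet already adds a Source column. A self-contained sketch of the same idea, assuming two small hypothetical files a.txt and b.txt written to a temporary directory in place of the 142 real files, and using tools::file_path_sans_ext() (a swapped-in helper, not part of the original code) so that Source holds the file name without .txt:

```r
# Create a temporary folder with two hypothetical tab-delimited files
dir <- file.path(tempdir(), "ForR_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("1\tx", "2\ty"), file.path(dir, "a.txt"))
writeLines("3\tz", file.path(dir, "b.txt"))

# full.names = TRUE returns paths that work regardless of the working directory
myFiles <- list.files(path = dir, pattern = "\\.txt$", full.names = TRUE)
data <- lapply(myFiles, read.table, sep = "\t", header = FALSE)

# Name each data frame after its file, without directory or extension
names(data) <- tools::file_path_sans_ext(basename(myFiles))

# Add the file name as a column, then stack everything into one data frame
for (i in names(data)) data[[i]]$Source <- i
combined <- do.call(rbind, data)
rownames(combined) <- NULL
print(combined)
```

Resetting the row names is optional; it only removes the "a.1", "a.2", ... labels that rbind() builds from the list names.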

I am able to create the dataframe I want with 19 variables, but the filename is not present:

files <- list.files(path="~/Documents/ForR/.", pattern="\\.txt$")
DF <- NULL
for (f in files) {
    dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
    DF <- rbind(DF, dat)
}

How do I add the file name (without .txt if possible) as a variable to the loop?

Answer

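Add this line to the loop: dat$file <- unlist(strsplit(f, split=".", fixed=TRUE))[1]. It splits the file name at each "." and keeps the first piece, which drops the .txt extension: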

files <- list.files(path="~/Documents/ForR/.", pattern="\\.txt$")
DF <- NULL
for (f in files) {
    dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
    # file name up to the first "." becomes a new column
    dat$file <- unlist(strsplit(f, split=".", fixed=TRUE))[1]
    DF <- rbind(DF, dat)
}
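Keeping only the text before the first "." works for names like site1.txt, but it truncates names that contain other dots, and it returns part of the directory if f is a full path (for example from list.files(..., full.names = TRUE)). A small comparison on hypothetical file names, with tools::file_path_sans_ext(basename(f)) as an alternative that removes only the directory and the final extension:

```r
# Hypothetical file names: a plain one, one with an extra dot, and a full path
# of the kind list.files(path = "~/Documents/ForR/.", full.names = TRUE) returns
f <- c("site1.txt", "site2.2019.txt", "~/Documents/ForR/./site3.txt")

# The answer's approach: keep everything before the first "."
first_piece <- sapply(strsplit(f, split = ".", fixed = TRUE), `[`, 1)

# Alternative: drop the directory, then only the final extension
sans_ext <- tools::file_path_sans_ext(basename(f))

print(first_piece)
print(sans_ext)
```

Inside the loop, the alternative would read dat$file <- tools::file_path_sans_ext(basename(f)).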

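When the list passed to do.call(rbind, ...) is named, the row names of the result take the form names(list)[n].i, where i runs from 1 to the number of rows in data frame n (a data frame with a single row gets just names(list)[n]). So you can just make a column from the row names: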

data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles  # without list names, the row names are just 1, 2, 3, ...
combined.data <- do.call(rbind, data)
combined.data$file_origin <- row.names(combined.data)
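A self-contained sketch of the row-name approach, using two hypothetical in-memory data frames in place of files read with read.table(). The raw row names still carry the ".i" row counter and the .txt extension, so this sketch strips both with sub() and tools::file_path_sans_ext(); that cleanup step is an addition, not part of the answer above.

```r
# Two hypothetical data frames standing in for files read with read.table()
data <- list(
  data.frame(V1 = 1:2, V2 = c("x", "y")),
  data.frame(V1 = 3L, V2 = "z")
)
# The row-name trick only works if the list is named
names(data) <- c("a.txt", "b.txt")

combined.data <- do.call(rbind, data)
print(row.names(combined.data))

# Strip the trailing ".i" row counter, then the ".txt" extension
combined.data$file_origin <- tools::file_path_sans_ext(
  sub("\\.[0-9]+$", "", row.names(combined.data))
)
rownames(combined.data) <- NULL
print(combined.data)
```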
