R: How to merge 2000 .dat files and then add a column header

Question

Apologies if this has been answered elsewhere. I'm new to R, and have spent all of my 2 days using it trying to get past this initial hurdle.

I've been given a data set with approximately 2000 separate data files. I would like to merge them into one very large data set. I've found a couple of approaches that people recommend, but none have worked for me. For example, one blog (http://psychwire.wordpress.com/2011/06/03/merge-all-files-in-a-directory-using-r-into-a-single-dataframe/) recommends using the following code:

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){

  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  }

  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-read.table(file, header=TRUE, sep="\t")
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}

When I use this code (changing 'target_dir' to the correct directory), R presents me with the following:

Error in match.names(clabs, names(xi)) : 
  names do not match previous names

My hunch is that I've either not changed one of the variables within the code which I need to so that it relates to my specific data (I changed the 'target_dir' to the correct directory, but didn't change anything else), or it is because the .dat files don't have any column headings. If this is the case, my second question is whether there is a way of creating the same column headings for multiple .dat files using R.

Many thanks for taking the time to read this.

Recommended answer

Try this:

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){

  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=FALSE, sep="\t", 
               col.names = c("a", "b", "c"))
  } else {
    # otherwise the merged dataset already exists, so append this file to it
    # (using else avoids reading the first file a second time)
    temp_dataset <-read.table(file, header=FALSE, sep="\t",
    col.names = c("a", "b", "c"))
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}

Where you would replace c("a", "b", "c") with the names you want to use for the columns. Or leave out the col.names parameter and R will use V1, V2, etc.
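To illustrate why header=FALSE matters, here is a minimal sketch, assuming the files really have no header row, as the question suspects. With header=TRUE, read.table turns the first data row of each file into column names, so the names differ from file to file and rbind refuses to stack them. The file names and values below are made up for the demonstration.

```r
# Two throwaway tab-separated files WITHOUT a header row (hypothetical data)
demo_dir <- file.path(tempdir(), "header_demo")
dir.create(demo_dir, showWarnings = FALSE)
f1 <- file.path(demo_dir, "one.dat")
f2 <- file.path(demo_dir, "two.dat")
writeLines(c("1\t2\t3", "4\t5\t6"), f1)
writeLines(c("7\t8\t9", "10\t11\t12"), f2)

# header=TRUE treats the first data row as column names,
# so each file ends up with different names
bad1 <- read.table(f1, header = TRUE, sep = "\t")
bad2 <- read.table(f2, header = TRUE, sep = "\t")
names_differ <- !identical(names(bad1), names(bad2))

# rbind then fails; capture the error message instead of stopping
err_msg <- tryCatch({
  rbind(bad1, bad2)
  NA_character_
}, error = function(e) conditionMessage(e))

# header=FALSE keeps every row as data and gives default names V1, V2, V3
good <- rbind(read.table(f1, header = FALSE, sep = "\t"),
              read.table(f2, header = FALSE, sep = "\t"))
```

Printing err_msg should show the same "names do not match previous names" message reported in the question, while good contains all four rows.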

However, it is better not to use a for loop, as pointed out in the comment. Use lapply to read in all the data frames, then use do.call(rbind, ...) or plyr::rbind.fill to stack up the data frames you have read.
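A minimal sketch of the lapply approach might look like the following. It assumes tab-separated .dat files with no header row, all sharing the same columns. The helper name merge_dat_files, the column names, and the demo files are hypothetical and only there to make the example runnable.

```r
# Sketch only: assumes tab-separated .dat files with no header row
merge_dat_files <- function(dir, col_names, sep = "\t") {
  # Full paths of every .dat file in the directory
  files <- list.files(dir, pattern = "\\.dat$", full.names = TRUE)
  # Read each file into a data frame with identical column names
  frames <- lapply(files, read.table, header = FALSE, sep = sep,
                   col.names = col_names)
  # Stack all data frames in a single rbind call
  do.call(rbind, frames)
}

# Tiny demonstration with two throwaway files in a temporary directory
demo_dir <- file.path(tempdir(), "dat_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1\t2\t3", "4\t5\t6"), file.path(demo_dir, "one.dat"))
writeLines("7\t8\t9", file.path(demo_dir, "two.dat"))

dataset <- merge_dat_files(demo_dir, col_names = c("a", "b", "c"))
```

For the real data you would pass your own directory and column names. Binding once at the end also avoids growing the data frame inside a loop, which copies the accumulated data on every iteration and gets slow with around 2000 files.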
