R - merge a list of data frames into one data frame with missing values by row


Problem description




I have a variation on the oh-so-common problem of how to merge things together in R.

I have a set of .txt files in a particular folder, and I have written a function that:

  • makes a list of the files I want, and then for each file
  • reads the file
  • subsets the data (to extract just the rows and columns of interest)
  • does some calculations on the data
  • adds these new values to a list.

What I end up with is a list with the following structure:

>str(DataList)
List of 16
 $ :'data.frame':   14 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Var1  : num [1:14] 27.9 33.8 29.9 29.4 28.8 ...
 $ :'data.frame':   14 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Var2  : num [1:14] 24.6 27 26.8 26.7 27.2 ...
 $ :'data.frame':   12 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 9 11 12 ...
  ..$ Var3  : num [1:12] 31.4 35.6 34 35.7 32.5 ...

For each variable (Var1, Var2, Var3, ...) I have a column Sample and a column of numerical values.

Sample is always a factor with 14 levels; these levels are the same for each variable.

The problem is that some variables (like Var3 above) don't have observations for each level of Sample.
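One way to see which levels of Sample have no observation in a given data frame is to compare the factor's declared levels with the values actually present. The sketch below uses a small made-up data frame (the names `lv` and `df3` and all values are hypothetical, not taken from the real DataList):

```r
# Hypothetical toy data: a 4-level factor, but only 3 observed rows
lv <- c("Sample_1A", "Sample_1B", "Sample_2A", "Sample_2B")
df3 <- data.frame(
  Sample = factor(c("Sample_1A", "Sample_1B", "Sample_2B"), levels = lv),
  Var3   = c(31.4, 35.6, 34.0)
)

# Levels declared on the factor that never occur in the column
missing_levels <- setdiff(levels(df3$Sample), as.character(df3$Sample))
print(missing_levels)
```

Applied to each element of a list with `lapply`, the same comparison would show which variables are incomplete.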

What I want to end up with is a data frame with 14 rows (one for each level of Sample). The first column should be Sample; then for each variable, there should be a column containing the corresponding numerical values, like so:

Sample     Var1    Var2    Var3
Sample_1A  27.9    24.6    31.4
Sample_1B  33.8    27      35.6
...
Sample_3B  26.8    29.7    NA

I've been trying to do this with do.call, but don't know how to pass the arguments for by; cbind gets unhappy because of the missing values. Any thoughts on how to do this?

Thanks!

EDIT: As per joran's request:

>dput(DataList[1:3])
list(structure(list(Sample = structure(1:14, .Label = c("Sample_1B", "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"), Var1 = c(26.9333333333333, 29.17, 28.9366666666667, 28.9233333333333,  28.61, 28.63, 26.7933333333333, 34.6633333333333, 30.4966666666667, 28.4433333333333, 27.4533333333333, 28.3, 27.9633333333333, 27.2366666666667)), .Names = c("Sample", "Var1"), row.names = c(NA, -14L), class = "data.frame"), structure(list(Sample = structure(1:14, .Label = c("Sample_1B",  "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"),                                       Var2 = c(24.19, 26.6033333333333, 26.0366666666667, 27.6766666666667, 27.61, 27.5633333333333, 25.1566666666667, 33.7266666666667, 27.7, 26.1466666666667, 25.65, 26.3633333333333, 25.5333333333333, 26.1733333333333)), .Names = c("Sample", "Var2"), row.names = c(NA,  -14L), class = "data.frame"), structure(list(Sample = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 11L, 12L, 13L, 14L), .Label = c("Sample_1B", "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"), Var3 = c(31.4133333333333, 35.56, 33.9666666666667, 35.66, 32.4633333333333, 31.99, 31.3133333333333, 36.34, 34.9433333333333, 34.5433333333333, 34.3766666666667, 33.28)), .Names = c("Sample",  "Var3"), row.names = c(NA, -12L), class = "data.frame"))

Solution

Looks like a textbook use case for Reduce.

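# Full outer join of two data frames on Sample;
# all=TRUE keeps unmatched rows and fills the gaps with NA
merge.all <- function(x, y) {
    merge(x, y, all=TRUE, by="Sample")
}

# Apply the pairwise merge successively across the whole list
output <- Reduce(merge.all, DataList)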
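
To illustrate how this behaves when one data frame lacks a sample, here is a minimal self-contained sketch. The three-level toy list and its values are made up for illustration, and the helper is named `merge_all` here only to keep the example independent of the code above:

```r
# Hypothetical toy list: the third data frame has no row for Sample_2A
lv <- c("Sample_1A", "Sample_1B", "Sample_2A")
DataList <- list(
  data.frame(Sample = factor(lv, levels = lv),      Var1 = c(27.9, 33.8, 29.9)),
  data.frame(Sample = factor(lv, levels = lv),      Var2 = c(24.6, 27.0, 26.8)),
  data.frame(Sample = factor(lv[1:2], levels = lv), Var3 = c(31.4, 35.6))
)

# Pairwise full outer join on Sample
merge_all <- function(x, y) merge(x, y, all = TRUE, by = "Sample")

# Reduce folds the pairwise merge over the list, left to right
output <- Reduce(merge_all, DataList)
print(output)
```

The result has one row per sample seen in any of the data frames, with NA in Var3 for the sample the third data frame lacks. A sample that is absent from every data frame would not appear at all, so if all 14 rows must be guaranteed, one option is to put a data frame containing only the full Sample column at the front of the list.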
