R - merge a list of data frames into one data frame with missing values by row

Views: 243 | Published: 2017/3/26 1:14:51 | Tags: r, merge, dataframe, do.call


Problem description

I have a variation on the oh-so-common problem of how to merge things together in R.  

I have a set of .txt files in a particular folder, and I have written a function that:


makes a list of the files I want, and then for each file
reads the file
subsets the data (to extract just the rows and columns of interest)
does some calculations on the data
adds these new values to a list.  
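
For readers who want to reproduce the setup, the steps above might look roughly like the following. This is only a sketch: the original function is not shown, so the helper name `build_data_list`, the tab-separated file layout, the column names `Sample` and `Value`, and the per-sample mean are all assumptions made for illustration.

```r
# Hypothetical helper: file layout, column names and the use of mean()
# are assumptions, not the original poster's code.
build_data_list <- function(dir, sample_levels) {
  # list the files of interest
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  lapply(files, function(f) {
    # read the file (assumed tab-separated with a header row)
    raw <- read.delim(f, stringsAsFactors = FALSE)
    # keep only the rows and columns of interest
    raw <- raw[raw$Sample %in% sample_levels, c("Sample", "Value")]
    # some calculation on the data (here: mean per sample)
    res <- aggregate(Value ~ Sample, data = raw, FUN = mean)
    # give every result the same factor levels
    res$Sample <- factor(res$Sample, levels = sample_levels)
    # name the value column after the file, e.g. "Var1"
    names(res)[2] <- tools::file_path_sans_ext(basename(f))
    res
  })
}
```

Each element of the returned list has a `Sample` column plus one value column, which matches the structure shown next; files that lack some samples simply yield fewer rows.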


What I end up with is a list with the following structure:  
> str(DataList)
List of 16
 $ :'data.frame':   14 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Var1  : num [1:14] 27.9 33.8 29.9 29.4 28.8 ...
 $ :'data.frame':   14 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Var2  : num [1:14] 24.6 27 26.8 26.7 27.2 ...
 $ :'data.frame':   12 obs. of  2 variables:
  ..$ Sample: Factor w/ 14 levels "Sample_1A","Sample_1B",..: 1 2 3 4 5 6 7 9 11 12 ...
  ..$ Var3  : num [1:12] 31.4 35.6 34 35.7 32.5 ...
For each variable (Var1, Var2, Var3, ...) I have a column Sample and a column of numerical values.  

Sample is always a factor with 14 levels; these levels are the same for each variable.  

The problem is that some variables (like Var3 above) don't have observations for each level of Sample.  

What I want to end up with is a data frame with 14 rows (one for each level of Sample).  The first column should be Sample; then for each variable, there should be a column containing the corresponding numerical values, like so:
Sample     Var1    Var2    Var3
Sample_1A  27.9    24.6    31.4
Sample_1B  33.8    27      35.6
...
Sample_3B  26.8    29.7    NA
I've been trying to do this with do.call, but don't know how to pass the arguments for by; cbind gets unhappy because of the missing values. Any thoughts on how to do this?

Thanks!

EDIT:  As per joran's request:
> dput(DataList[1:3])
list(structure(list(Sample = structure(1:14, .Label = c("Sample_1B", "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"), Var1 = c(26.9333333333333, 29.17, 28.9366666666667, 28.9233333333333,  28.61, 28.63, 26.7933333333333, 34.6633333333333, 30.4966666666667, 28.4433333333333, 27.4533333333333, 28.3, 27.9633333333333, 27.2366666666667)), .Names = c("Sample", "Var1"), row.names = c(NA, -14L), class = "data.frame"), structure(list(Sample = structure(1:14, .Label = c("Sample_1B",  "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"),                                       Var2 = c(24.19, 26.6033333333333, 26.0366666666667, 27.6766666666667, 27.61, 27.5633333333333, 25.1566666666667, 33.7266666666667, 27.7, 26.1466666666667, 25.65, 26.3633333333333, 25.5333333333333, 26.1733333333333)), .Names = c("Sample", "Var2"), row.names = c(NA,  -14L), class = "data.frame"), structure(list(Sample = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 11L, 12L, 13L, 14L), .Label = c("Sample_1B", "Sample_1C", "Sample_1D", "Sample_2C", "Sample_2A", "Sample_2D", "Sample_3B", "Sample_3C", "Sample_3A", "Sample_3D", "Sample_4B", "Sample_4C", "Sample_4A", "Sample_4D"), class = "factor"), Var3 = c(31.4133333333333, 35.56, 33.9666666666667, 35.66, 32.4633333333333, 31.99, 31.3133333333333, 36.34, 34.9433333333333, 34.5433333333333, 34.3766666666667, 33.28)), .Names = c("Sample",  "Var3"), row.names = c(NA, -12L), class = "data.frame"))

Solution
Looks like a textbook use case for Reduce.
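# Full outer join on Sample: all=TRUE keeps samples that are missing
# from either data frame and fills the gaps with NA.
merge.all <- function(x, y) {
    merge(x, y, all=TRUE, by="Sample")
}

# Fold the pairwise merge over the whole list.
output <- Reduce(merge.all, DataList)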
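
To see the effect on something small, here is a self-contained toy version. The list `toy` and its values are made up for illustration and only mimic the shape of DataList (a shared `Sample` factor, one value column per data frame, and one data frame with a sample missing).

```r
# Toy data standing in for DataList; the values are invented.
lv <- c("Sample_1A", "Sample_1B", "Sample_3B")
toy <- list(
  data.frame(Sample = factor(lv, levels = lv), Var1 = c(27.9, 33.8, 26.8)),
  data.frame(Sample = factor(lv, levels = lv), Var2 = c(24.6, 27.0, 29.7)),
  # the third data frame has no row for Sample_3B
  data.frame(Sample = factor(lv[1:2], levels = lv), Var3 = c(31.4, 35.6))
)

# Full outer join of each data frame onto the accumulated result.
merged <- Reduce(function(x, y) merge(x, y, by = "Sample", all = TRUE), toy)

# One row per sample; Var3 is NA where the third data frame had no row.
print(merged)
```

Reduce fits here because merge() joins exactly two data frames at a time, so the list has to be folded pairwise rather than handed to merge() in one do.call.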


                        