How to "flatten" or "collapse" a 2D data frame into a 1D data frame in R?


Question

I have a two-dimensional table of distances in a data.frame in R (imported from CSV):

           CP000036   CP001063      CP001368
CP000036      0           a            b
CP001063      a           0            c
CP001368      b           c            0

I'd like to "flatten" it, so that one axis's value is in the first column, the other axis's value is in the second column, and the distance is in the third column:

Genome1      Genome2       Dist
CP000036     CP001063       a
CP000036     CP001368       b
CP001063     CP001368       c

The above is ideal, but it would be completely fine to have repetition, such that each cell in the input matrix has its own row:

Genome1      Genome2       Dist
CP000036     CP000036       0
CP000036     CP001063       a
CP000036     CP001368       b
CP001063     CP000036       a
CP001063     CP001063       0
CP001063     CP001368       c
CP001368     CP000036       b
CP001368     CP001063       c
CP001368     CP001368       0

This is an example 3x3 matrix, but my dataset is much larger (about 2000x2000). I would do this in Excel, but I need ~3 million rows for the output, whereas Excel's maximum is ~1 million.

This question is very similar to "How to "flatten" or "collapse" a 2D Excel table into 1D?"

Recommended answer

Here is one solution using melt from the reshape2 package:

dm <- 
  data.frame( CP000036 = c( "0", "a", "b" ),
              CP001063 = c( "a", "0", "c" ),
              CP001368 = c( "b", "c", "0" ),
              stringsAsFactors = FALSE,
              row.names = c( "CP000036", "CP001063", "CP001368" ) )

# assuming the distances are symmetric, blank out everything on and below the diagonal
dm[ lower.tri( dm, diag = TRUE ) ]  <- NA
dm$Genome1 <- rownames( dm )

# finally melt; na.rm = TRUE drops the blanked-out entries on and below the diagonal
library(reshape2) 
dm.molten <- melt( dm, na.rm= TRUE, id.vars="Genome1",
                   value.name="Dist", variable.name="Genome2" )

print( dm.molten )
   Genome1  Genome2 Dist
4 CP000036 CP001063    a
7 CP000036 CP001368    b
8 CP001063 CP001368    c
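
For the repeated form that the question also accepts (one row per cell of the input), the masking step is not needed. The following is a minimal base-R sketch, separate from the melt solution above, that pairs every row name with every column name. The numeric distances and the names `ids`, `m` and `flat_all` are made up for illustration; a data.frame read from CSV would first be converted with as.matrix().

```r
# Hypothetical 3x3 distance matrix; the numeric values are invented for this sketch.
ids <- c("CP000036", "CP001063", "CP001368")
m <- matrix(c(0,   0.1, 0.2,
              0.1, 0,   0.3,
              0.2, 0.3, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# row(m) and col(m) give the row and column index of every cell,
# in the same column-major order as as.vector(m).
flat_all <- data.frame(Genome1 = rownames(m)[as.vector(row(m))],
                       Genome2 = colnames(m)[as.vector(col(m))],
                       Dist    = as.vector(m),
                       stringsAsFactors = FALSE)

# Sort by Genome1, then Genome2, to match the layout shown in the question.
flat_all <- flat_all[order(flat_all$Genome1, flat_all$Genome2), ]
rownames(flat_all) <- NULL

print(flat_all)
```

This produces one row per cell, so a 3x3 input gives 9 rows, including the zero-distance diagonal entries.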

There are probably more performant solutions, but I like the melt solution because it's plain and simple.
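
As one possible alternative that avoids the extra package, the upper triangle can be picked out by index in base R. This is only a sketch under the same symmetry assumption; it has not been benchmarked against melt, and the numeric distances and the names `ids`, `m`, `idx` and `flat` are invented for illustration.

```r
# Hypothetical 3x3 distance matrix; the numeric values are invented for this sketch.
# A data.frame imported from CSV would first be converted with as.matrix().
ids <- c("CP000036", "CP001063", "CP001368")
m <- matrix(c(0,   0.1, 0.2,
              0.1, 0,   0.3,
              0.2, 0.3, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# (row, col) positions of the cells strictly above the diagonal.
idx <- which(upper.tri(m), arr.ind = TRUE)

# Look up the labels for each position and pull the matching distances.
flat <- data.frame(Genome1 = rownames(m)[idx[, "row"]],
                   Genome2 = colnames(m)[idx[, "col"]],
                   Dist    = m[idx],
                   stringsAsFactors = FALSE)

print(flat)
```

Each unordered pair appears once, so an n x n matrix yields n(n-1)/2 rows; for a 2000x2000 input that is 1,999,000 rows.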
