Reformat data tables based on row names to generate new columns in R

Problem description

I have a data frame that looks like

            m1      m2     m3
 P001.st   60.00   2.0     1
 P003.nd   14.30   2.077   1
 P003.rt   29.60   2.077   1
 P006.st   10.30   2.077   1
 P006.nd   79.30   2.077   1
 P008.nd    9.16   2.077   1

I want to reformat the table so that only the first part of each row name (the part before the period, i.e., P001, P003, etc.) appears as the row name, and every row that shares that prefix is appended as additional columns. Missing combinations should be filled with 0. The output should look like

         m1st   m2st  m3st  m1nd   m2nd  m3nd   m1rt   m2rt   m3rt
 P001   60.00   2.0     1   0       0      0     0       0      0
 P003   0       0       0   14.30   2.077  1     29.60   2.077  1
 P006   10.30   2.077   1   79.30   2.077  1     0       0      0
 P008   0       0       0    9.16   2.077  1     0       0      0

The aggregate function like

aggregate(value~name, df, I)

or a method from data.table like

setDT(df)[, list(value=list(value)), by=name] 

would not work because the row names are not exactly the same. Any suggestions for matching several hundred rows with many variable subtypes (i.e., the part after the period: .nd, .st, etc.)?
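To make the mismatch concrete, here is a small base R sketch (not part of the original question) that splits the row names into a shared key and a subtype. The vector `rn` and the names `key` and `subtype` are just illustrative, and it assumes every row name contains exactly one period, as in the example.

```r
# Row names copied from the example table in the question.
rn <- c("P001.st", "P003.nd", "P003.rt", "P006.st", "P006.nd", "P008.nd")

# Everything before the first period is the shared key ...
key <- sub("\\..*$", "", rn)
# ... and everything after it is the subtype.
subtype <- sub("^[^.]*\\.", "", rn)

# Cross-tabulate to see which key/subtype combinations are present.
table(key, subtype)
```

Once the key is its own variable, rows can be grouped on it even though the full row names differ.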

Solution

Here's another way to do it:

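library(dplyr)
library(tidyr)
library(tibble)  # rownames_to_column() replaces the deprecated add_rownames()
# Move the row names into a column, split them at the period into
# "rowname" and "id", then spread each id's values into its own columns.
(wide <- reshape(df %>% rownames_to_column("rowname") %>% separate(rowname, c("rowname", "id")), 
                 idvar = "rowname", 
                 timevar = "id", 
                 direction = "wide", 
                 sep = ""))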
#   rowname m1st  m2st m3st  m1nd  m2nd m3nd m1rt  m2rt m3rt
# 1    P001 60.0 2.000    1    NA    NA   NA   NA    NA   NA
# 2    P003   NA    NA   NA 14.30 2.077    1 29.6 2.077    1
# 4    P006 10.3 2.077    1 79.30 2.077    1   NA    NA   NA
# 6    P008   NA    NA   NA  9.16 2.077    1   NA    NA   NA

wide[is.na(wide)] <- 0
rownames(wide) <- wide[, 1]
wide$rowname <- NULL
wide
#      m1st  m2st m3st  m1nd  m2nd m3nd m1rt  m2rt m3rt
# P001 60.0 2.000    1  0.00 0.000    0  0.0 0.000    0
# P003  0.0 0.000    0 14.30 2.077    1 29.6 2.077    1
# P006 10.3 2.077    1 79.30 2.077    1  0.0 0.000    0
# P008  0.0 0.000    0  9.16 2.077    1  0.0 0.000    0
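If you'd rather try this end to end without any packages, the following is a rough base R sketch of the same idea. It rebuilds `df` from the table in the question (the values are copied from there; `parts` and `long` are just illustrative names) and assumes each row name has exactly one period.

```r
# Rebuild the example data; row names carry "ID.subtype".
df <- data.frame(
  m1 = c(60.00, 14.30, 29.60, 10.30, 79.30, 9.16),
  m2 = c(2.0, 2.077, 2.077, 2.077, 2.077, 2.077),
  m3 = c(1, 1, 1, 1, 1, 1),
  row.names = c("P001.st", "P003.nd", "P003.rt",
                "P006.st", "P006.nd", "P008.nd")
)

# Split each row name at the period: column 1 is the ID, column 2 the subtype.
parts <- do.call(rbind, strsplit(rownames(df), ".", fixed = TRUE))
long <- data.frame(rowname = parts[, 1], id = parts[, 2], df,
                   stringsAsFactors = FALSE)

# One row per ID, one block of m1/m2/m3 columns per subtype.
wide <- reshape(long, idvar = "rowname", timevar = "id",
                direction = "wide", sep = "")

# Fill missing ID/subtype combinations with 0 and restore the row names.
wide[is.na(wide)] <- 0
rownames(wide) <- wide$rowname
wide$rowname <- NULL
wide
```

This should give the same table as the dplyr/tidyr version above; the only difference is that the row names are split with `strsplit()` instead of `separate()`.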
