How to operate with a subset of an R dataframe in long format?

Problem description

I have a data frame with 3 groups and 3 days:

set.seed(10)
dat <- data.frame(group=rep(c("g1","g2","g3"),each=3), day=rep(c(0,2,4),3), value=runif(9))
#   group day    value
# 1    g1   0 0.507478
# 2    g1   2 0.306769
# 3    g1   4 0.426908
# 4    g2   0 0.693102
# 5    g2   2 0.085136
# 6    g2   4 0.225437
# 7    g3   0 0.274531
# 8    g3   2 0.272305
# 9    g3   4 0.615829

I want to divide each value by the day 0 value within its group and take the log2 of that ratio. The way I'm doing it now is by extracting the values for each day in an intermediate step:

day_0 <- dat[dat$day==0, "value"]
day_2 <- dat[dat$day==2, "value"]
day_4 <- dat[dat$day==4, "value"]
res <- cbind(0, log2(day_2/day_0), log2(day_4/day_0))
rownames(res) <- c("g1","g2","g3")
colnames(res) <- c("day_0","log_ratio_day_2_day_0","log_ratio_day_4_day_0")
#    day_0 log_ratio_day_2_day_0 log_ratio_day_4_day_0
# g1     0            -0.7261955             -0.249422
# g2     0            -3.0252272             -1.620346
# g3     0            -0.0117427              1.165564

What's the proper way of calculating res without an intermediate step?

Solution

Your friend is ddply from the plyr package:

library(plyr)
# Within each group, divide every value by the group's first value
# (day 0, given the row order of dat) and take log2 of the ratio.
ddply(dat, .(group), mutate, new_value = log2(value / value[1]))
#   group day      value   new_value
# 1    g1   0 0.50747820  0.00000000
# 2    g1   2 0.30676851 -0.72619548
# 3    g1   4 0.42690767 -0.24942179
# 4    g2   0 0.69310208  0.00000000
# 5    g2   2 0.08513597 -3.02522716
# 6    g2   4 0.22543662 -1.62034599
# 7    g3   0 0.27453052  0.00000000
# 8    g3   2 0.27230507 -0.01174274
# 9    g3   4 0.61582931  1.16556397
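
Note that value[1] only gives the day 0 value because the rows happen to be sorted by day within each group. If plyr is not available, or the row order cannot be trusted, something along these lines in base R might work as well. This is only a sketch, not part of the original answer: the column names log_ratio and via_ave and the helper vector day0 are made up for illustration. It looks up each group's day 0 value explicitly, and then reshapes the result into a group-by-day matrix similar to res in the question.

```r
# Recreate the example data from the question.
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Named vector of day-0 values, one per group (names are the group labels).
# 'day0' is a hypothetical helper name used only in this sketch.
base <- dat[dat$day == 0, ]
day0 <- setNames(base$value, as.character(base$group))

# Look up the matching day-0 value for every row, so row order does not matter.
dat$log_ratio <- log2(dat$value / unname(day0[as.character(dat$group)]))

# Base R counterpart of the ddply call; like value[1] there, this
# assumes the day-0 row comes first within each group.
dat$via_ave <- ave(dat$value, dat$group, FUN = function(v) log2(v / v[1]))

# Reshape to a group x day matrix. There is exactly one value per cell,
# so mean() simply returns that value.
res <- with(dat, tapply(log_ratio, list(group, day), mean))
print(res)
```

The rows of res are the groups and the columns are named after the days ("0", "2", "4"), with the day 0 column equal to zero throughout.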