Most efficient way to replace lowest list values in dataframe in R

Problem description

I have a dataframe, df, with a list/vector of numbers recorded for each subject for two repetitions of a test item.

subj item rep vec
s1 1 1 [2,1,4,5,8,4,7]
s1 1 2 [1,1,3,4,7,5,3]
s1 2 1 [6,5,4,1,2,5,5]
s1 2 2 [4,4,4,0,1,4,3]
s2 1 1 [4,6,8,7,7,5,8]
s2 1 2 [2,5,4,5,8,1,4]
s2 2 1 [9,3,2,6,6,8,5]
s2 2 2 [7,1,2,3,2,7,3]

For each item, I want to find 50% of the mean of rep 1 and then replace the lowest numbers in the rep 2 vector with 0, until the mean of rep 2 is less than or equal to that scaled-down rep 1 mean. For example, for s1 item 1:

0.5 * mean(c(2,1,4,5,8,4,7))  # 2.2, rep1 mean scaled down
mean(c(1,1,3,4,7,5,3))        # 3.4, rep2
mean(c(0,0,0,0,7,5,0))        # 1.7, new rep2 such that mean(rep2) <= 0.5 * mean(rep1)
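
The arithmetic above can be traced step by step in R. This is only a minimal sketch of the intended procedure, assuming that the lowest remaining non-zero element is zeroed one at a time (first occurrence on ties) and that all values are non-negative. The names `target` and `trace` are illustrative only.

```r
rep1 <- c(2, 1, 4, 5, 8, 4, 7)
rep2 <- c(1, 1, 3, 4, 7, 5, 3)

target <- 0.5 * mean(rep1)   # scaled-down rep1 mean, about 2.21
trace  <- mean(rep2)         # mean of rep2 before and after each replacement

while (mean(rep2) > target) {
  nonzero <- which(rep2 != 0)                      # positions still non-zero
  rep2[nonzero[which.min(rep2[nonzero])]] <- 0     # zero the first lowest one
  trace <- c(trace, mean(rep2))
}

round(trace, 2)  # 3.43 3.29 3.14 2.71 2.29 1.71
rep2             # 0 0 0 0 7 5 0
```

The loop stops as soon as the mean of rep2 drops to 1.71, the first value at or below the target.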

After zeroing the lowest numbers in the rep 2 vector, I want to correlate the rep1 and rep2 vectors, perform some other minor arithmetic, and append the results to another (length-initialized) dataframe. For now, I'm doing this with loops similar to this pseudo code:

for subj in subjs:
  for item in items:
     while mean(rep2) > mean(rep1)*0.5:
       rep2 = replace(lowest(rep2),0)
     newDataFrame[i] = correl(rep1,rep2)

Doing this with loops seems really inefficient; in R, is there a more efficient way to find and replace the lowest values in a list/vector until the means are less than or equal to a value that depends on that specific item? And what's the best way to append correlations and other results to other dataframes?

Additional info:

> dput(df)
structure(list(subj = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
 2L), .Label = c("s1", "s2"), class = "factor"), item = c(1L, 
 1L, 2L, 2L, 1L, 1L, 2L, 2L), rep = c(1L, 2L, 1L, 2L, 1L, 2L, 
 1L, 2L), vec = list(c(2, 1, 4, 5, 8, 4, 7), c(1, 1, 3, 4, 7, 
 5, 3), c(6, 5, 4, 1, 2, 5, 5), c(4, 4, 4, 0, 1, 4, 3), c(4, 6, 
 8, 7, 7, 5, 8), c(2, 5, 4, 5, 8, 1, 4), c(9, 3, 2, 6, 6, 8, 5
 ), c(7, 1, 2, 3, 2, 7, 3))), .Names = c("subj", "item", "rep", 
 "vec"), row.names = c(NA, -8L), class = "data.frame")

I want this dataframe as the output (with rep1 vs. rep2 correlation and rep1 vs new rep2 correlation).

subj item origCorrel newCorrel
s1 1 .80 .51
s1 2 .93 .34
s2 1 .56 .40
s2 2 .86 .79

Solution

A typical strategy to get rid of loops is to move all the computations done on each subset of the data into their own function, then call that function from an aggregate or apply function.

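two.cors = function(x, ratio = .5) {
  # x: the two rows (rep 1 first, then rep 2) of one subj/item group
  rep1 = unlist(x[1, ][['vec']])
  rep2 = unlist(x[2, ][['vec']])
  orig.cor = cor(rep1, rep2)
  # zero one lowest non-zero element of rep2 at a time (first occurrence on
  # ties) until mean(rep2) <= ratio * mean(rep1); zeroing all tied minima at
  # once would wipe out rep2 entirely for s1 item 2 and make cor() return NA
  while (mean(rep2) > mean(rep1) * ratio) {
    nonzero = which(rep2 != 0)
    rep2[nonzero[which.min(rep2[nonzero])]] = 0
  }
  c(orig.cor = orig.cor, wierd.cor = cor(rep1, rep2))
}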
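
The inner while loop can itself be avoided. The following is a rough sketch, not part of the original answer, that finds in one pass how many of the smallest elements have to be zeroed. `zero.lowest` is a hypothetical helper name. It assumes non-negative values, a non-negative target, and that tied values are zeroed one by one in order of appearance.

```r
# Hypothetical helper: zero the k smallest elements of rep2, where k is the
# smallest count that brings mean(rep2) down to `target` or below.
# Assumes rep2 >= 0 and target >= 0.
zero.lowest <- function(rep2, target) {
  ord <- order(rep2)  # ascending; ties stay in their original order
  # mean of rep2 after zeroing its 1, 2, ..., n smallest elements
  mean.after <- (sum(rep2) - cumsum(rep2[ord])) / length(rep2)
  # prepend the untouched mean so that k = 0 means "nothing to zero"
  k <- which(c(mean(rep2), mean.after) <= target)[1] - 1
  rep2[ord[seq_len(k)]] <- 0
  rep2
}

rep1 <- c(2, 1, 4, 5, 8, 4, 7)
rep2 <- c(1, 1, 3, 4, 7, 5, 3)
zero.lowest(rep2, 0.5 * mean(rep1))  # 0 0 0 0 7 5 0
```

For vectors as short as these the gain is negligible. The cumulative-sum form mainly pays off when the vectors are long, and with non-integer data the comparison against the target may behave slightly differently from a loop that recomputes the mean each time.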

I use daply here, so load plyr; aggregate or a base *apply function could have been used instead.

library(plyr)

Then call the function on your dataset (ratio = .4 here overrides the default of .5, which is the value used in the question):

daply(df, c("subj", "item"), .fun = function(x) two.cors(x, ratio = .4))

This output can be reformatted, but I left that to you because I think you need additional statistics out of the two.cors function.
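
For reference, the same grouping can be sketched in base R with split() and lapply(), returning the result directly as a data frame in the layout the question asks for. This is an illustration under stated assumptions rather than the answer's own code: `cors.one.group` is a hypothetical name, the ratio is fixed at the question's 0.5, values are assumed non-negative, and the lowest non-zero element is zeroed one at a time.

```r
# The data from the question, rebuilt with a list column
df <- data.frame(subj = rep(c("s1", "s2"), each = 4),
                 item = rep(c(1L, 1L, 2L, 2L), times = 2),
                 rep  = rep(1:2, times = 4))
df$vec <- list(c(2,1,4,5,8,4,7), c(1,1,3,4,7,5,3),
               c(6,5,4,1,2,5,5), c(4,4,4,0,1,4,3),
               c(4,6,8,7,7,5,8), c(2,5,4,5,8,1,4),
               c(9,3,2,6,6,8,5), c(7,1,2,3,2,7,3))

# Hypothetical per-group function: returns one row of the result
cors.one.group <- function(x, ratio = 0.5) {
  rep1 <- x$vec[[which(x$rep == 1)]]
  rep2 <- x$vec[[which(x$rep == 2)]]
  orig <- cor(rep1, rep2)
  # zero the first lowest non-zero element until the mean is low enough
  while (mean(rep2) > ratio * mean(rep1)) {
    nonzero <- which(rep2 != 0)
    rep2[nonzero[which.min(rep2[nonzero])]] <- 0
  }
  data.frame(subj = x$subj[1], item = x$item[1],
             origCorrel = orig, newCorrel = cor(rep1, rep2))
}

# One data frame per subj/item combination, processed and stacked again
groups <- split(df, list(df$subj, df$item), drop = TRUE)
result <- do.call(rbind, lapply(groups, cors.one.group))
result <- result[order(result$subj, result$item), ]
rownames(result) <- NULL
result
```

Rounded to two decimals, `result` should agree with the expected table in the question. Further statistics can be added as extra columns in the data.frame() call inside `cors.one.group`.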
