Compare row values over multiple rows (R)

Question

I don't think this question has been asked yet (the most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!

I have a dataset of multiple runs of an experiment in one file. Each row holds one time step of one run, and the columns look like this: time [info] id (unique per run)

I am attempting to calculate when the system reaches equilibrium, which I am defining as stable values in 3 interdependent parameters. I would like to compare the values across rows and, if they stay within 5% of each other over 20 time steps, return the time step at which the stability begins, along with the id.

So far, I'm thinking it will be something like the following (or maybe use a while loop; sorry for the bad formatting):

y = 1;
z = 0; # variables to control the loop
x = 0;
for (ID) {
    if (CC at time=x == 0.05 +- CC at time=y) {
        if (z <= 20) { # counts the number of periods that match
            y++
            z++
        }
        else [save value in column]
    }
    else { # no match for sustained period, so start over again
        x++
        y = x + 1
        z = 0
    }
}

ETA: CC is one of my parameters of interest. It ranges between 0 and 1, although the endpoints are unlikely.

Here's a simple example that might help. My data looks something like this:

zz <- textConnection("time CC ID 
1          0.99       1
2          0.80       1
3          0.90       1
4          0.91       1
5          0.92       1
6          0.91       1
1          0.99       2
2          0.90       2
3          0.90       2
4          0.91       2
5          0.92       2
6          0.91       2")
Data <- read.table(zz, header = TRUE)
close(zz)

My question is: how can I run through the rows to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X (here, 3) time steps), so that it produces the following result:

    ID  timeToEQ
1   1   3
2   2   2

Does this help? The only way I can think of to do this is with a for-loop, and I think there must be an easier way!
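
For reference, the loop idea described above can be written as runnable R. This is only a sketch under stated assumptions: the helper name `first_stable` is hypothetical, "stable" is read as "the next `window` values all stay within an absolute tolerance `tol` of the starting value", and `tol = 0.05` with `window = 3` are chosen to fit the small example rather than the 20 time steps of the real data.

```r
# Hypothetical helper (not from the original post): returns the first index x
# such that the next `window` values all stay within `tol` of cc[x],
# or NA if no such index exists.
first_stable <- function(cc, tol = 0.05, window = 3) {
  n <- length(cc)
  x <- 1
  while (x + window <= n) {
    following <- cc[(x + 1):(x + window)]
    if (all(abs(following - cc[x]) <= tol)) {
      return(x)
    }
    x <- x + 1  # no sustained match, so start over from the next row
  }
  NA_integer_
}

# Example data from the question
Data <- read.table(text = "time CC ID
1 0.99 1
2 0.80 1
3 0.90 1
4 0.91 1
5 0.92 1
6 0.91 1
1 0.99 2
2 0.90 2
3 0.90 2
4 0.91 2
5 0.92 2
6 0.91 2", header = TRUE)

# Apply the helper to the CC values of each run separately
result <- sapply(split(Data$CC, Data$ID), first_stable)
print(result)
```

On the example data this gives 3 for run 1 and 2 for run 2. The returned value is a row index within each run; it equals the time step only because `time` starts at 1 and increases by 1.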

Solution

Here is my code. I will post the explanation in some time.

library(plyr)
# For each ID, find the first position where successive CC values differ by less than 0.05
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05))

  ID timeToEQ
1  1        3
2  2        2

EDIT. Here is how it works.

  1. ddply breaks Data into subsets based on ID.
  2. diff(CC) computes the difference between CC of successive rows.
  3. abs(diff(CC)) < 0.05 returns TRUE wherever the absolute difference between successive rows is below 0.05, i.e. where CC has stabilized.
  4. Position locates the first instance of an element which satisfies isTRUE.
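
The steps above can be traced by hand on a single run. The snippet below is a minimal illustration in base R using the CC values of run 1 from the example data; the variable names `delta`, `stable`, and `first_idx` are only for illustration.

```r
# CC values for the run with ID == 1 in the example data
CC <- c(0.99, 0.80, 0.90, 0.91, 0.92, 0.91)

delta <- diff(CC)                      # change between successive rows (length 5)
stable <- abs(delta) < 0.05            # TRUE where the change is smaller than 0.05
first_idx <- Position(isTRUE, stable)  # index of the first TRUE, NA if there is none

print(stable)     # FALSE FALSE TRUE TRUE TRUE
print(first_idx)  # 3
```

Two things are worth keeping in mind. Element i of `diff(CC)` compares rows i and i + 1, so the result matches the time step only because `time` starts at 1 and increases by 1. Also, this looks for the first small change between two successive rows; it does not check that the values stay stable for a given number of time steps afterwards.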
