Comparing values longitudinally in R... with a twist


Problem description

I have the results of a test taken by a number of individuals at as many as four time periods. Here's a sample:

dat <- structure(list(Participant_ID = c("A", "A", "A", "A", "B", "B", 
"B", "B", "C", "C", "C", "C"), phase = structure(c(1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("base", "sixmos", 
"twelvemos", "eighteenmos"), class = "factor"), result = c("Negative", 
"Negative", "Negative", "Negative", "Negative", "Positive", "Negative", 
NA, "Positive", "Indeterminate", "Negative", "Negative")), .Names = c("Participant_ID", 
"phase", "result"), row.names = c(1L, 2L, 3L, 4L, 97L, 98L, 99L, 
100L, 9L, 10L, 11L, 12L), class = c("cast_df", "data.frame"))

which looks like:

    Participant_ID       phase        result
1                A        base      Negative
2                A      sixmos      Negative
3                A   twelvemos      Negative
4                A eighteenmos      Negative
97               B        base      Negative
98               B      sixmos      Positive
99               B   twelvemos      Negative
100              B eighteenmos          <NA>
9                C        base      Positive
10               C      sixmos Indeterminate
11               C   twelvemos      Negative
12               C eighteenmos      Negative

I'd like to add an identifier to each test to note whether that test was a conversion from the previous status (negative to positive), a reversion (positive to negative), or stable. The catch is that I'm not just comparing the base test to the six months test, six months to twelve months, etc. - in cases like C, the sixmos test should be marked as stable or inconclusive (the exact term for that is ambiguous), and (more importantly) the twelvemos test should then be compared to the base test and marked as a reversion. Conversely, if someone had a sequence of "Negative", "Indeterminate", "Negative", that should be stable.

It's the latter part that I'm stuck on; if it were just a sequence of comparisons per participant, I'd be all right, but I'm having trouble thinking about how to elegantly deal with these variable comparison pairs. Your help is, as always, much appreciated.

Solution

I don't think you outlined what should happen in all possible cases (e.g. what is the status when the sequence is "Indeterminate, Indeterminate"?) but here is an idea: treat the "indeterminate" cases as missing and "impute" them using the na.locf from package zoo to carry forward the values. (Or better, reimplement it to address your case.)

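library(plyr)
library(zoo)  # provides na.locf
## sort so that each participant's tests are in phase order
dat <- dat[with(dat, order(Participant_ID, phase)), ]
dat <- ddply(dat, "Participant_ID", function(x) {
    ## code "Indeterminate" as NA so na.locf can carry the previous result forward;
    ## truly missing results get a sentinel (1000): have to figure out what to do with those
    result.fix <- na.locf(car::recode(x$result, "'Negative'=0; 'Positive'=1;'Indeterminate'=NA;NA=1000"), na.rm = FALSE)
    x$status <- NA
    ## difference from the previous (carried-forward) result: -1, 0 or 1
    x$status[-1] <- result.fix[-1] - result.fix[-length(result.fix)]
    x$status <- car::recode(x$status, "-1='reversion'; 1='conversion'; 0='stable'; else=NA")
    x$status[x$result == "Indeterminate"] <- "stable or inconclusive"
    x
})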

Not sure this qualifies as elegant, though!
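
For comparison, here is a minimal base-R sketch of the same carry-forward idea without plyr, car, or zoo. The helper name label_status is made up for illustration. It also makes two assumptions that the question leaves open: a missing (NA) result is skipped and gets an NA status, and a test with no earlier determinate result to compare against gets NA as well.

```r
# Hypothetical helper (not from the original answer): label each test against
# the most recent determinate (Negative/Positive) result of one participant.
label_status <- function(result) {
  status <- rep(NA_character_, length(result))
  last <- NA_character_                 # last determinate result seen so far
  for (i in seq_along(result)) {
    r <- result[i]
    if (is.na(r)) next                  # assumption: missing test keeps status NA
    if (r == "Indeterminate") {
      status[i] <- "stable or inconclusive"
      next                              # do not update `last`
    }
    if (!is.na(last)) {
      if (r == last) {
        status[i] <- "stable"
      } else if (r == "Positive") {
        status[i] <- "conversion"
      } else {
        status[i] <- "reversion"
      }
    }
    last <- r
  }
  status
}

# Same sample data as in the question, rebuilt as a plain data frame.
dat <- data.frame(
  Participant_ID = rep(c("A", "B", "C"), each = 4),
  phase = factor(rep(c("base", "sixmos", "twelvemos", "eighteenmos"), times = 3),
                 levels = c("base", "sixmos", "twelvemos", "eighteenmos")),
  result = c("Negative", "Negative", "Negative", "Negative",
             "Negative", "Positive", "Negative", NA,
             "Positive", "Indeterminate", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

# Sort by participant and phase, then apply the helper within each participant.
dat <- dat[order(dat$Participant_ID, dat$phase), ]
dat$status <- ave(dat$result, dat$Participant_ID, FUN = label_status)
```

With the sample data, participant C's twelvemos test is compared with the base test and comes out as a reversion, which is the behaviour the question asks for. Whether a leading "Indeterminate" or a run of them should be labelled differently is still a decision for whoever owns the data.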
