Unique values of rows

Problem description

I often encounter data that looks like this:

#create dummy data frame
data <- as.data.frame(diag(4))
data[data==0] <- NA
data[2,2] <- NA
data

#  V1 V2 V3 V4
#1  1 NA NA NA
#2 NA NA NA NA
#3 NA NA  1 NA
#4 NA NA NA  1

Rows represent participants, and columns V1 through V4 represent the condition the participant is in (e.g., a 1 under V1 means the participant is in condition 1, and a 1 under V4 means the participant is in condition 4). Side note: the real data are not symmetric like this example; there are many more participants spread over the 4 conditions.

What I want is a vector with the condition for each participant:

1 NA  3  4

I wrote the following, but I was wondering whether there is a more concise way (i.e., one using fewer lines of code)?

#replace entries with condition numbers 
cond <- data + matrix(rep(0:3, 4), 4, byrow=TRUE) #add 0 to 1 for condition 1...

#get all unique elements (ignore NAs)
cond <- apply(cond, 1, function(x)unique(x[!is.na(x)]))

#because NAs were dropped just now, the list element for row 2 (cond[[2]]) is numeric(0)
#assign NA to all elements that are numeric(0)
cond[sapply(cond, function(x) length(x)==0)] <- NA

cond <- unlist(cond)
cond
#[1]  1 NA  3  4

Recommended answer

We can use max.col with ties.method='first' on the logical matrix of non-NA elements in 'data'; for each row this returns the index of the first non-NA column. A row that has only NA elements would still get an index (1), so we multiply the max.col result by NA^!rowSums(!is.na(data)). The exponent is TRUE only for rows with 0 non-NA elements, and because NA^TRUE is NA while NA^FALSE is 1, those rows become NA and all other rows are left unchanged.

 max.col(!is.na(data), 'first')* NA^!rowSums(!is.na(data))
 #[1]  1 NA  3  4
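
The one-liner above can be unpacked into separate steps, which may make the role of each piece easier to see. This is only an illustrative sketch: the intermediate names (present, idx, empty, mask, result) are made up here and do not appear in the original answer.

```r
# Rebuild the dummy data frame from the question
data <- as.data.frame(diag(4))
data[data == 0] <- NA
data[2, 2] <- NA

# Logical matrix: TRUE where a participant has a condition marked
present <- !is.na(data)

# Column index of the first TRUE in each row
# (an all-FALSE row still yields an index, so it has to be masked below)
idx <- max.col(present, ties.method = "first")

# TRUE for rows that contain no non-NA value at all
empty <- rowSums(present) == 0

# NA^TRUE is NA and NA^FALSE is 1, so this is NA for empty rows and 1 otherwise
mask <- NA^empty

result <- idx * mask
result
```

Writing rowSums(present) == 0 instead of !rowSums(present) is purely a readability choice; both give the same logical vector.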

Another option is pmax. We multiply the data by the matrix of column indices, so that each non-NA element is replaced by its column index. Then we call pmax with na.rm=TRUE across the columns to get the maximum value in each row; a row that contains only NA elements stays NA.

 do.call(pmax, c(col(data)*data, na.rm=TRUE))
 #[1]  1 NA  3  4
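
The pmax approach can likewise be split into its two steps to show the intermediate table. Again, this is a sketch for illustration; the names indexed and result are assumptions introduced here, not part of the original answer.

```r
# Rebuild the dummy data frame from the question
data <- as.data.frame(diag(4))
data[data == 0] <- NA
data[2, 2] <- NA

# col(data) holds the column number of every cell; multiplying by the data
# turns each 1 into its column number and leaves NA cells as NA
indexed <- col(data) * data
indexed

# Pass the columns to pmax as separate arguments, ignoring NAs;
# the element-wise maximum across columns is the condition for each row
result <- do.call(pmax, c(indexed, na.rm = TRUE))
result
```

This relies on every participant having at most one condition marked, as in the question; with several marked conditions in a row, pmax would return the highest column index.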
