Comparing two columns: logical - is value from column 1 also in column 2?

Problem description

I'm pretty confused about how to go about this. Say I have two columns in a data frame. One column is a numerical series in order (x); the other specifies some value from the first, or -1 (y). These are results from a matching experiment, where the goal is to see whether multiple photos were taken of the same individual. In the example below, there are 10 photos, but only 6 unique individuals. In the y column, the corresponding x is reported if there is a match. y is -1 for no match (these might as well be NAs). If there are more than 2 photos per individual, the match # will be the most recent record (photos 1, 5 and 7 are the same individual below). The group is the time period in which the photo was taken (no matches within a group!). Hopefully I've got this example right:

x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(-1,-1,-1,-1,1,-1,1,-1,2,4)
group <- c(1,1,1,2,2,2,3,3,3,3)
DF <- data.frame(x,y,group)

I would like to create a new variable to name the unique individuals, and have a final dataset with a single row per individual (i.e. only have 6 rows instead of 10), that also includes the group information. I.e. if an individual is in all three groups, there could be a value of "111" or if just in the first and last group it would be "101". Any tips?

Thanks for asking about the resulting dataset. I realized my group explanation was bad based on the actual numbers I gave, so I changed the results slightly. Bonus would also be nice to have, but not critical.

name <- c(1,2,3,4,6,8)
group_history <- as.character(c('111','101','100','011','010','001'))
bonus <- as.character(c('1,5,7','2,9','3','4,10','6','8')) 
results_I_want <- data.frame(name,group_history,bonus)

My word, more mistakes fixed above...

Solution

Using the (updated) example you gave:

x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(-1,-1,-1,-1,1,-1,1,-1,3,4)
group <- c(1,1,1,2,2,2,3,3,3,3)

DF <- data.frame(x,y,group)

Use x and y to create a mapping from higher numbers to the lower numbers that are the same person. Note that the names of the mapping are character strings, even though they consist only of digits.

bottom.df <- DF[DF$y==-1,]
mapdown.df <- DF[DF$y!=-1,]
mapdown <- c(mapdown.df$y, bottom.df$x)
names(mapdown) <- c(mapdown.df$x, bottom.df$x)
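
Because the names are strings, the lookup has to go through as.character(). A small sketch, using the same data as above, of how indexing by name differs from indexing by position:

```r
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(-1,-1,-1,-1,1,-1,1,-1,3,4)
DF <- data.frame(x, y)

bottom.df <- DF[DF$y == -1, ]
mapdown.df <- DF[DF$y != -1, ]
mapdown <- c(mapdown.df$y, bottom.df$x)
names(mapdown) <- c(mapdown.df$x, bottom.df$x)

# Lookup by name: photo 9 maps down to photo 3
by_name <- mapdown[as.character(9)]

# Lookup by position: the 9th element of the vector, which is the entry
# for photo 6, not for photo 9
by_position <- mapdown[9]
```

Indexing with a bare number would silently return the wrong entry here, which is why the loop that follows always converts to character first.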

We don't know how many steps it might take to get everything down to the lowest number, so we have to use a while loop.

oldx <- DF$x
newx <- mapdown[as.character(oldx)]
while(any(oldx != newx)) {
    oldx = newx
    newx = mapdown[as.character(oldx)]
}

The result is the set each photo belongs to, named by the lowest number in that set.

DF$id <- unname(newx)
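
The example data only needs one step, so here is a sketch of the same loop on a chain that needs two (photo 7 points at photo 5, which points at photo 1). The wrapper function resolve_ids is a hypothetical name introduced only for this illustration, and it assumes every chain eventually ends at a photo with y == -1; a cycle in the matches would make the loop run forever.

```r
# Hypothetical helper wrapping the mapping and the while loop from above
resolve_ids <- function(x, y) {
  is_root <- y == -1
  mapdown <- c(y[!is_root], x[is_root])
  names(mapdown) <- c(x[!is_root], x[is_root])

  oldx <- x
  newx <- mapdown[as.character(oldx)]
  # Keep following the mapping until nothing changes any more
  while (any(oldx != newx)) {
    oldx <- newx
    newx <- mapdown[as.character(oldx)]
  }
  unname(newx)
}

# Photos 1, 5 and 7 are one individual, matched as a chain 7 -> 5 -> 1
ids <- resolve_ids(x = 1:7, y = c(-1, -1, -1, -1, 1, -1, 5))
ids
# [1] 1 2 3 4 1 6 1
```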

Getting the group membership is harder. Use reshape2 to convert the data into wide format (one column per group), where a cell is "1" if the individual appears in that group and "0" if not.

library("reshape2")

wide <- dcast(DF, id~group, value.var="id", 
              fun.aggregate=function(x){if(length(x)>0){"1"}else{"0"}})

Finally, paste these "0"/"1" memberships together to get the grouping variable you described.

wide$grouping = apply(wide[,-1], 1, paste, collapse="")
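
If you would rather not depend on reshape2, the same membership strings can probably be built in base R with table(). This is only a sketch of an alternative, not what the code above does; the id column is typed in by hand here so the block runs on its own.

```r
x <- c(1,2,3,4,5,6,7,8,9,10)
group <- c(1,1,1,2,2,2,3,3,3,3)
id <- c(1,2,3,4,1,6,1,8,3,4)   # ids as produced by the while loop above
DF <- data.frame(x, group, id)

# Cross-tabulate id by group; a count above 0 means the individual
# was photographed in that group
seen <- table(DF$id, DF$group) > 0

# Turn each row of TRUE/FALSE into a string such as "101"
grouping <- apply(seen, 1, function(r) paste(as.integer(r), collapse = ""))

result <- data.frame(id = as.numeric(names(grouping)),
                     grouping = unname(grouping))
```

The rows come out in the order of the sorted ids (1, 2, 3, 4, 6, 8), the same order dcast uses.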

The result:

> wide
  id 1 2 3 grouping
1  1 1 1 1      111
2  2 1 0 0      100
3  3 1 0 1      101
4  4 0 1 1      011
5  6 0 1 0      010
6  8 0 0 1      001

No "bonus" yet.

EDIT:

To get the bonus information, it helps to redo the mapping to keep everything. If you have a lot of cases, this could be slow.

Replace the oldx/newx part with:

iterx <- matrix(DF$x, ncol=1)
iterx <- cbind(iterx, mapdown[as.character(iterx[,1])])
while(any(iterx[,ncol(iterx)]!=iterx[,ncol(iterx)-1])) {
    iterx <- cbind(iterx, mapdown[as.character(iterx[,ncol(iterx)])])
}

DF$id <- iterx[,ncol(iterx)]

To generate the bonus data, you can then use:

bonus <- tapply(iterx[,1], iterx[,ncol(iterx)], paste, collapse=",")
wide$bonus <- bonus[as.character(wide$id)]
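
Since the first column of iterx is just DF$x and the last column is the final id, the same tapply call should also work on those two vectors directly, without keeping the intermediate columns. A minimal sketch under that assumption, with the ids typed in by hand so it runs on its own:

```r
x <- 1:10
id <- c(1,2,3,4,1,6,1,8,3,4)   # final ids for the ten photos

# Collect the photo numbers belonging to each id into one string
bonus <- tapply(x, id, paste, collapse = ",")

bonus[["1"]]
# [1] "1,5,7"
bonus[["3"]]
# [1] "3,9"
```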

Which gives:

> wide
  id 1 2 3 grouping bonus
1  1 1 1 1      111 1,5,7
2  2 1 0 0      100     2
3  3 1 0 1      101   3,9
4  4 0 1 1      011  4,10
5  6 0 1 0      010     6
6  8 0 0 1      001     8

Note that this isn't the same as your example output, but I don't think your example output is right (how can you have a group_history of "000"?)

EDIT:

Now it agrees.
