R enumerate duplicates in a dataframe with unique value

Views: 35. Published: 2021/5/3 18:55:43. Tags: r, duplicates

Question

I have a dataframe containing a set of parts and test results. The parts are tested at 3 sites (North, Centre and South). Sometimes those parts are re-tested. I want to eventually create some charts that compare the results from the first time that a part was tested with the second (or third, etc.) time that it was tested, e.g. to look at tester repeatability.

As an example, I've come up with the code below. ~~I've explicitly removed the "Experiment" column from the morley data set, as this is the column I'm effectively trying to recreate.~~ The code works, but it seems that there must be a more elegant way to approach this problem. Any thoughts?

Edit - I realise that the example given was overly simplistic for my actual needs (I was trying to generate a reproducible example as easily as possible).

New example:

    part   <- as.factor(c("A","A","A","B","B","B","A","A","A","C","C","C"))
    site   <- as.factor(c("N","C","S","C","N","S","N","C","S","N","S","C"))
    result <- c(17,20,25,51,50,49,43,45,47,52,51,56)
    data   <- data.frame(part, site, result)

    # Create the index column and initialise to 1
    data$index <- 1

    # Bump the index of every row whose (part, site, index) triple
    # has already appeared, until no duplicates remain
    repeat {
      if (!anyDuplicated(data[, c("part", "site", "index")])) { break }
      data$index <- ifelse(duplicated(data[, c("part", "site", "index")]),
                           data$index + 1, data$index)
    }

    data
    #    part site result index
    # 1     A    N     17     1
    # 2     A    C     20     1
    # 3     A    S     25     1
    # 4     B    C     51     1
    # 5     B    N     50     1
    # 6     B    S     49     1
    # 7     A    N     43     2
    # 8     A    C     45     2
    # 9     A    S     47     2
    # 10    C    N     52     1
    # 11    C    S     51     1
    # 12    C    C     56     1

Old example:

    # Generate a trial data frame from the morley dataset
    df <- morley[, c(2, 3)]

    # Set up an iterative variable
    # Create the index column and initialise to 1
    df$index <- 1

    # Loop through the dataframe looking for duplicate pairs of
    # Runs and Indices and increment the index if it's a duplicate
    repeat {
      if (!anyDuplicated(df[, c(1, 3)])) { break }
      df$index <- ifelse(duplicated(df[, c(1, 3)]), df$index + 1, df$index)
    }

    # Check - The below vector should all be true
    df$index == morley$Expt

Recommended answer

We may use diff and cumsum on the 'Run' column to get the expected output. This method does not create a column of 1s (i.e. 'index'), but it does assume that the sequence in 'Run' is ordered as shown in the OP's example.

    indx <- cumsum(c(TRUE, diff(df$Run) < 0))
    identical(indx, morley$Expt)
    #[1] TRUE
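To see why this works, here is a minimal sketch on a toy vector. The vector `run` and the names `resets` and `expt` are made up for illustration and are not part of the morley data. `diff` is negative exactly where the run counter drops back, and `cumsum` turns those resets into a running experiment number.

```r
# Hypothetical toy data: a run counter that restarts at 1 for each experiment
run <- c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L)

# diff() is negative wherever the counter drops back, i.e. a new experiment starts;
# the leading TRUE marks the start of the first experiment
resets <- c(TRUE, diff(run) < 0)

# cumsum() counts the resets seen so far, which gives the experiment number
expt <- cumsum(resets)
expt
# [1] 1 1 1 2 2 2 3 3
```

Because the resets are detected from a drop in value, this only works when the rows of each experiment are contiguous and in increasing run order.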

Or we can use ave

    indx2 <- with(df, ave(Run, Run, FUN=seq_along))
    identical(indx2, morley$Expt)
    #[1] TRUE
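Unlike an approach based on `diff`, `ave` does not rely on the rows being sorted: it numbers the rows within each group of equal values in the order they appear. A small sketch, assuming a made-up vector `run` (hypothetical, not from the question):

```r
# Hypothetical toy data: run numbers in no particular order
run <- c(2L, 1L, 1L, 3L, 2L, 1L)

# Within each group of equal run values, number the rows 1, 2, 3, ... in row order
occurrence <- ave(run, run, FUN = seq_along)
occurrence
# [1] 1 1 2 1 2 3
```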

Update

Using the new example:

    with(data, ave(seq_along(part), part, site, FUN=seq_along))
    #[1] 1 1 1 1 1 1 2 2 2 1 1 1
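The same call can be stored as a column, and it should keep counting when a part is tested more than twice at the same site. A minimal sketch, assuming a small made-up data frame (the values below are hypothetical and chosen only so that part A appears three times at site N):

```r
# Hypothetical data: part A is tested three times at site N
data <- data.frame(
  part   = c("A", "A", "B", "A", "B", "A", "A"),
  site   = c("N", "C", "N", "N", "N", "C", "N"),
  result = c(17, 20, 50, 43, 49, 45, 44)
)

# Number each repeat of a (part, site) pair in row order and keep it as a column
data$index <- with(data, ave(seq_along(part), part, site, FUN = seq_along))
data$index
# [1] 1 1 1 2 2 2 3
```

With the index in place, the first tests and the re-tests can be separated with ordinary subsetting, for example `data[data$index == 1, ]` and `data[data$index > 1, ]`.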

Or we can use library(splitstackshape)

    library(splitstackshape)
    getanID(data, c('part', 'site'))[]
