Generate new unique ID numbers while excluding previously generated ID numbers in R


Problem description

I would like to generate unique IDs for rows in my database. I will be adding entries to this database on an ongoing basis, so I'll need to generate new IDs as I go. While my database is relatively small and the chance of duplicating a random ID is minuscule, I still want to build in a programmatic fail-safe to ensure that I never generate an ID that has already been used.

For starters, here is some sample data that I can use to start an example database:

library(tidyverse)
library(ids)
library(babynames)
    
database <- data.frame(rid = random_id(5, 5), first_name = sample(babynames$name, 5))

print(database)
          rid first_name
1  07282b1da2      Sarit
2  3c2afbb0c3        Aly
3  f1414cd5bf    Maedean
4  9a311a145e    Teriana
5  688557399a    Dreyton

And here is some sample data that I can use to represent new data that will be appended to the existing database:

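# Build a data frame (not a bare vector) so that nrow() and $rid work later
new_data <- data.frame(first_name = sample(babynames$name, 5))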

print(new_data)

 first_name
1    Hamzeh
2   Mahmoud
3   Matelyn
4    Camila
5     Renae

Now, what I want is to bind a new column of randomly generated IDs using the random_id function while simultaneously checking that the newly generated IDs don't match any existing IDs in the database object. If the generator creates an ID that already exists, then ideally it would generate a replacement, and keep doing so until a truly unique ID is created.

Any help would be greatly appreciated!

Update

I've thought of a possibility that helps but is still limited. I could generate new IDs and then use a for() loop to test whether any of the newly generated IDs are present in the existing database. If so, I would regenerate that ID. For example...

new_data$rid <- random_id(nrow(new_data), 5)

for(i in 1:nrow(new_data)){
  if(new_data$rid[i] %in% unique(database$rid)){
    new_data$rid[i] <- random_id(1, 5)  # index with the loop variable i
  }
}

The problem with this approach is that each regenerated ID is never re-checked, so I would need an endless stream of nested if statements to keep testing the new value against the original database. I need a process that keeps testing until it produces a truly unique value that is not found in the original database.

Recommended answer

Using ids::uuid() would likely remove the need to check for duplicate ID values at all. In fact, if you were to generate 10 trillion UUIDs, there would be something along the lines of a .00000006 chance of two UUIDs being the same, per What is a UUID?
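For a sense of scale, the birthday approximation gives a rough estimate of collision odds. The sketch below is not from the linked article; it assumes IDs are drawn uniformly and independently from 2^bits possible values, and collision_probability is a hypothetical helper name. The 40-bit case corresponds to random_id(n, 5) (5 random bytes per ID), and the 122-bit case assumes version 4 (random) UUIDs.

```r
# Birthday approximation: probability of at least one collision among n IDs
# drawn uniformly at random from 2^bits possible values.
# Assumption: IDs are uniform and independent.
collision_probability <- function(n, bits) {
  # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
  -expm1(-n * (n - 1) / (2 * 2^bits))
}

# random_id(n, 5) yields 5 random bytes, i.e. 40 bits, per ID
collision_probability(1e6, bits = 40)

# A version 4 UUID carries 122 random bits
collision_probability(1e13, bits = 122)
```

With short 5-byte IDs the odds stop being negligible once the table reaches millions of rows, which is why an explicit duplicate check is still worthwhile there.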

Here is a base R function that quickly checks for duplicate values without any explicit iteration:

anyDuplicated(1:4)
[1] 0

anyDuplicated(c(1:4,1))
[1] 5

The first result above shows that there are no duplicate values. The second shows that element 5 is a duplicate, because 1 is used twice. Below is how to run the check without iterating over rows. For the demonstration, database$rid is copied into new_data, so all five new IDs start out as duplicates. The loop repeats until every rid is unique, but note that it presumes all existing database$rid values are already unique.

library(ids)
set.seed(7)
new_data$rid <- database$rid
repeat {
  duplicates <- anyDuplicated(c(database$rid, new_data$rid))
  if (duplicates == 0L) {
    break
  }
  new_data$rid[duplicates - nrow(database)] <- random_id(1, 5)
}

All new_data$rid values have been replaced with unique values.

rbind(database, new_data)

          rid first_name
1  07282b1da2      Sarit
2  3c2afbb0c3        Aly
3  f1414cd5bf    Maedean
4  9a311a145e    Teriana
5  688557399a    Dreyton
6  52f494c714     Hamzeh
7  ac4f522860    Mahmoud
8  ffe74d535b    Matelyn
9  e3dccc4a8e     Camila
10 e0839a0d34      Renae
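The same idea could be wrapped in a reusable function that regenerates every clashing ID in one pass instead of one per loop iteration. This is only a sketch under stated assumptions: generate_unique_ids and random_hex_id are hypothetical names, and random_hex_id is a base R stand-in for ids::random_id so that the example runs without extra packages.

```r
# Hypothetical stand-in for ids::random_id(n, bytes):
# returns n hex strings of 2 * bytes characters each.
random_hex_id <- function(n, bytes = 5) {
  hex_digits <- c(0:9, letters[1:6])
  vapply(
    seq_len(n),
    function(i) paste(sample(hex_digits, 2 * bytes, replace = TRUE), collapse = ""),
    character(1)
  )
}

# Generate n IDs that differ from each other and from every ID in `existing`.
generate_unique_ids <- function(n, existing = character(0),
                                generator = random_hex_id) {
  # The approach relies on the existing IDs already being unique
  stopifnot(anyDuplicated(existing) == 0L)
  new_ids <- generator(n)
  repeat {
    # TRUE where a new ID matches an existing ID or an earlier new ID
    clash <- new_ids %in% existing | duplicated(new_ids)
    if (!any(clash)) break
    # Regenerate only the clashing IDs, then check again
    new_ids[clash] <- generator(sum(clash))
  }
  new_ids
}
```

It might be used as new_data$rid <- generate_unique_ids(nrow(new_data), database$rid). With the ids package installed, passing generator = function(n) ids::random_id(n, 5) should produce IDs in the same format as the existing ones.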
