Renaming duplicate strings in R
Question
I have an R data frame with two columns of strings. One of the columns (say, Column1) contains duplicate values. I need to relabel that column so that the duplicated strings are renamed with ordered suffixes, as shown in Column1.new:
Column1 Column2 Column1.new
1 A 1_1
1 B 1_2
2 C 2_1
2 D 2_2
3 E 3
4 F 4
Any ideas on how to do this would be appreciated.
Cheers,
Rob
Recommended answer
Let's say your data (ordered by Column1) is in an object called tab. First create a run-length encoding object:
c1.rle <- rle(tab$Column1)
c1.rle
##lengths: int [1:4] 2 2 1 1
##values : int [1:4] 1 2 3 4
That gives you the values of Column1 and the number of consecutive appearances of each one. Because rle() counts runs of adjacent equal values, the data must be ordered by Column1 for these counts to be the total number of occurrences. Then use that information to create the new column with unique identifiers:
tab$Column1.new <- paste0(rep(c1.rle$values, times = c1.rle$lengths), "_",
                          unlist(lapply(c1.rle$lengths, seq_len)))
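Note that the call above appends a suffix to every value, so the unique entries come out as 3_1 and 4_1 rather than 3 and 4 as in the desired Column1.new. If that matters, one possible variation is sketched below. It is an assumption on my part, not part of the approach above: it rebuilds the example data as character columns for illustration and uses base R's ave() to number rows within each group, which also means the data does not need to be sorted first.

```r
# Example data rebuilt from the question (assumed here to be character columns)
tab <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Position of each row within its group of identical Column1 values
idx <- ave(seq_along(tab$Column1), tab$Column1, FUN = seq_along)

# Number of rows in the group each row belongs to
n <- ave(seq_along(tab$Column1), tab$Column1, FUN = length)

# Append a suffix only where the value occurs more than once
tab$Column1.new <- ifelse(n > 1, paste0(tab$Column1, "_", idx), tab$Column1)
```

With the example data this should give 1_1, 1_2, 2_1, 2_2, 3, 4.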
I'm not sure whether this is appropriate in your situation, but you could also simply paste Column1 and Column2 together to create a unique identifier.
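A minimal sketch of that alternative, assuming the same example data; the column name id and the "_" separator are arbitrary choices made here for illustration. The result is unique only if no Column1/Column2 pair repeats, so the last line checks for that.

```r
# Example data rebuilt from the question (assumed here to be character columns)
tab <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Combine both columns into a single key (hypothetical column name "id")
tab$id <- paste(tab$Column1, tab$Column2, sep = "_")

# anyDuplicated() returns 0 when every key is unique
all_unique <- anyDuplicated(tab$id) == 0
```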