Renaming duplicate strings in R

Question

I have an R data frame with two columns of strings. One of the columns (say, Column1) contains duplicate values. I need to relabel that column so that the duplicated strings are renamed with ordered suffixes, as in Column1.new:

 Column1   Column2   Column1.new
 1         A         1_1
 1         B         1_2
 2         C         2_1
 2         D         2_2
 3         E         3
 4         F         4

Any ideas on how to do this would be appreciated.

Recommended answer

Let's say your data (ordered by Column1) is in an object called tab. First create a run-length encoding object:

c1.rle <- rle(tab$Column1)
c1.rle
##lengths: int [1:4] 2 2 1 1
##values : int [1:4] 1 2 3 4

That gives you the values of Column1 and the number of consecutive occurrences of each one. Then use that information to create the new column with unique identifiers (note that values occurring only once also get a suffix, e.g. 3_1):

tab$Column1.new <- paste0(rep(c1.rle$values, times = c1.rle$lengths), "_",
        unlist(lapply(c1.rle$lengths, seq_len)))
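
If values that occur only once should stay without a suffix, as in the requested Column1.new, one possible variant is sketched below. It uses base R's ave() to number the rows within each group of equal values, so the data does not have to be sorted first. The construction of tab and the helper names idx and n are assumptions made for illustration and are not part of the original answer.

```r
# Example data from the question (this construction is assumed for illustration)
tab <- data.frame(Column1 = c(1, 1, 2, 2, 3, 4),
                  Column2 = c("A", "B", "C", "D", "E", "F"),
                  stringsAsFactors = FALSE)

# Position of each row within its group of equal Column1 values
idx <- ave(seq_along(tab$Column1), tab$Column1, FUN = seq_along)

# Size of the group each row belongs to
n <- ave(seq_along(tab$Column1), tab$Column1, FUN = length)

# Add a suffix only where the value occurs more than once
tab$Column1.new <- ifelse(n > 1,
                          paste0(tab$Column1, "_", idx),
                          as.character(tab$Column1))

tab$Column1.new
# "1_1" "1_2" "2_1" "2_2" "3" "4"
```

Because ave() groups by value rather than by consecutive runs, this also works when equal values are not adjacent, which rle() alone would not handle.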

Not sure if this is appropriate in your situation, but you could also just paste Column1 and Column2 together to create a unique identifier.
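
A minimal sketch of that alternative follows. It assumes that each combination of Column1 and Column2 occurs only once; the column name Column1.id, the "_" separator and the construction of tab are hypothetical choices for illustration.

```r
# Example data from the question (this construction is assumed for illustration)
tab <- data.frame(Column1 = c(1, 1, 2, 2, 3, 4),
                  Column2 = c("A", "B", "C", "D", "E", "F"),
                  stringsAsFactors = FALSE)

# Combine both columns into one identifier (hypothetical column name)
tab$Column1.id <- paste(tab$Column1, tab$Column2, sep = "_")

# Check that the combined identifier really is unique;
# anyDuplicated() returns 0 when there are no duplicates
anyDuplicated(tab$Column1.id) == 0
```

The uniqueness check matters because pasting the columns only yields a unique identifier if no pair of values is repeated.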
