Generate a sequence (starting over in case of a recurrence) and add a new column with the highest number per sequence, within group, in R

Views: 52 | Published: 2021/7/19 18:45:52 | Tags: r, sequence, sqldf


Question

I am looking for a way to generate a sequence for a column of city names, grouped by an ID. What is crucial is that when a city name recurs within the group, a new sequence has to start. A new sequence should also start with each new ID.

The question of how to create the sequence described above has been solved. To make it easier to select the row with the highest sequence number later on, I am looking for a way to add a new column to the data frame that shows, for each record, the highest number reached by its sequence (per sequence, per ID).

Here is an example of what I want to achieve, based on a simplified version of my data frame:

ID  City    Sequence    Highest_number
1   Nijmegen    1    2
1   Nijmegen    2    2
1   Arnhem      1    2
1   Arnhem      2    2
1   Nijmegen    1    1
1   Arnhem      1    3
1   Arnhem      2    3
1   Arnhem      3    3
1   Nijmegen    1    1
2   Nijmegen    1    1
2   Utrecht     1    1
2   Amsterdam   1    2
2   Amsterdam   2    2
2   Utrecht     1    4
2   Utrecht     2    4
2   Utrecht     3    4
2   Utrecht     4    4 

mydf <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  City = c("Nijmegen", "Nijmegen", "Arnhem", "Arnhem", "Nijmegen",
           "Arnhem", "Arnhem", "Arnhem", "Nijmegen", "Nijmegen", "Utrecht",
           "Amsterdam", "Amsterdam", "Utrecht", "Utrecht", "Utrecht", "Utrecht"))

Recommended answer

Construct a run-length encoding and use it to generate the sequences:

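# Store the result as `runs` so the name does not shadow the rle() function.
runs <- rle(as.character(mydf$City))
# seq_len() counts 1..n within each run; the component is named `lengths`.
mydf$Sequence <- unlist(lapply(runs$lengths, seq_len))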
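To make the idea concrete, here is a small sketch on a toy vector (the vector `x` and the names `runs` and `seq_in_run` are made up for illustration and are not part of the question's data). It shows what `rle()` returns and why counting within each run restarts the sequence whenever the value changes.

```r
# Toy input: three runs of consecutive equal values.
x <- c("a", "a", "b", "a", "a", "a")

# rle() returns the length and the value of each run.
runs <- rle(x)
runs$lengths   # 2 1 3
runs$values    # "a" "b" "a"

# Counting 1..n inside each run and concatenating the pieces
# restarts the counter at every change of value.
seq_in_run <- unlist(lapply(runs$lengths, seq_len))
seq_in_run     # 1 2 1 1 2 3
```

Note that the second block of "a" gets its own sequence because `rle()` only merges values that are adjacent.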

For the updated question, where two columns form the key, paste the columns together with a separator that does not occur in the data and compute on the combined key:

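# "\r" is used as a separator that is unlikely to appear in ID or City.
runs <- rle(paste(mydf$ID, mydf$City, sep = "\r"))
# Same computation as before, now restarting on a change of ID or City.
mydf$Sequence <- unlist(lapply(runs$lengths, seq_len))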

This will be fast, especially compared to a for loop, because rle() processes the whole column at once instead of row by row.
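The answer as quoted stops at the Sequence column and does not show the Highest_number column the question asks for. One possible way to get it from the same run-length encoding, offered as a sketch rather than as the answerer's method, is to repeat each run length once per row of its run. The names `runs` and `last_rows` are chosen here for illustration; `sequence()` is the base R shorthand for counting 1..n within each run.

```r
# Data frame from the question.
mydf <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  City = c("Nijmegen", "Nijmegen", "Arnhem", "Arnhem", "Nijmegen",
           "Arnhem", "Arnhem", "Arnhem", "Nijmegen", "Nijmegen", "Utrecht",
           "Amsterdam", "Amsterdam", "Utrecht", "Utrecht", "Utrecht", "Utrecht")
)

# Runs of identical (ID, City) pairs; "\r" is assumed not to occur in the data.
runs <- rle(paste(mydf$ID, mydf$City, sep = "\r"))

# Position of each row within its run.
mydf$Sequence <- sequence(runs$lengths)

# Length of the run each row belongs to, i.e. the highest sequence number.
mydf$Highest_number <- rep(runs$lengths, times = runs$lengths)

# Rows that close a run, for selecting the highest sequence number later on.
last_rows <- mydf[mydf$Sequence == mydf$Highest_number, ]
```

On the sample data this reproduces the Sequence and Highest_number columns of the example table, and `last_rows` keeps one row per run.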
