Repeat rows in a data frame AND add an increment field

Views: 16. Posted: 2022/5/28 17:40:09. Tags: r, dataframe, repeat


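Problem description

I have found many answers on how to replicate records, but I also want to add an increment field to each replicated record. I found a similar question, but it has no startValue field: Repeat the rows in a data frame based on values in a specific column.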

My data frame starts out as

df <-
  data startValue freq
    a        3.4    3
    b        2.1    2
    c        6.3    1

I want this output

df.expanded <-
    data startValue value
       a        3.4     3
       a        3.4     4
       a        3.4     5
       b        2.1     2
       b        2.1     3
       c        6.3     6

I did find a way to do this, but I would like something simpler that works well on large data sets. Here is what I did that worked.

df <- data.frame(data = c("a", "b", "c"),
                 startValue = c(3.4, 2.1, 6.3),
                 freq = c(3,2,1))
df

# find the largest integer that I will need as an index.
n <- floor(max(df$startValue + df$freq))-1

# repeat each df record n times. Only the record with the
# largest startValue + freq needs to be repeated this many
# times, but I am repeating everything this many times.
df.expanded <- df[rep(row.names(df), each = n), ]

# Use recycling to fill a new column. Now I have created
# a Cartesian product. If n is 46, records with a
# freq of 46 are repeated just the right number of times.
# but records with a freq of 2 are repeated many more times
# than is needed.
df.expanded$value <- 1:n

# finally, I filter out all the extra repeats that I didn't need.
df.expanded <-
df.expanded[df.expanded$value >= floor(df.expanded$startValue)
            & df.expanded$value < floor(df.expanded$startValue+df.expanded$freq),]
df.expanded[-3]

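Is there a better way to handle large data sets? Most records need fewer than 5 repeats, but a few need 50. I don't like repeating every record 50 times when only 1 record in 10,000 needs that many repeats. Thanks.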
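One possible direction, offered as a minimal sketch and not as the accepted answer: repeat each row exactly freq times with rep(), then build the per-row counter with base R's sequence(). This assumes freq holds non-negative whole numbers, so that floor(startValue + freq) equals floor(startValue) + freq. The variable idx is a name introduced here for illustration.

```r
df <- data.frame(data = c("a", "b", "c"),
                 startValue = c(3.4, 2.1, 6.3),
                 freq = c(3, 2, 1))

# Repeat each row index exactly freq times, so no row is
# expanded more often than it needs to be.
idx <- rep(seq_len(nrow(df)), times = df$freq)

# Keep every column except freq.
df.expanded <- df[idx, c("data", "startValue")]

# sequence(c(3, 2, 1)) gives 1 2 3 1 2 1; subtracting 1 turns it
# into a 0-based offset that is added to floor(startValue).
df.expanded$value <- floor(df.expanded$startValue) + sequence(df$freq) - 1

# Reset the row names, which otherwise look like 1, 1.1, 1.2, ...
rownames(df.expanded) <- NULL
df.expanded
```

The expanded frame has sum(df$freq) rows instead of nrow(df) times the largest index, so a handful of records with freq = 50 no longer force every other record to be repeated 50 times.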
