Replicate each row of data.frame and specify the number of replications for each row?

Question

I am programming in R and I got the following problem:

I have a data frame jb that is quite long. Here's a simple version of it:

jb:    a     b     frequency               jb.expanded: a    b   
       5     3        2                                 5    3
       5     7        1                                 5    3
       9     1        40                                5    7
       12    4        5                                 9    1
       12    5        13                                9    1
                                                        ...  ...   

I want to replicate the rows, with the number of replications given by the frequency column. That is, the first row is replicated twice, the second row once, and so on. I already solved that problem with this code:

jb.expanded <- jb[rep(row.names(jb), jb$frequency), 1:2]

Now here is the problem:

Whenever any number in the frequency column is greater than 10, the number of replicated rows is wrong. For example:

Frequency: 43 --> 14 rows
           40 --> 13 rows
           13 --> 11 rows
           14 --> 12 rows

Can you help me? I have no idea how to fix this, and I could not find anything on the internet.

Thanks for your help!

Recommended answer

Update

Upon revisiting this question, I have a feeling that @Codoremifa was right in assuming that your frequency column might be a factor.

Here's an example if that were the case. It won't match your actual data since I don't know what other levels are in your dataset.

mydf$F2 <- factor(as.character(mydf$frequency))
## expandRows(mydf, "F2")
mydf[rep(rownames(mydf), mydf$F2), ]
#      a b frequency F2
# 1    5 3         2  2
# 1.1  5 3         2  2
# 1.2  5 3         2  2
# 2    5 7         1  1
# 3    9 1        40 40
# 3.1  9 1        40 40
# 3.2  9 1        40 40
# 3.3  9 1        40 40
# 4   12 4         5  5
# 4.1 12 4         5  5
# 4.2 12 4         5  5
# 4.3 12 4         5  5
# 4.4 12 4         5  5
# 5   12 5        13 13
# 5.1 12 5        13 13

Hmmm. That doesn't look like 61 rows to me. Why not? Because rep uses the numeric values underlying the factor, which is quite different in this case from the displayed value:

as.numeric(mydf$F2)
# [1] 3 1 4 5 2

To convert properly, you need:

as.numeric(as.character(mydf$F2))
# [1]  2  1 40  5 13
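
Putting the pieces together, here is a minimal end-to-end sketch of the fix. It assumes the sample values from the table in the question; the names counts and jb.expanded are only illustrative, and it indexes rows with seq_len() rather than row names, which is a choice made for this sketch and not something taken from the question.

```r
# Sample data reconstructed from the table in the question (assumed values)
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = c(2, 1, 40, 5, 13))

# Simulate a count column that was read in as a factor
mydf$F2 <- factor(as.character(mydf$frequency))

# Convert the factor labels (not its internal integer codes) back to numbers
counts <- as.numeric(as.character(mydf$F2))

# Repeat each row index by its count and keep only columns a and b
jb.expanded <- mydf[rep(seq_len(nrow(mydf)), counts), c("a", "b")]

nrow(jb.expanded)
# [1] 61
```

The row count now equals the sum of the frequency column (2 + 1 + 40 + 5 + 13 = 61), which is a quick sanity check worth running on the real data as well.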






Original answer

A while ago I wrote a function that is a bit more of a generalization of @Simono101's answer. The function looks like this:

expandRows <- function(dataset, count, count.is.col = TRUE) {
  if (!isTRUE(count.is.col)) {
    if (length(count) == 1) {
      dataset[rep(rownames(dataset), each = count), ]
    } else {
      if (length(count) != nrow(dataset)) {
        stop("Expand vector does not match number of rows in data.frame")
      }
      dataset[rep(rownames(dataset), count), ]
    }
  } else {
    dataset[rep(rownames(dataset), dataset[[count]]), 
            setdiff(names(dataset), names(dataset[count]))]
  }
}






For your purposes, you could just use expandRows(mydf, "frequency"):

head(expandRows(mydf, "frequency"))
#     a b
# 1   5 3
# 1.1 5 3
# 2   5 7
# 3   9 1
# 3.1 9 1
# 3.2 9 1   
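
If the count column might be a factor or a character vector, a defensive variant can convert it before expanding. The following is only a sketch: expandRowsSafe is a hypothetical name, not part of the original answer, and the sample data are assumed from the question.

```r
# Hypothetical helper (not part of the original answer): expand rows by a
# count column, converting factor or character counts to numbers first.
expandRowsSafe <- function(dataset, count) {
  n <- dataset[[count]]
  if (is.factor(n) || is.character(n)) {
    # Use the displayed labels, not the factor's internal integer codes
    n <- as.numeric(as.character(n))
  }
  if (anyNA(n) || any(n < 0)) {
    stop("Count column must contain non-negative numbers")
  }
  # Repeat row indices by count and drop the count column from the result
  dataset[rep(seq_len(nrow(dataset)), n),
          setdiff(names(dataset), count), drop = FALSE]
}

# Sample data with the count column deliberately stored as a factor
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = factor(c("2", "1", "40", "5", "13")))

res <- expandRowsSafe(mydf, "frequency")
nrow(res)
# [1] 61
```

Checking the column type up front trades a little generality for a clear error message when the counts cannot be interpreted as numbers.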

Other options are to repeat each row the same number of times:

expandRows(mydf, 2, count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 1.1  5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 3.1  9 1        40
# 4   12 4         5
# 4.1 12 4         5
# 5   12 5        13
# 5.1 12 5        13

Or to specify a vector of how many times to repeat each row.

expandRows(mydf, c(1, 2, 1, 0, 2), count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 5   12 5        13
# 5.1 12 5        13

Note the required count.is.col = FALSE argument in those last two options.
