Replace column value in a data frame based on other columns

Views: 75 | Posted: 2020/10/17 0:25:21 | Tags: r, dataframe

Problem description

I have the following data frame, ordered by name and time.

set.seed(100)
df <- data.frame('name' = c(rep('x', 6), rep('y', 4)), 
                 'time' = c(rep(1, 2), rep(2, 3), 3, 1, 2, 3, 4),
                 'score' = c(0, sample(1:10, 3), 0, sample(1:10, 2), 0, sample(1:10, 2))
                 )
> df
   name time score
1     x    1     0
2     x    1     4
3     x    2     3
4     x    2     5
5     x    2     0
6     x    3     1
7     y    1     5
8     y    2     0
9     y    3     5
10    y    4     8

In df$score, each zero is followed by an unknown number of nonzero values (e.g. df[1:4, ]), and sometimes the rows between two zeros span more than one df$name (e.g. df[6:7, ], which lie between the zeros in rows 5 and 8 but belong to 'x' and 'y' respectively).
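To make that block structure concrete, here is a minimal sketch in base R. It assumes the scores shown in the printed data frame (typed in literally rather than drawn with sample()), and the names block, names_per_block and mixed_blocks are introduced here purely for illustration.

```r
# Literal copy of the example data, taken from the printed output above.
df <- data.frame(
  name  = c(rep("x", 6), rep("y", 4)),
  time  = c(1, 1, 2, 2, 2, 3, 1, 2, 3, 4),
  score = c(0, 4, 3, 5, 0, 1, 5, 0, 5, 8)
)

# Each zero score opens a new block; cumsum() turns that into a block id per row.
block <- cumsum(df$score == 0)

# Count the distinct names inside each block.
names_per_block <- vapply(
  split(as.character(df$name), block),
  function(n) length(unique(n)),
  integer(1)
)

# Blocks whose rows span more than one name (hypothetical helper variable).
mixed_blocks <- names(names_per_block)[names_per_block > 1]
print(mixed_blocks)
```

With this data only the second block (rows 5 to 7) mixes names, which is the df[6:7, ] case described above.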

I want to change df$time where df$score != 0. Specifically, for each such row I want to assign the time value of the closest preceding row with df$score == 0, provided that df$name matches.

The following code gives the correct output, but my data has millions of rows, so this solution is very inefficient.

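# Row indices where score == 0 (each starts a block), plus a sentinel one past the last row.
score_0 <- c(which(df$score == 0), nrow(df) + 1)

# For each block, copy the time of its leading zero row to the rows with the same name.
for (i in seq_len(length(score_0) - 1)) {
  rows <- score_0[i]:(score_0[i + 1] - 1)
  df$time[rows] <- ifelse(df$name[rows] == df$name[score_0[i]],
                          df$time[score_0[i]],
                          df$time[rows])
}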

> df
   name time score
1     x    1     0
2     x    1     4
3     x    1     3
4     x    1     5
5     x    2     0
6     x    2     1
7     y    1     5
8     y    2     0
9     y    2     5
10    y    2     8

Here score_0 holds the row indices where df$score == 0. We see that df$time[2:4] are now all equal to 1. In df$time[6:7] only the first value changed, because row 7 has df$name == 'y' while the closest preceding row with df$score == 0 has df$name == 'x'. The last two rows have also changed correctly.
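One possible way to avoid the row-block loop is to compute, for every row at once, the index of its closest preceding zero row and then do a single vectorized assignment. The following is only a sketch under assumptions, not a benchmarked solution: it uses base R, types the example scores in literally, and the names zero_rows, block, anchor and replace_row are hypothetical helpers introduced here.

```r
# Literal copy of the example data, taken from the printed output above.
df <- data.frame(
  name  = c(rep("x", 6), rep("y", 4)),
  time  = c(1, 1, 2, 2, 2, 3, 1, 2, 3, 4),
  score = c(0, 4, 3, 5, 0, 1, 5, 0, 5, 8)
)

zero_rows <- which(df$score == 0)    # rows that open a block
block     <- cumsum(df$score == 0)   # block id per row; 0 means "before the first zero"

# Index of the closest preceding zero row for each row (NA if there is none).
anchor <- rep(NA_integer_, nrow(df))
in_block <- block > 0
anchor[in_block] <- zero_rows[block[in_block]]

# Replace time only where an anchor row exists and its name matches.
replace_row <- in_block & !is.na(anchor) & df$name == df$name[anchor]
df$time[replace_row] <- df$time[anchor[replace_row]]

print(df)
```

On the example data this reproduces the time column of the loop output above (1, 1, 1, 1, 2, 2, 1, 2, 2, 2). Zero rows are their own anchor, so they keep their original time, and rows before the first zero (none in this example) are left untouched, as in the loop.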
