Replace NAs with values from another df


Problem description

I have two data frames, shown below. The first, df1, has ~15k records of the number of steps taken by time and date; the second, df2, holds the average number of steps per time interval. I want to go through df1 and replace the NA values with the avg.steps value from df2, but I can't seem to figure out how to do it in R. What would be the most efficient way to do this? And is there a way to do it using dplyr?

df1 looks like this:

steps <- c(51, 516, NA, NA, 161, 7)
interval <- c(915, 920, 925, 930, 935, 940)
df1 <- data.frame(steps, interval)

steps  interval
   51       915
  516       920
   NA       925
   NA       930
  161       935
    7       940  

df2 looks like this:

avg.steps <- c(51, 516, 245, 0, 161, 7)
interval <- c(915, 920, 925, 930, 935, 940)
df2 <- data.frame(avg.steps, interval)

avg.steps  interval
       51       915
      516       920
      245       925
        0       930
      161       935
        7       940  


Recommended answer

Here is one way to do it using data.table v1.9.6 or later:

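library(data.table) # v1.9.6+, for the 'on=' feature
# dt1 and dt2 are assumed to be data.table versions of df1 and df2
# For rows where steps is NA, look up avg.steps in dt2 by interval
dt1[is.na(steps), steps := dt2[.SD, avg.steps, on="interval"]]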

The first argument, i = is.na(steps), restricts the operation to the rows where dt1$steps is NA. On those rows, we update dt1$steps by performing a join as a subset. .SD refers to the subset of data, i.e., those rows of dt1 where steps is NA.

For each row where steps is NA, we find the matching row in dt2 by joining on the "interval" column.
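The lookup described above can also be pictured without data.table. The following is only an illustrative base R sketch, not the answer's method: it rebuilds the question's sample data and uses match() to find, for each NA row, the position of the same interval in df2. The helper names na_rows and idx are made up for this example.

```r
# Sample data from the question
df1 <- data.frame(steps    = c(51, 516, NA, NA, 161, 7),
                  interval = c(915, 920, 925, 930, 935, 940))
df2 <- data.frame(avg.steps = c(51, 516, 245, 0, 161, 7),
                  interval  = c(915, 920, 925, 930, 935, 940))

# Rows of df1 whose steps value is missing
na_rows <- which(is.na(df1$steps))                  # 3 4

# For each of those rows, the position of the same interval in df2
idx <- match(df1$interval[na_rows], df2$interval)   # 3 4

# Fill the missing values with the matched averages
df1$steps[na_rows] <- df2$avg.steps[idx]
```

Note that match() returns only the first matching position, so this sketch assumes each interval appears at most once in df2.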

As an example, is.na(steps) selects the 3rd row of dt1 as one of the rows. Matching .SD$interval = 925 against dt2$interval returns index 3 (the 3rd row of dt2). The corresponding avg.steps value is 245, so the 3rd row of dt1 gets updated with 245.
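To try the update end to end, here is a small self-contained sketch that builds dt1 and dt2 directly from the question's sample data (constructing them with data.table() is an assumption; the answer does not show how they were created) and then applies the same update-by-join.

```r
library(data.table)  # v1.9.6+ for the 'on=' argument

# Sample data from the question, built directly as data.tables
dt1 <- data.table(steps    = c(51, 516, NA, NA, 161, 7),
                  interval = c(915, 920, 925, 930, 935, 940))
dt2 <- data.table(avg.steps = c(51, 516, 245, 0, 161, 7),
                  interval  = c(915, 920, 925, 930, 935, 940))

# i: keep only rows where steps is NA
# j: join those rows (.SD) to dt2 on interval and assign the matched
#    avg.steps back into steps, modifying dt1 by reference
dt1[is.na(steps), steps := dt2[.SD, avg.steps, on = "interval"]]

print(dt1)
```

After the update, the 3rd and 4th rows of dt1 hold 245 and 0, and the other rows are unchanged. Because := modifies dt1 in place, there is no need to assign the result back.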

Hope this helps.
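The question also asks whether this can be done with dplyr, which the answer above does not cover. One possible approach, offered here only as a sketch and not as part of the original answer, is to left-join df2 onto df1 and take the first non-missing value with coalesce(). The result name filled is made up for this example.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(steps    = c(51, 516, NA, NA, 161, 7),
                  interval = c(915, 920, 925, 930, 935, 940))
df2 <- data.frame(avg.steps = c(51, 516, 245, 0, 161, 7),
                  interval  = c(915, 920, 925, 930, 935, 940))

filled <- df1 %>%
  left_join(df2, by = "interval") %>%           # bring avg.steps alongside steps
  mutate(steps = coalesce(steps, avg.steps)) %>% # use avg.steps only where steps is NA
  select(-avg.steps)                            # drop the helper column
```

Unlike the data.table version, this returns a new data frame instead of modifying df1 in place, and it joins every row rather than only the NA rows, which may matter for large inputs.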

If dt2 has multiple matches for any dt1$interval value, you'll have to decide which value to update with. But I'm guessing that is not the case here.
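If duplicates are a concern, the situation can be checked and resolved before the update. The sketch below uses a small hypothetical dt2 in which interval 925 appears twice (this data is invented for illustration and is not from the question). It shows two possible choices: collapsing duplicates to their mean, or taking the first match via the mult argument of data.table's join.

```r
library(data.table)

dt1 <- data.table(steps = c(51, NA), interval = c(915, 925))
# Hypothetical lookup table with two rows for interval 925
dt2 <- data.table(avg.steps = c(51, 240, 250),
                  interval  = c(915, 925, 925))

# Check whether any interval occurs more than once in dt2
has_dups <- anyDuplicated(dt2$interval) > 0

# Choice B (shown first because the update below changes dt1):
# keep only the first matching row of dt2 for each NA row
na_rows     <- dt1[is.na(steps)]
first_match <- dt2[na_rows, avg.steps, on = "interval", mult = "first"]

# Choice A: collapse duplicates to one row per interval, here by averaging,
# and then run the same update-by-join as before
dt2_unique <- dt2[, .(avg.steps = mean(avg.steps)), by = interval]
dt1[is.na(steps), steps := dt2_unique[.SD, avg.steps, on = "interval"]]
```

Which choice is appropriate depends on what the duplicate rows mean in the data; without one of them, the join returns more values than there are NA rows to fill.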
