Replace NAs with values from another df
Question
I have the two data frames below. The first, df1, has ~15k records of the number of steps taken by time and date; the second, df2, holds the average number of steps per time interval. I want to go through df1 and replace the NA values with the avg.steps value from df2, but I can't seem to figure out how to do it in R. What would be the most efficient way to do this? And is there a way to do it using dplyr?
df1 looks like this:
steps <- c(51, 516, NA, NA, 161, 7)
interval <- c(915, 920, 925, 930, 935, 940)
steps interval
51 915
516 920
NA 925
NA 930
161 935
7 940
df2 looks like this:
avg.steps <- c(51, 516, 245, 0, 161, 7)
interval <- c(915, 920, 925, 930, 935, 940)
avg.steps interval
51 915
516 920
245 925
0 930
161 935
7 940
Recommended answer
Here's a way using data.table v1.9.6:
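library(data.table) # v1.9.6+, for the 'on=' feature
# Assumes dt1 and dt2 are data.table copies of df1 and df2,
# e.g. dt1 <- as.data.table(df1); dt2 <- as.data.table(df2)
dt1[is.na(steps), steps := dt2[.SD, avg.steps, on="interval"]]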
The first argument, i = is.na(steps), allows us to look at just those rows where dt1$steps is NA. On those rows, we update dt1$steps. This is done by performing a join as a subset. .SD refers to the subset of data, i.e., those rows where dt1$steps is NA.
For each row where steps is NA, we find the corresponding matching row in dt2 while joining on the "interval" column.
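To make the update join concrete, here is a minimal end-to-end sketch using the sample values from the question. Building dt1 and dt2 directly with data.table() is an assumption made for illustration; the question only shows the raw vectors.

```r
library(data.table)  # the on= argument needs v1.9.6 or later

# Sample data from the question, built directly as data.tables
# (an assumption for this sketch).
dt1 <- data.table(steps    = c(51, 516, NA, NA, 161, 7),
                  interval = c(915, 920, 925, 930, 935, 940))
dt2 <- data.table(avg.steps = c(51, 516, 245, 0, 161, 7),
                  interval  = c(915, 920, 925, 930, 935, 940))

# For the rows where steps is NA, look up avg.steps in dt2 by interval
# and assign the result by reference.
dt1[is.na(steps), steps := dt2[.SD, avg.steps, on = "interval"]]

print(dt1)
```

Because := updates by reference, dt1 itself is modified and no reassignment is needed.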
As an example, is.na(steps) would return the 3rd row in dt1 as one of the rows. Matching .SD$interval = 925 against dt2$interval would return the index 3 (the 3rd row in dt2). The corresponding avg.steps value is 245. Thus the 3rd row of dt1 gets updated with 245.
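The lookup in that example can be traced step by step in base R with match(). This is only an illustration of the matching logic on the question's vectors, not a description of how data.table performs the join internally; the name avg.interval is introduced here to stand in for df2's interval column.

```r
# Sample vectors from the question.
steps        <- c(51, 516, NA, NA, 161, 7)
interval     <- c(915, 920, 925, 930, 935, 940)
avg.steps    <- c(51, 516, 245, 0, 161, 7)
avg.interval <- c(915, 920, 925, 930, 935, 940)  # df2's interval column

# Rows of df1 whose steps value is missing.
na_rows <- which(is.na(steps))

# Position of each of those intervals within df2's interval column.
idx <- match(interval[na_rows], avg.interval)

# Fill the missing steps with the matched averages.
steps[na_rows] <- avg.steps[idx]
```

Here na_rows is c(3, 4), and interval 925 matches position 3 in df2, so the third element of steps becomes 245.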
Hope this helps.
If dt2 has multiple matches for any dt1$interval value, you'll have to decide which value to update with. But I'm guessing it is not the case here.
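If you are unsure whether dt2 has repeated intervals, a quick check before the join might look like the sketch below. The table dt2_dup is hypothetical data made up for illustration, and collapsing duplicates to their mean is just one possible choice, not something the question prescribes.

```r
library(data.table)

# Hypothetical dt2 in which interval 925 appears twice.
dt2_dup <- data.table(avg.steps = c(51, 516, 245, 255, 0, 161, 7),
                      interval  = c(915, 920, 925, 925, 930, 935, 940))

# anyDuplicated() returns 0 when every value is unique.
has_dups <- anyDuplicated(dt2_dup$interval) > 0

# One possible resolution: average the duplicates so that each
# interval appears exactly once before joining.
dt2_unique <- dt2_dup[, .(avg.steps = mean(avg.steps)), by = interval]
```

After this step dt2_unique can be used in place of dt2 in the update join.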