根据不同数据框中的2个键定义数据框中的列的值 [英] Define the value of a column in a dataframe based on 2 keys from a different dataframe
问题描述
我有以下数据框:
a <- seq(0, 5, by = 0.25)
b <-seq(0, 20, by = 1)
df <- data.frame(a, b)
,我想基于a和b列以及下面的转换表创建一个新的值列:
and I'd like to create a new column "value", based on columns a and b, and the conversion table below:
a_min <- c(0,2, 0,2)
a_max <- c(2,5,2,5)
b_min <- c(0,0,10,10)
b_max <- c(10,10,30,30)
output <-c(1,2,3,4)
conv <- data.frame(a_min, a_max, b_min, b_max, output)
我尝试使用dplyr :: mutate进行操作,但没有太大的成功...
I've tried to do it using dplyr::mutate without much success...
require(dplyr)
mutate(df, value = calcula(conv, a, b))
更长的对象长度不是较短对象长度的倍数
我的期望是获得像上面的'df'这样的数据框,并附加如下所示的列值:
My expectation would be to obtain a dataframe like the 'df' above with the additional column value as per below:
df$value <- c(rep(1,8), rep(2,2), rep(4,11))
推荐答案
这是矩阵的另一种选择。
One more option, this time with matrices.
with(df, with(conv, output[max.col(
outer(a, a_min, `>=`) + outer(a, a_max, `<=`) +
outer(b, b_min, `>=`) + outer(b, b_max, `<=`))]))
## [1] 1 1 1 1 1 1 1 1 1 2 2 4 4 4 4 4 4 4 4 4 4
外部
比较 df
中向量的每个元素,均来自 conv
中向量的每个元素,从而为每个调用生成布尔矩阵。由于 TRUE
为1,如果将所有四个矩阵相加,则所需索引将是 TRUE
最多的列s,您可以通过 max.col
获得。将子集输出
进行分组,就可以得到结果。
outer
compares each element of the vector from df
from the one from conv
, producing a matrix of Booleans for each call. Since TRUE
is 1, if you add all four matrices, the index you want will be the column with the most TRUE
s, which you can get with max.col
. Subset output
, and you've got your result.
使用矩阵的好处是它们 fast 。在1000行上使用@Phann的基准测试:
The benefit of working with matrices is that they're fast. Using @Phann's benchmarks on 1,000 rows:
Unit: microseconds
expr min lq mean median uq max neval cld
alistaire 276.099 320.4565 349.1045 339.8375 357.2705 941.551 100 a
akr1 830.934 966.6705 1064.8433 1057.6610 1152.3565 1507.180 100 ab
akr2 11431.246 11731.3125 12835.5229 11947.5775 12408.4715 36767.488 100 d
Pha 11985.129 12403.1095 13330.1465 12660.4050 13044.9330 29653.842 100 d
Ron 71132.626 74300.3540 81136.9408 78034.2275 88952.8765 98950.061 100 e
Dav1 2506.205 2765.4095 2971.6738 2948.6025 3082.4025 4065.368 100 c
Dav2 2104.481 2272.9180 2480.9570 2478.8775 2575.8740 3683.896 100 bc
并在100,000行上:
and on 100,000 rows:
Unit: milliseconds
expr min lq mean median uq max neval cld
alistaire 30.00677 36.49348 44.28828 39.43293 54.28207 64.36581 100 a
akr1 36.24467 40.04644 48.46986 41.59644 60.15175 77.34415 100 a
Dav1 51.74218 57.23488 67.70289 64.11002 68.86208 382.25182 100 c
Dav2 48.48227 54.82818 60.25256 59.81041 64.92611 91.20212 100 b
这篇关于根据不同数据框中的2个键定义数据框中的列的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!