Return value based on finding closest value between other two columns in df

Question

My question is almost identical to an earlier one, except that instead of finding the closest value between a column value and a fixed number, e.g. "2", I want to find the closest value to the value in another column. Here's an example of the data:

    df <- data.frame(site_no=c("01010500", "01010500", "01010500","02010500", "02010500", "02010500", "03010500", "03010500", "03010500"), 
                     OBS=c(423.9969, 423.9969, 423.9969, 123, 123, 123, 150,150,150),
                     MOD=c(380,400,360,150,155,135,170,180,140),
                     HT=c(14,12,15,3,8,19,12,23,10))

It looks like this:

   site_no      OBS MOD HT
1 01010500 423.9969 380 14
2 01010500 423.9969 400 12
3 01010500 423.9969 360 15
4 02010500 123.0000 150  3
5 02010500 123.0000 155  8
6 02010500 123.0000 135 19
7 03010500 150.0000 170 12
8 03010500 150.0000 180 23
9 03010500 150.0000 140 10

The goal is, for every "site_no", to find the MOD value closest to the OBS value and then return the corresponding HT. For example, for site_no 01010500, 423.9969 - 400 yields the minimum difference, so the function would return 12. I have tried most of the solutions from the other post, but I get an error about using $ on an atomic vector (the df is recursive, but I think the function is not). I tried:

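library(plyr)  # provides ddply() and .()
ddply(df, .(site_no), function(z) {
  # keep the row(s) whose |OBS - MOD| equals the group minimum
  z[abs(z$OBS - z$MOD) == min(abs(z$OBS - z$MOD)), ]
})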
Error in z$River_Width..m. - z$chan_width :
  non-numeric argument to binary operator

Recommended answer

After grouping by 'site_no', we slice the row that has the minimum absolute difference between 'OBS' and 'MOD':

library(dplyr)
res <- df %>%
         group_by(site_no) %>% 
         slice(which.min(abs(OBS-MOD)))
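
The question asks for the HT value itself rather than the whole row. A minimal sketch of that variant, assuming the same sample data as in the question (the name ht_by_site is only illustrative and is not from the original answer):

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  site_no = c("01010500", "01010500", "01010500",
              "02010500", "02010500", "02010500",
              "03010500", "03010500", "03010500"),
  OBS = c(423.9969, 423.9969, 423.9969, 123, 123, 123, 150, 150, 150),
  MOD = c(380, 400, 360, 150, 155, 135, 170, 180, 140),
  HT  = c(14, 12, 15, 3, 8, 19, 12, 23, 10)
)

# One HT per site: the HT on the row where |OBS - MOD| is smallest.
# which.min() returns the index of the first minimum, so if two rows tie,
# the earlier row in the group is the one that is used.
ht_by_site <- df %>%
  group_by(site_no) %>%
  summarise(HT = HT[which.min(abs(OBS - MOD))])

ht_by_site$HT
# 12 19 10
```

Note that this differs from the == min(...) comparison in the question's attempt, which would keep every tied row instead of only the first.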

NOTE: Using dplyr adds some extra classes (tbl_df, tibble, etc.) to the result; these should work with most other functions. If there is any problem, we can convert the result to a plain data.frame with as.data.frame:

str(res %>%
        as.data.frame)
#'data.frame':   3 obs. of  4 variables:
#$ site_no: Factor w/ 3 levels "01010500","02010500",..: 1 2 3
#$ OBS    : num  424 123 150
#$ MOD    : num  400 135 140
#$ HT     : num  12 19 10
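
If the extra classes are a concern, the same selection can be sketched in base R so that the result is an ordinary data.frame from the start. This is an alternative under the same assumptions as above (sample data from the question; the name closest is hypothetical), not the answer's own method:

```r
# Same sample data as in the question
df <- data.frame(
  site_no = c("01010500", "01010500", "01010500",
              "02010500", "02010500", "02010500",
              "03010500", "03010500", "03010500"),
  OBS = c(423.9969, 423.9969, 423.9969, 123, 123, 123, 150, 150, 150),
  MOD = c(380, 400, 360, 150, 155, 135, 170, 180, 140),
  HT  = c(14, 12, 15, 3, 8, 19, 12, 23, 10)
)

# Split into one data frame per site, keep the closest row from each piece,
# then bind the pieces back together.
closest <- do.call(rbind, lapply(split(df, df$site_no), function(z) {
  z[which.min(abs(z$OBS - z$MOD)), ]
}))
rownames(closest) <- NULL  # drop the site-based row names added by rbind

class(closest)
# "data.frame"
closest$HT
# 12 19 10
```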
