Joining Data Frames by Measured Values with an Error Range

Question

I'm looking for a way to join (or perhaps merge) two or more data frames in R that contain measured values with a specified range of error. This means that the value in the "by" column would be nnn.nnnn +/- 0.000n. The error tolerance is limited to 3e-6 times the value.

This is my best attempt so far.

newDF <- left_join(P0511_480k, P0511_SF00V, by = c(P0511_480k$mz == (P0511_SF00V$mz - 0.000003(P0511_480k$mz))::(P0511_SF00V$mz + 0.000003(P0511_480k)))

In this expression, I have two data frames (P0511_480k and P0511_SF00V) and I would like to merge them by a column named "m.z". The acceptable range is the value plus or minus 0.000003 times "m.z". For example, P0511_480k_subset$m.z = 187.06162 should match P0511_SF00V_subset$m.z = 187.06155.

> dput(head(P0511_480k_subset, 10))
structure(list(m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 
279.05499), Intensity = c(319420.8, 288068.9, 229953, 210107.8, 
180054), Relative = c(100, 90.18, 71.99, 65.78, 56.37), Resolution = c(394956.59, 
415308.31, 387924.91, 437318.31, 410670.91), Baseline = c(2.1, 
1.43, 1.69, 1.73, 3.04), Noise = c(28.03, 27.17, 27.52, 27.58, 
29.37)), .Names = c("m.z", "Intensity", "Relative", "Resolution", 
"Baseline", "Noise"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-5L))

> dput(head(P0511_SF00V_subset, 10))
structure(list(m.z = c(187.06155, 203.05641, 215.05654, 217.0721
), Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2), Relative = c(100, 
78.46, 64.91, 51.23), Resolution = c(314271.88, 298427.41, 289803.97, 
288163.63), Baseline = c(6.89, 10.47, 9.13, 8.89), Noise = c(40.94, 
45.98, 44.3, 44.01)), .Names = c("m.z", "Intensity", "Relative", 
"Resolution", "Baseline", "Noise"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -4L))

I appreciate your suggestions! I've searched the help documentation as broadly as I can and have not been able to find an example that is close to what I need.

Thank you very much!

Recommended answer

If you don't need the non-matching rows, then this can work. Assume the two data sets are df1 and df2. Go through the m.z column in df1, and if a value is within the tolerance (0.000003 times that value) of any value in the m.z column of df2, replace it in df1 with the closest matching value from df2. Then merge the two data frames.

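df1$m.z <- sapply(df1$m.z, function(x) {
  # Absolute distance from x to every m.z value in df2
  d <- abs(df2$m.z - x)
  # If the nearest df2 value lies within the tolerance (3e-6 * x), use it as the join key.
  # Otherwise return 0, a sentinel that cannot match a real m.z value.
  # df2$m.z[...] is used instead of df2[..., "m.z"] because df2 is a tbl_df,
  # and single-bracket indexing on a tbl_df returns a data frame, not a number.
  if (min(d, na.rm = TRUE) < 0.000003 * x) df2$m.z[which.min(d)] else 0
})
# Join on m.z only. Without 'by', merge() would use every shared column name
# (Intensity, Relative, ...) as a key and return no rows.
df3 <- merge(df1, df2, by = "m.z")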
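
The approach above overwrites the m.z values in df1 and drops the measured value from one side. If both measured values need to be kept, one possible alternative (a minimal sketch, not part of the original answer) is to compare every pair of m.z values with outer() and keep the pairs that fall within the relative tolerance. The small df1 and df2 below are cut-down stand-ins built from the m.z and Intensity values in the question, and the names tol, hits, joined and the ".1"/".2" column suffixes are made up for illustration.

```r
# Small stand-ins for the two data sets (values copied from the question)
df1 <- data.frame(m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
                  Intensity = c(319420.8, 288068.9, 229953, 210107.8, 180054))
df2 <- data.frame(m.z = c(187.06155, 203.05641, 215.05654, 217.0721),
                  Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2))

tol <- 3e-6  # relative tolerance stated in the question

# d[i, j] is the absolute difference between df1$m.z[i] and df2$m.z[j]
d <- abs(outer(df1$m.z, df2$m.z, "-"))

# Keep the (i, j) pairs whose difference is within tol * df1$m.z[i].
# The vector tol * df1$m.z is recycled down each column of d,
# so every row i is compared against its own limit.
hits <- which(d <= tol * df1$m.z, arr.ind = TRUE)

# Put the matching rows side by side, keeping both measured m.z values
joined <- cbind(
  setNames(df1[hits[, "row"], ], paste0(names(df1), ".1")),
  setNames(df2[hits[, "col"], ], paste0(names(df2), ".2"))
)
rownames(joined) <- NULL
print(joined)
```

With these values the row with m.z = 279.05499 has no partner in df2 and is left out, as in the answer above. Note that outer() builds a full nrow(df1) by nrow(df2) matrix, so this sketch is only practical when the two peak lists are reasonably small.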
