Joining Data Frames by Measured Values with an Error Range
Question
I'm looking for a way to join (or perhaps merge) two or more data frames in R that contain measured values with a specified range of error. This means that the value in the "by" column would be nnn.nnnn +/- 0.000n, with the error tolerance limited to 3e-6 times the value.
This is my best attempt so far.
newDF<-left_join(P0511_480k,P0511_SF00V,by = c(P0511_480k $ mz ==(P0511_SF00V $ mz-0.000003(P0511_480k $ mz))::(P0511_SF00V $ mz + 0.000003(P0511_480k)))
In this expression, I have two data frames (P0511_480k and P0511_SF00V) and I would like to merge them by a column named "m.z". The acceptable range of values is plus or minus "m.z" times 0.000003. For example, P0511_480k_subset$m.z = 187.06162 should match P0511_SF00V_subset$m.z = 187.06155.
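To make the tolerance concrete, the check for that one pair of values can be written out as a short sketch. The variable names a, b, tol and within_tol are illustrative only and do not appear in the original data.

```r
# Values taken from the example pair above
a <- 187.06162   # P0511_480k_subset$m.z
b <- 187.06155   # P0511_SF00V_subset$m.z

# Relative tolerance: 0.000003 times the measured value
tol <- 0.000003 * a

# TRUE when the two measurements agree to within the tolerance
within_tol <- abs(a - b) <= tol
print(within_tol)
```

The difference here is about 0.00007 and the tolerance is about 0.00056, so the pair counts as a match.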
> dput(head(P0511_480k_subset, 10))
structure(list(m.z = c(187.06162, 203.05652, 215.05668, 217.07224,
279.05499), Intensity = c(319420.8, 288068.9, 229953, 210107.8,
180054), Relative = c(100, 90.18, 71.99, 65.78, 56.37), Resolution = c(394956.59,
415308.31, 387924.91, 437318.31, 410670.91), Baseline = c(2.1,
1.43, 1.69, 1.73, 3.04), Noise = c(28.03, 27.17, 27.52, 27.58,
29.37)), .Names = c("m.z", "Intensity", "Relative", "Resolution",
"Baseline", "Noise"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-5L))
and
> dput(head(P0511_SF00V_subset, 10))
structure(list(m.z = c(187.06155, 203.05641, 215.05654, 217.0721
), Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2), Relative = c(100,
78.46, 64.91, 51.23), Resolution = c(314271.88, 298427.41, 289803.97,
288163.63), Baseline = c(6.89, 10.47, 9.13, 8.89), Noise = c(40.94,
45.98, 44.3, 44.01)), .Names = c("m.z", "Intensity", "Relative",
"Resolution", "Baseline", "Noise"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -4L))
I appreciate your suggestions! I've searched the help documentation as broadly as I could and have not been able to find an example that is close to what I need.
Thank you very much!
Recommended answer
If you don't need the non-matching rows, then this can work. Assume the two data sets are df1 and df2. Go through the m.z column in df1, and if a value is within the tolerance (0.000003 times that value) of any value in the m.z column of df2, replace it in df1 with the closest matching value from df2. Then merge the two data frames.
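# Snap each m.z in df1 to its nearest value in df2 when the two agree
# to within 0.000003 times the value; otherwise mark it as unmatched.
df1$m.z <- sapply(df1$m.z, function(x) {
  diffs <- abs(df2$m.z - x)
  nearest <- which.min(diffs)   # index of the closest df2 value
  # Use NA rather than 0 so an unmatched row can never pair with real data
  if (diffs[nearest] < 0.000003 * x) df2$m.z[nearest] else NA_real_
})
# Join on m.z only: the other columns share names in both data frames,
# so they are kept side by side with suffixes instead of used as keys.
df3 <- merge(df1, df2, by = "m.z", suffixes = c(".480k", ".SF00V"))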
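If the non-matching rows do need to be kept, the same nearest-value idea can be turned into a left join. The following is only a minimal sketch in base R, not part of the original answer: the helper tolerance_left_join, its arguments key and rel_tol, and the ".y" suffix are all hypothetical names chosen for illustration, and it assumes the second data frame has at least one row. The example data uses the first two columns from the question.

```r
# Hypothetical helper: left join of x onto y, matching the key column
# within a relative tolerance (rel_tol times the value in x).
tolerance_left_join <- function(x, y, key = "m.z", rel_tol = 3e-6) {
  # For each key in x, the row index of the nearest key in y
  nearest <- vapply(x[[key]],
                    function(v) which.min(abs(y[[key]] - v)),
                    integer(1))
  # Keep the nearest row only if it lies inside the tolerance
  ok <- abs(y[[key]][nearest] - x[[key]]) <= rel_tol * x[[key]]
  nearest[!ok] <- NA_integer_
  # Rows of y aligned to x; an NA index yields a row of NAs
  matched <- as.data.frame(y)[nearest, , drop = FALSE]
  names(matched) <- paste0(names(matched), ".y")
  rownames(matched) <- NULL
  cbind(as.data.frame(x), matched)
}

# Example data: m.z and Intensity values from the question
df1 <- data.frame(
  m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
  Intensity = c(319420.8, 288068.9, 229953, 210107.8, 180054)
)
df2 <- data.frame(
  m.z = c(187.06155, 203.05641, 215.05654, 217.0721),
  Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2)
)

joined <- tolerance_left_join(df1, df2)
print(joined)
```

With this data the first four rows of df1 pick up their df2 counterparts, while the row at m.z = 279.05499 has no neighbour within tolerance and is kept with NA in the ".y" columns. Unlike the approach above, df1's own m.z values are left unchanged.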