根据所选阈值内的数字行名合并数据帧,并保留不匹配的行 [英] Merge data frames based on numeric rownames within a chosen threshold and keeping unmatched rows as well

查看:65
本文介绍了根据所选阈值内的数字行名合并数据帧,并保留不匹配的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何在设置匹配阈值的同时基于数字数据行名称合并两个数据框?

How can I merge two data frames based on their numeric rownames while setting a threshold for the match?

df1 <- structure(list(c(4974622.505928, 170582.149747, 130545.004516, 
143528.819582, 49416.594892, 51879.515558, 52027.462651, 42491.317116, 
49173.145029, 44040.01261), c(4664319.00309, 266278.599338, 204772.412837, 
204819.210688, 77718.961761, 82742.852809, 79706.774944, 67123.603629, 
67264.401059, 66750.260768), c(5906075.502923, 385318.121061, 
296824.944672, 308432.753482, 113407.50333, 120352.400266, 122622.356104, 
98656.179336, 107669.002489, 100262.855064), c(5401712.020682, 
204595.653994, 163485.509823, 179567.339348, 62690.116298, 63790.0244, 
64660.971879, 52545.84055, 59080.66972, 54579.538267), c(5273676.522307, 
159130.126808, 129607.971309, 142279.787439, 45812.561022, 47230.447746, 
48367.405274, 39578.235275, 45489.065198, 43102.923417)), row.names = c("34.9816256", 
"35.0576674", "35.0898006", "35.1270264", "35.1738664", "35.1936282", 
"35.2043582", "35.2359934", "35.2716016", "35.2993064"), class = "data.frame")

df2 <- structure(list(c(5898584.48405, 302326.226264, 185567.968257, 
205617.778019, 84476.66928, 65505.560486, 68121.465276, 63221.947902, 
55028.866127, 36821.607091), c(3719350.766633, 108177.577417, 
68855.378083, 78201.248427, 17558.118703, 23387.078772, 25374.978916, 
18833.579115, 12761.529092, 11507.348928), c(3587498.99736, 96793.741428, 
59750.485295, 70217.309923, 26233.188472, 20200.080468, 22241.999451, 
20268.485836, 17330.391134, 12503.133961), c(3128479.008712, 
70298.795438, 45668.592667, 56013.453832, 20323.368372, 16795.27218, 
16358.208042, 15722.790712, 12276.726458, 9155.522864), c(3847005.494149, 
138762.296854, 94196.099405, 106888.964213, 36614.870588, 30856.787329, 
33880.704043, 31399.328936, 27819.255931, 18560.05768)), row.names = c("34.9815906", 
"35.0356588", "35.0897702", "35.1269978", "35.1535182", "35.1744048", 
"35.1952968", "35.3032464", "35.3207828", "35.3739834"), class = "data.frame")

数据帧的输出(第一行是行名)

Output of the dataframes (first row is rownames)

> df1
34.9816256 4974622.51 4664319.00 5906075.50 5401712.02 5273676.52
35.0576674  170582.15  266278.60  385318.12  204595.65  159130.13
35.0898006  130545.00  204772.41  296824.94  163485.51  129607.97
35.1270264  143528.82  204819.21  308432.75  179567.34  142279.79
35.1738664   49416.59   77718.96  113407.50   62690.12   45812.56
35.1936282   51879.52   82742.85  120352.40   63790.02   47230.45
35.2043582   52027.46   79706.77  122622.36   64660.97   48367.41
35.2359934   42491.32   67123.60   98656.18   52545.84   39578.24
35.2716016   49173.15   67264.40  107669.00   59080.67   45489.07
35.2993064   44040.01   66750.26  100262.86   54579.54   43102.92

> df2
34.9815906 5898584.48 3719350.77 3587499.00 3128479.009 3847005.49
35.0356588  302326.23  108177.58   96793.74   70298.795  138762.30
35.0897702  185567.97   68855.38   59750.49   45668.593   94196.10
35.1269978  205617.78   78201.25   70217.31   56013.454  106888.96
35.1535182   84476.67   17558.12   26233.19   20323.368   36614.87
35.1744048   65505.56   23387.08   20200.08   16795.272   30856.79
35.1952968   68121.47   25374.98   22242.00   16358.208   33880.70
35.3032464   63221.95   18833.58   20268.49   15722.791   31399.33
35.3207828   55028.87   12761.53   17330.39   12276.726   27819.26
35.3739834   36821.61   11507.35   12503.13    9155.523   18560.06

如果行名称中两个数字之间的差在[-0.02,0.02]之间,我想根据它们的行名称合并这两个数据集

I want to merge these two datasets based on their rownames IF the difference between the two numbers in the rownames is between [-0.02, 0.02]

换句话说,应该将df1中的每个行名与df2中的每个行名进行比较,如果发现两个行名之间的差落在[-0.02,0.02]范围内,则可以在同一行上合并数据.如果找不到匹配项,则将NA添加到其他df没有匹配数据的地方(如full_join).

In other words, each rowname from df1 should be compared to each rowname in df2, and if two rownames are found with their difference falling in the range [-0.02, 0.02], then the data can be merged on the same row. If a match is not found, NA will be added to where there was no matching data from the other df (as in full_join).

推荐答案

您可以使用data.table包中的foverlaps

library(data.table)

#add column names to sample data as it's NULL currently
names(df1) <- paste0("df1_", 1:ncol(df1))
names(df2) <- paste0("df2_", 1:ncol(df2))

#convert rownames as first column
setDT(df1, keep.rownames = TRUE)[]
setnames(df1, 1, "df1_rn")
setDT(df2, keep.rownames = TRUE)[]
setnames(df2, 1, "df2_rn")

#add temporary columns to both data tables
df1[, `:=`(df1_rn = as.numeric(df1_rn), temp = as.numeric(df1_rn))]
df2[, `:=`(df2_rn_minus_2 = as.numeric(df2_rn) - 0.02, df2_rn_plus_2 = as.numeric(df2_rn) + 0.02)]
setkey(df2, df2_rn_minus_2, df2_rn_plus_2)

DT = foverlaps(df1, df2, by.x = c("df1_rn", "temp"))[, !c("df2_rn_minus_2", "df2_rn_plus_2", "temp"), with = F]

给出

> DT
        df2_rn      df2_1      df2_2      df2_3      df2_4      df2_5   df1_rn      df1_1      df1_2      df1_3
 1: 34.9815906 5898584.48 3719350.77 3587499.00 3128479.01 3847005.49 34.98163 4974622.51 4664319.00 5906075.50
 2:       <NA>         NA         NA         NA         NA         NA 35.05767  170582.15  266278.60  385318.12
 3: 35.0897702  185567.97   68855.38   59750.49   45668.59   94196.10 35.08980  130545.00  204772.41  296824.94
 4: 35.1269978  205617.78   78201.25   70217.31   56013.45  106888.96 35.12703  143528.82  204819.21  308432.75
 5: 35.1744048   65505.56   23387.08   20200.08   16795.27   30856.79 35.17387   49416.59   77718.96  113407.50
 6: 35.1744048   65505.56   23387.08   20200.08   16795.27   30856.79 35.19363   51879.52   82742.85  120352.40
 7: 35.1952968   68121.47   25374.98   22242.00   16358.21   33880.70 35.19363   51879.52   82742.85  120352.40
 8: 35.1952968   68121.47   25374.98   22242.00   16358.21   33880.70 35.20436   52027.46   79706.77  122622.36
 9:       <NA>         NA         NA         NA         NA         NA 35.23599   42491.32   67123.60   98656.18
10:       <NA>         NA         NA         NA         NA         NA 35.27160   49173.15   67264.40  107669.00
11: 35.3032464   63221.95   18833.58   20268.49   15722.79   31399.33 35.29931   44040.01   66750.26  100262.86
         df1_4      df1_5
 1: 5401712.02 5273676.52
 2:  204595.65  159130.13
 3:  163485.51  129607.97
 4:  179567.34  142279.79
 5:   62690.12   45812.56
 6:   63790.02   47230.45
 7:   63790.02   47230.45
 8:   64660.97   48367.41
 9:   52545.84   39578.24
10:   59080.67   45489.07
11:   54579.54   43102.92

这篇关于根据所选阈值内的数字行名合并数据帧,并保留不匹配的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆