计算小于x的值,并通过多个组找到最接近x的值 [英] Count values less than x and find nearest values to x by multiple groups
问题描述
样本数据框 data
uid bas_id dist2mouth type
2020 2019 W3A9101601 2.413629 1
2021 2020 W3A9101601 2.413629 1
2022 2021 W3A9101602 2.413629 1
2023 2022 W3A9101602 3.313893 1
2032 2031 W3A9101602 3.313893 1
2033 2032 W3A9101602 3.313893 1
2034 2033 W3A9101602 3.313893 1
15023 15022 W3A9101601 1.349000 2
15025 15024 W3A9101601 3.880000 2
15026 15025 W3A9101602 3.880000 2
15027 15026 W3A9101602 0.541101 2
16106 17097 W3A9101602 1.349000 2
对于每行,我想计算在同一 bas_id
中的 type = 2
行中有多少 dist2mouth
较低的行.实际上,每行的下游有多少行 type = 2
.将其存储为 ds_n_type2
.到目前为止,我已经尝试过 dplyr
For each row I'd like to calculate how many rows of type=2
within the same bas_id
have a lower dist2mouth
. Effectively how many rows type=2
are downstream of each row. Store it as ds_n_type2
. So far I've tried dplyr
ds <- data %>%
group_by(id) %>%
summarize(n_ds = sum(dist2mouth > id[dist2mouth]))
然后我想使用相同的 bas_id
在同一 bas_id
中找到与每一行 type = 1
最接近的行 type = 2
在 for
或 apply
循环中>哪个.将其存储为 closest_uid_type2
.也许像
I would then like to find the closest row type=2
to each row type=1
within the same bas_id
maybe using which
in a for
or apply
loop. Store it as closest_uid_type2
. Maybe something like
which(abs(x[i:n]-x[i])==min(abs(x[i:n]-x[i])))
很高兴澄清
修改了2个所需的输出
uid bas_id dist2mouth type ds_n_type2 closest_uid_type2
2020 2019 W3A9101601 2.413629 1 1 15022
2021 2020 W3A9101601 2.413629 1 1 15022
2022 2021 W3A9101602 2.413629 1 2 15022
2023 2022 W3A9101602 3.313893 1 2 15024
2032 2031 W3A9101602 3.313893 1 2 15024
2033 2032 W3A9101602 3.313893 1 2 15024
2034 2033 W3A9101602 3.313893 1 2 15024
15023 15022 W3A9101601 1.349000 2 - -
15025 15024 W3A9101601 3.880000 2 - -
15026 15025 W3A9101602 3.880000 2 - -
15027 15026 W3A9101602 0.541101 2 - -
17097 W3A9101602 1.349000 2 - -
推荐答案
尝试一下:
require(dplyr)
df %>%
group_by(bas_id) %>%
mutate(n_ds = match(dist2mouth,sort(dist2mouth))-1) %>%
mutate(closest_uid=apply(
sapply(dist2mouth,function(i)abs(i-dist2mouth)),
2,function(n) uid[which(n==sort(n)[2])])) %>%
data.frame()
输出:
uid dist2mouth bas_id type n_ds closest_uid
1 1 10 1 1 2 4
2 2 5 1 2 0 3
3 3 6 1 1 1 2
4 4 11 1 1 3 1
5 5 3 2 2 0 6
6 6 4 2 1 1 5
这可能不是最优雅的方法,但这是解决更新后的问题的一种方法(需要花费一些时间来完善它):
This may not be the most elegant, but here's one way to solve the updated question (pending time to refine it):
df$ds_n_type2[df$type==1] <- sapply(as.numeric(row.names(df[df$type==1,])),
function(x) sum(as.numeric(df$dist2mouth[x]) > as.numeric(df$dist2mouth[df$bas_id==df$bas_id[x] & df$type==2])))
df$closest_uid_type2[df$type==1] <- sapply(as.numeric(row.names(df[df$type==1,])),
function(x) df$uid[which(df$dist2mouth==df$dist2mouth[df$bas_id==df$bas_id[x] & df$type==2][which.min(abs(c(df$dist2mouth[df$bas_id==df$bas_id[x] & df$type==2])-df$dist2mouth[x]))])[1]])
输出:
uid bas_id dist2mouth type ds_n_type2 closest_uid_type2
1: 2019 W3A9101601 2.413629 1 1 15022
2: 2020 W3A9101601 2.413629 1 1 15022
3: 2021 W3A9101602 2.413629 1 2 15022
4: 2022 W3A9101602 3.313893 1 2 15024
5: 2031 W3A9101602 3.313893 1 2 15024
6: 2032 W3A9101602 3.313893 1 2 15024
7: 2033 W3A9101602 3.313893 1 2 15024
8: 15022 W3A9101601 1.349000 2 NA NA
9: 15024 W3A9101601 3.880000 2 NA NA
10: 15025 W3A9101602 3.880000 2 NA NA
11: 15026 W3A9101602 0.541101 2 NA NA
12: 17097 W3A9101602 1.349000 2 NA NA
这篇关于计算小于x的值,并通过多个组找到最接近x的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!