Ranges surrounding values in data frame R dplyr
I have a data frame that looks something like this:
test <- data.frame(chunk = c(rep("a",27),rep("b",27)), x = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1))
There is a column by which I would like to group the data using group_by() in dplyr, which in this example is called chunk.
I want to add another column, called x1, to each chunk of test, so that the resulting data frame looks like this:
test1 <- data.frame(test, x1 = c(0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0))
x1 identifies all of the occurrences of 0 in x, takes a range of 5 rows in each direction from the first and last 0 of the run, and adds an identifier. What the identifier is doesn't matter, but in this example the identifier in x1 is 1 for the range and 2 for the occurrences of 0 in x.
Thanks for any and all help!
Here's an option to do it in dplyr:
Shorter version:
library(dplyr)

n <- 1:5  # row offsets covered on each side of the run of 0s
test %>%
  group_by(chunk) %>%
  mutate(x1 = ifelse((row_number() - min(which(x == 0))) %in% -n |
                     (row_number() - max(which(x == 0))) %in% n,
                     1, ifelse(x == 0, 2, 0)))
Longer (first) version:
test %>%
  group_by(chunk) %>%
  mutate(start = (row_number() - min(which(x == 0))) %in% -5:-1,  # 5 rows before the first 0
         end   = (row_number() - max(which(x == 0))) %in% 1:5,    # 5 rows after the last 0
         x1    = ifelse(start | end, 1, ifelse(x == 0, 2, 0))) %>%
  select(-c(start, end))  # drop the helper columns
Source: local data frame [54 x 3]
Groups: chunk
chunk x x1
1 a 1 0
2 a 1 0
3 a 1 0
4 a 1 0
5 a 1 0
6 a 1 0
7 a 1 0
8 a 1 1
9 a 1 1
10 a 1 1
11 a 1 1
12 a 1 1
13 a 0 2
14 a 0 2
15 a 0 2
16 a 0 2
17 a 1 1
18 a 1 1
19 a 1 1
20 a 1 1
21 a 1 1
22 a 1 0
23 a 1 0
24 a 1 0
25 a 1 0
26 a 1 0
27 a 1 0
28 b 1 0
29 b 1 0
30 b 1 0
31 b 1 0
32 b 1 0
33 b 1 0
34 b 1 0
35 b 1 1
36 b 1 1
37 b 1 1
38 b 1 1
39 b 1 1
40 b 0 2
41 b 0 2
42 b 0 2
43 b 0 2
44 b 1 1
45 b 1 1
46 b 1 1
47 b 1 1
48 b 1 1
49 b 1 0
50 b 1 0
51 b 1 0
52 b 1 0
53 b 1 0
54 b 1 0
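To see why the offset test picks out the right rows, here is a minimal base R sketch on a toy vector. The names idx, first0, last0 and width are illustrative only, and the window is shrunk to 2 rows (the answer uses 5) so the result is easy to check by eye.

```r
# Toy vector with a single run of 0s at positions 4 and 5
x <- c(1, 1, 1, 0, 0, 1, 1, 1)

idx    <- seq_along(x)          # plays the role of row_number() within one group
first0 <- min(which(x == 0))    # position of the first 0 (here 4)
last0  <- max(which(x == 0))    # position of the last 0 (here 5)
width  <- 2                     # hypothetical window width for this toy example

# A row lies in the "before" window when its offset from the first 0 is -1 or -2,
# and in the "after" window when its offset from the last 0 is 1 or 2.
before <- (idx - first0) %in% -(1:width)
after  <- (idx - last0)  %in%  (1:width)

x1 <- ifelse(before | after, 1, ifelse(x == 0, 2, 0))
print(x1)
# [1] 0 1 1 2 2 1 1 0
```

Inside a grouped mutate(), row_number() restarts at 1 in every group, which is why the same arithmetic works per chunk.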
This approach assumes that each group of chunk contains only one sequence of 0s (as in the sample data). Let me know if that's not the case in your actual data.
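If a chunk can contain several runs of 0s, or none at all, one possible variation is sketched below. This is not part of the answer above: flag_near_zero and its width argument are hypothetical names, and the sketch assumes that "in range" means "a non-zero row within width rows of any 0". It uses base R's ave() for the grouping so that it runs without extra packages.

```r
# Hypothetical helper: 2 for rows where x is 0, 1 for non-zero rows that lie
# within `width` rows of any 0, and 0 everywhere else.
flag_near_zero <- function(x, width = 5) {
  zero_pos <- which(x == 0)
  idx <- seq_along(x)
  # any() over an empty vector is FALSE, so groups without 0s get all 0s
  near <- vapply(idx, function(i) any(abs(i - zero_pos) <= width), logical(1))
  ifelse(x == 0, 2, ifelse(near, 1, 0))
}

# Toy data: chunk "a" has two separate 0s, chunk "b" has none
demo <- data.frame(chunk = rep(c("a", "b"), each = 8),
                   x = c(1, 0, 1, 1, 1, 1, 0, 1,
                         1, 1, 1, 1, 1, 1, 1, 1))

# Apply the helper per chunk; a window of 1 row keeps the toy output readable
demo$x1 <- ave(demo$x, demo$chunk,
               FUN = function(v) flag_near_zero(v, width = 1))
print(demo$x1)
# [1] 1 2 1 0 0 1 2 1 0 0 0 0 0 0 0 0
```

With dplyr, the same helper could be called as mutate(x1 = flag_near_zero(x)) after group_by(chunk). On the sample data, where each chunk has a single run of 0s, it should give the same x1 as the versions above.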