Ranges surrounding values in data frame R dplyr


Problem description

I have a data frame that looks something like this:

test <- data.frame(chunk = c(rep("a",27),rep("b",27)), x = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1))

There is a column by which I would like to group the data using group_by() in dplyr; in this example it is called chunk.

I want to add another column, called x1, to each chunk of test, so that the resulting data frame looks like this:

test1 <- data.frame(test, x1 = c(0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0))

x1 identifies all of the occurrences of 0 in x, takes a range of 5 rows in each direction from the first and last 0 of the run, and adds an identifier. What the identifier is doesn't matter, but in this example x1 is 1 for the surrounding range and 2 for the occurrences of 0 in x.

Thanks for any and all help!

Solution

Here's an option to do it in dplyr:

Shorter version:

library(dplyr)

n <- 1:5
test %>%
  group_by(chunk) %>%
  # 1 for the 5 rows before the first 0 or after the last 0, 2 for the 0s, 0 otherwise
  mutate(x1 = ifelse((row_number() - min(which(x == 0))) %in% -n |
                     (row_number() - max(which(x == 0))) %in% n, 1, ifelse(x == 0, 2, 0)))

Longer (first) version:

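test %>%
  group_by(chunk) %>%
  # start: the 5 rows just before the first 0 in the group
  # end:   the 5 rows just after the last 0 in the group
  mutate(start = (row_number() - min(which(x == 0))) %in% -5:-1,
         end = (row_number() - max(which(x == 0))) %in% 1:5,
         x1 = ifelse(start | end, 1, ifelse(x == 0, 2, 0))) %>%
  # drop the helper columns once x1 has been computed
  select(-c(start, end))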

Source: local data frame [54 x 3]
Groups: chunk

   chunk x x1
1      a 1  0
2      a 1  0
3      a 1  0
4      a 1  0
5      a 1  0
6      a 1  0
7      a 1  0
8      a 1  1
9      a 1  1
10     a 1  1
11     a 1  1
12     a 1  1
13     a 0  2
14     a 0  2
15     a 0  2
16     a 0  2
17     a 1  1
18     a 1  1
19     a 1  1
20     a 1  1
21     a 1  1
22     a 1  0
23     a 1  0
24     a 1  0
25     a 1  0
26     a 1  0
27     a 1  0
28     b 1  0
29     b 1  0
30     b 1  0
31     b 1  0
32     b 1  0
33     b 1  0
34     b 1  0
35     b 1  1
36     b 1  1
37     b 1  1
38     b 1  1
39     b 1  1
40     b 0  2
41     b 0  2
42     b 0  2
43     b 0  2
44     b 1  1
45     b 1  1
46     b 1  1
47     b 1  1
48     b 1  1
49     b 1  0
50     b 1  0
51     b 1  0
52     b 1  0
53     b 1  0
54     b 1  0
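
To see why comparing row offsets picks out the right rows, here is a minimal base R sketch on a toy vector. The vector, the window size n = 2 and the variable names are made up for illustration; seq_along(x) stands in for row_number() within a single group.

```r
# Toy vector (hypothetical, not from the question) with a single run of 0s.
x <- c(1, 1, 1, 0, 0, 1, 1, 1)
n <- 2  # window size; the answer above uses 5

first_zero <- min(which(x == 0))  # position of the first 0
last_zero  <- max(which(x == 0))  # position of the last 0

# Offset of every row relative to each end of the run of 0s.
offset_start <- seq_along(x) - first_zero
offset_end   <- seq_along(x) - last_zero

start <- offset_start %in% -(1:n)  # the n rows just before the run
end   <- offset_end %in% 1:n       # the n rows just after the run

x1 <- ifelse(start | end, 1, ifelse(x == 0, 2, 0))
print(x1)
```

Rows whose offset from the first 0 is -1 or -2, or whose offset from the last 0 is 1 or 2, get the identifier 1; the 0s themselves get 2, and everything else stays 0.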

This approach assumes that each group of "chunk" contains only one sequence of 0s (as in the sample data). Let me know if that's not the case in your actual data.
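
If a chunk can contain several runs of 0s, one possible way to relax that assumption is to test each row's distance to every 0 rather than only to the first and last one. The sketch below is not part of the original answer: the helper mark_ranges() and the demo data are hypothetical, and it assumes that "within n rows of any 0" is the intended rule.

```r
library(dplyr)

# Hypothetical helper (not from the answer): label each element of `x` as
#   2 if it is a 0, 1 if it lies within `n` rows of any 0, and 0 otherwise.
mark_ranges <- function(x, n = 5) {
  zero_pos <- which(x == 0)
  near_zero <- vapply(
    seq_along(x),
    function(i) any(abs(i - zero_pos) <= n),  # FALSE when there are no 0s
    logical(1)
  )
  ifelse(x == 0, 2, ifelse(near_zero, 1, 0))
}

# Made-up data: chunk "a" has two runs of 0s, chunk "b" has none.
demo <- data.frame(
  chunk = rep(c("a", "b"), each = 12),
  x = c(1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1,
        rep(1, 12))
)

result <- demo %>%
  group_by(chunk) %>%
  mutate(x1 = mark_ranges(x, n = 2)) %>%
  ungroup()

print(as.data.frame(result))
```

On data with a single run of 0s per chunk, such as the sample data, this should give the same labels as the versions above. It compares every row with every 0, so it may be slow on very large groups.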
