Using the `rle` function with the `dplyr` `group_by` command to map a grouping variable
Question
I have a data frame with three columns, similar to the one given below. I want to extract a search pattern based on the information in column `a`.
With help from a few developers (@thelatemail and @David T), I was able to identify the pattern with the `rle` function; see the earlier post, "using rle function to identify pattern". Now I want to go a step further and add grouping information to the extracted pattern. I tried the `dplyr` `do` function (see the code below), but it does not work.
The example data and the desired output are given below for reference.
##mycode that produces error - needs to be fixed
test <- data%>%
group_by(b, c)%>%
do(., data.frame(from = rle(.$a)$values), to = lead(rle(.$a)$values))
##code to create the data frame
a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d", "d", "d", "e", "f", "f", "e", "e")
b <- c(rep("experiment", times = 8), rep("control", times = 8))
c <- c(rep("A01", times = 4), rep("A02", times = 4), rep("A03", times = 4), rep("A04", times = 4))
data <- data.frame(c,b,a)
## desired output
  c     b          from  to    fromCount toCount
                   <chr> <chr>     <int>   <int>
1 A01   experiment a     b             1       3
2 A02   experiment a     c             1       1
3 A02   experiment c     a             1       1
4 A02   experiment a     b             1       1
5 A03   control    d     e             3       1
6 A04   control    f     e             2       2
Compared to the earlier post, the information gets compressed because grouping is applied to the `a` column.
Recommended answer
We could use `rleid` from `data.table`:
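library(data.table)
library(dplyr)
data %>%
  # grp numbers each run of identical consecutive values in a
  group_by(b, c, grp = rleid(a)) %>%
  # one row per run: its value and its length
  summarise(from = first(a), fromCount = n()) %>%
  # still grouped by b and c, so lead() pairs each run with the next run in its group
  mutate(to = lead(from), toCount = lead(fromCount)) %>%
  ungroup() %>%
  select(-grp) %>%
  # the last run in each group has no successor
  filter(!is.na(to)) %>%
  arrange(c)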
# A tibble: 6 x 6
# b c from fromCount to toCount
# <chr> <chr> <chr> <int> <chr> <int>
#1 experiment A01 a 1 b 3
#2 experiment A02 a 1 c 1
#3 experiment A02 c 1 a 1
#4 experiment A02 a 1 b 1
#5 control A03 d 3 e 1
#6 control A04 f 2 e 2
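To make the role of `grp` concrete, here is a small base R sketch of the run ids that `rleid(a)` is expected to produce for the example column. It is a stand-in built from `rle` and `rep`, not `data.table`'s own implementation, and the variable names `r` and `grp` are chosen only for illustration.

```r
# Column `a` from the example data
a <- c("a", "b", "b", "b", "a", "c", "a", "b",
       "d", "d", "d", "e", "f", "f", "e", "e")

# Run-length encode, then repeat each run's index by the run's length
r <- rle(a)
grp <- rep(seq_along(r$lengths), times = r$lengths)

grp
# 1 2 2 2 3 4 5 6 7 7 7 8 9 9 10 10
```

The ids are assigned over the whole column, ignoring `b` and `c`. Because `b` and `c` are also grouping variables in the pipeline above, a run that happened to continue across a group boundary would still be split into separate rows.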
Or use `rle`: after grouping by 'b' and 'c', `summarise` with `rle` to create a `list` column, then extract the 'values' and 'lengths' from that column within `summarise`, create 'to' and 'toCount' as the `lead` of the 'from' and 'fromCount' columns, `filter` out the `NA` elements, and `arrange` the rows by the 'c' column.
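data %>%
  group_by(b, c) %>%
  # rl holds one rle object per group; from/fromCount expand it to one row per run
  # (dplyr 1.1.0 and later warn about multi-row summarise() results)
  summarise(rl = list(rle(a)),
            from = rl[[1]]$values,
            fromCount = rl[[1]]$lengths) %>%
  mutate(to = lead(from),
         toCount = lead(fromCount)) %>%
  ungroup() %>%
  select(-rl) %>%
  filter(!is.na(to)) %>%
  arrange(c)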
# A tibble: 6 x 6
# b c from fromCount to toCount
# <chr> <chr> <chr> <int> <chr> <int>
#1 experiment A01 a 1 b 3
#2 experiment A02 a 1 c 1
#3 experiment A02 c 1 a 1
#4 experiment A02 a 1 b 1
#5 control A03 d 3 e 1
#6 control A04 f 2 e 2
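To see what the `rle` approach above does inside a single group, the following base R sketch runs the same steps by hand on the four values of `a` that belong to `c == "A02"`. The helper `shift_up` is a hypothetical stand-in for `dplyr::lead`, introduced here only so the example needs no packages.

```r
# Column `a` for the group c == "A02" in the example data
a_A02 <- c("a", "c", "a", "b")

r <- rle(a_A02)
r$values   # the value of each run
r$lengths  # how many times each value repeats

# Hypothetical base R stand-in for dplyr::lead(): shift by one, pad with NA
shift_up <- function(x) c(x[-1], NA)

pairs <- data.frame(
  from      = r$values,
  fromCount = r$lengths,
  to        = shift_up(r$values),
  toCount   = shift_up(r$lengths),
  stringsAsFactors = FALSE
)

# Drop the last run, which has no successor
res <- pairs[!is.na(pairs$to), ]
res
```

The three remaining rows (a to c, c to a, a to b, each with counts of 1) correspond to rows 2 to 4 of the output above.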
We could also loop over the `rle` `list` column ('rl') with `map`, extract the components, take the `lead` of the `lengths` and `values` in a `tibble`, use `unnest_wider` to create the columns and `unnest` the `list` structure, `filter` out the NA elements, and `arrange`:
library(tidyr)
library(purrr)
data %>%
  group_by(b, c) %>%
  summarise(rl = list(rle(a))) %>%
  ungroup() %>%
  # build one tibble of from/to pairs per rle object
  mutate(out = map(rl,
                   ~ tibble(from = .x$values,
                            fromCount = .x$lengths,
                            to = lead(from),
                            toCount = lead(fromCount)))) %>%
  # spread the tibble columns into list-columns, then expand them into rows
  unnest_wider(c(out)) %>%
  unnest(from:toCount) %>%
  filter(!is.na(to)) %>%
  arrange(c) %>%
  select(-rl)
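For comparison, the same list-based idea can be sketched without any packages. This is only an illustrative base R analogue, not part of the original answer: `split` and `lapply` play the role of `group_by` and `map`, and `shift_up` is again a hypothetical stand-in for `lead`.

```r
# Example data from the question
a <- c("a", "b", "b", "b", "a", "c", "a", "b",
       "d", "d", "d", "e", "f", "f", "e", "e")
b <- c(rep("experiment", times = 8), rep("control", times = 8))
c <- c(rep("A01", times = 4), rep("A02", times = 4),
       rep("A03", times = 4), rep("A04", times = 4))
data <- data.frame(c, b, a, stringsAsFactors = FALSE)

# One data frame and one rle object per (b, c) group, kept in lists
groups <- split(data, list(data$b, data$c), drop = TRUE)
rl <- lapply(groups, function(g) rle(g$a))

# Hypothetical stand-in for dplyr::lead(): shift by one, pad with NA
shift_up <- function(x) c(x[-1], NA)

# Build the from/to pairs for each group and stack them
out <- do.call(rbind, Map(function(g, r) {
  data.frame(b = g$b[1], c = g$c[1],
             from = r$values, fromCount = r$lengths,
             to = shift_up(r$values), toCount = shift_up(r$lengths),
             stringsAsFactors = FALSE)
}, groups, rl))

# Drop runs without a successor and sort by c
out <- out[!is.na(out$to), ]
out <- out[order(out$c), ]
rownames(out) <- NULL
out
```

With the example data this should give the same six from/to pairs as the `dplyr` versions, as a plain data frame rather than a tibble.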