r tidyverse spread() using multiple key value pairs not collapsing rows
Question
I am trying to spread() a couple of key/value pairs, but the rows that share a common identifier value do not collapse into one. I think it may have to do with some previous processing or, more likely, I do not know the right way to spread two or more key/value pairs to get the result I expect.
I am starting with this dataset:
library(tidyverse)

df <- tibble(order = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))
There are two pre-spread steps that define the order of the "count" values created by the gather() call below. This is the first pre-spread step, which records the original order of the "count" variable using the row number:
ntrl <- df %>%
  gather(line_1,
         line_2,
         line_3,
         key = "sector",
         value = "count") %>%
  group_by(order) %>%
  mutate(sector_ord = row_number()) %>%
  arrange(order,
          sector)
This is the second pre-spread step to define the numerical order of the "count" variable:
ord <- ntrl %>%
  arrange(order,
          count) %>%
  group_by(order) %>%
  # paste0() has no sep argument, so the stray sep = "" is dropped
  mutate(num_ord = paste0("ord_",
                          row_number()))
And then finally the spread code that I have been using:
wide <- ord %>%
  group_by(order) %>%
  spread(key = sector,
         value = count) %>%
  spread(key = num_ord,
         value = sector_ord)
This is what I get:
  order line_1 line_2 line_3 ord_1 ord_2 ord_3
1     1     23     NA     NA     1    NA    NA
2     1     NA     63     NA    NA    NA     2
3     1     NA     NA     62    NA     3    NA
4     2      8     NA     NA     1    NA    NA
5     2     NA     25     NA    NA    NA     2
6     2     NA     NA     12    NA     3    NA
7     3     21     NA     NA    NA     1    NA
8     3     NA     25     NA    NA    NA     2
9     3     NA     NA     10     3    NA    NA
... and so on thru 21 lines accounting for all 7 "order" lines
The behavior that I am expecting is that the "order" column would collapse in all rows that are the same "order" value to give the following:
  order line_1 line_2 line_3 ord_1 ord_2 ord_3
1     1     23     63     62     1     3     2
2     2      8     25     12     1     3     2
3     3     21     25     10     2     3     1
4     4     45     24     56     2     1     3
... and so on, I think that paints the picture
I have reviewed the questions and answers about spreading with duplicate identifiers and the use of the index of row numbers but that does not help.
I figure that it has something to do with the double spreading, but I cannot figure out how to do that.
Thanks for your help.
Recommended answer
A solution using tidyverse, starting from your df. The key is to use summarise_all(funs(.[which(!is.na(.))])) to select the only non-NA value for each column.
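The collapse step is needed because spread() treats every column other than the key and value as an identifier, so after both calls each order still has three mostly-NA rows. Here is a minimal sketch of that step in isolation, on a hypothetical toy tibble (sparse and collapsed are illustrative names, not from the question). It is written with across(), which assumes dplyr 1.0.0 or later, since funs() has been deprecated since dplyr 0.8.0:

```r
library(dplyr)

# Hypothetical toy data shaped like the result of the two spread() calls:
# each row holds one observed value and NA in the other column.
sparse <- tibble(order  = c(1, 1, 2, 2),
                 line_1 = c(23, NA, 8, NA),
                 line_2 = c(NA, 63, NA, 25))

# For every non-grouping column, keep the non-NA value within each order.
# This relies on there being exactly one non-NA value per group and column.
collapsed <- sparse %>%
  group_by(order) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)]))

collapsed
```

If a column could hold more than one non-NA value per order, this would no longer give one row per group, so it fits only data shaped like the question's.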
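library(tidyverse)

df2 <- df %>%
  # Long format: one row per order/line combination
  gather(Lines, Value, -order) %>%
  group_by(order) %>%
  # Rank each value within its order and label the future rank columns
  mutate(Rank = dense_rank(Value),
         RankOrder = paste0("ord_", row_number())) %>%
  spread(Lines, Value) %>%
  spread(RankOrder, Rank) %>%
  # Collapse each order to one row by keeping the single non-NA value per column
  summarise_all(funs(.[which(!is.na(.))]))

df2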
# A tibble: 7 x 7
  order line_1 line_2 line_3 ord_1 ord_2 ord_3
  <int>  <dbl>  <dbl>  <dbl> <int> <int> <int>
1     1     23     63     62     1     3     2
2     2      8     25     12     1     3     2
3     3     21     25     10     2     3     1
4     4     45     24     56     2     1     3
5     5     68     48     67     3     1     2
6     6     31     24     25     3     1     2
7     7     24     63     35     1     3     2
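On a newer tidyr, the same table can probably be built without the NA-collapsing step. The following sketch is not part of the original answer. It assumes tidyr 1.0.0 or later (for pivot_longer() and pivot_wider()), and the names ranks and df3 are made up for illustration. It computes the ranks in long form, widens only the rank columns, and joins them back onto df:

```r
library(dplyr)
library(tidyr)

df <- tibble(order = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))

ranks <- df %>%
  # Long format: rows come out as line_1, line_2, line_3 within each order
  pivot_longer(-order, names_to = "Lines", values_to = "Value") %>%
  group_by(order) %>%
  mutate(Rank = dense_rank(Value),
         RankOrder = paste0("ord_", row_number())) %>%
  ungroup() %>%
  # Keep only the identifier and the rank pair, so widening yields one row per order
  select(order, RankOrder, Rank) %>%
  pivot_wider(names_from = RankOrder, values_from = Rank)

# Attach the rank columns to the original wide data
df3 <- left_join(df, ranks, by = "order")

df3
```

Because only order remains as an identifier when the ranks are widened, each order ends up on a single row and no summarise step is required.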