Conditionally create a dataframe that collapses a column while truncating the actual dataframe
Question
My data frame is as follows:
+------+----+----------+----------+
| from | to | priority | distance |
+------+----+----------+----------+
| 1 | 3 | 1 | 10 |
| 1 | 5 | 1 | 10 |
| 2 | 7 | 1 | 10 |
| 3 | 9 | 1 | 15 |
| 4 | 8 | 2 | 20 |
| 5 | 6 | 2 | 20 |
| 5 | 1 | 2 | 30 |
| 6 | 2 | 2 | 30 |
| 6 | 4 | 3 | 40 |
| 7 | 2 | 3 | 40 |
| 8 | 3 | 3 | 50 |
| 9 | 5 | 3 | 60 |
| 10 | | 3 | |
| 12 | 11 | 7 | 9 |
+------+----+----------+----------+
It is sorted by priority and distance.
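For anyone who wants to reproduce the example, the table above could be entered in R roughly as follows. The name df1 is the one used by the answer's code; encoding the two blank cells as NA is an assumption, since the original post does not show how df1 was built.

```r
# Assumed reconstruction of the question's table.
# NA stands in for the blank `to` and `distance` cells of the row with from = 10.
df1 <- data.frame(
  from     = c(1, 1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12),
  to       = c(3, 5, 7, 9, 8, 6, 1, 2, 4, 2, 3, 5, NA, 11),
  priority = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 7),
  distance = c(10, 10, 10, 15, 20, 20, 30, 30, 40, 40, 50, 60, NA, 9)
)
```

If the blank cells were empty strings in the original data, the from and to columns would be character rather than numeric.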
I want to collapse the to column according to the following criteria:
- Every unique value in to will be grouped with its corresponding from (for example, row 1 in the table below):
+--------------+----------+
| from_parent | to_child |
+--------------+----------+
| 1 | 3,5 |
+--------------+----------+
- If a value has already been grouped in to_child (in our case the number 3), and it also appears in the from column of the main table, and its corresponding to is a new value (one that has never appeared in from_parent or to_child), then that new value should appear independently in from_parent. For instance, from this row of our table:
+------+----+----------+----------+
| from | to | priority | distance |
+------+----+----------+----------+
| 3 | 9 | 1 | 15 |
+------+----+----------+----------+
the value 9 should appear independently in the new table, as shown here:
+--------------+----------+
| from_parent | to_child |
+--------------+----------+
| 9 | |
+--------------+----------+
However, if the value 9 later appears in the to column, it should be added to the corresponding to_child cluster and its earlier standalone entry should be removed. In other words, if 9 appeared later as the to of a row whose from is 1, the to_child value for 1 should become 3,5,9.
So the final table should be:
+--------------+----------+
| from_parent | to_child |
+--------------+----------+
| 1 | 3,5 |
| 2 | 7 |
| 4 | 8 |
| 6 | |
| 9 | |
| 10 | |
| 12 | 11 |
+--------------+----------+
Recommended answer
I will use the igraph package to solve the problem in the question, since this is a graph theory problem.
First, construct the graph from the first two columns of the input data.frame.
library(igraph)
g <- graph_from_data_frame(df1[1:2], directed = TRUE)
plot(g, edge.curved = TRUE, edge.arrow.size = 0.5)
Now get the paths from each of the vertices in df1$from. The paths are obtained with a breadth-first search, using the function bfs.
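# One slot per vertex of the graph
paths_list <- vector("list", length = length(V(g)))
i <- 0L
for(v in V(g)){
  i <- i + 1L
  # Breadth-first search along outgoing edges; keep each vertex's distance from the root
  ord <- bfs(g, root = v, neimode = "out",
             unreachable = FALSE, dist = TRUE)$dist
  # Drop vertices whose distance is not finite (not reached from the root)
  ord <- ord[is.finite(ord)]
  paths_list[[i]] <- ord
}
# The first name is used as the source; names at nonzero distance are the reachable vertices
from <- lapply(paths_list, function(x) names(x)[1])
to <- lapply(paths_list, function(x) paste(names(x)[x != 0], collapse = ","))
res <- data.frame(from = unlist(from), to = unlist(to), stringsAsFactors = FALSE)
# Remove rows whose source name is empty
res <- res[nchar(res$from) != 0, ]
res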
# from to
#1 1 3,5,6,9,2,4,7,8
#2 2 7
#3 3 9,5,1,6,2,4,7,8
#4 4 8,3,9,5,1,6,2,7
#5 5 1,6,2,3,4,7,8,9
#6 6 2,4,7,8,3,9,5,1
#7 7 2
#8 8 3,9,5,1,6,2,4,7
#9 9 5,1,6,2,3,4,7,8
#10 10
#11 12 11
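The table above lists every vertex reachable from each source, which is broader than the from_parent / to_child table the question asks for. The following base R sketch is one possible reading of the question's rules, applied row by row in the order given. The function name collapse_to_children and its internals are hypothetical and are not part of igraph or of the original answer. It assumes numeric ids and that a value is only moved under a new parent if it was standing alone without children of its own.

```r
# Hypothetical helper: one sequential pass over the rows of `df`,
# which must have columns `from` and `to` (blank `to` given as NA or "").
collapse_to_children <- function(df) {
  parents  <- character(0)  # values currently listed in from_parent
  child_of <- character(0)  # named vector: names are children, values are their parent

  for (i in seq_len(nrow(df))) {
    from_i <- as.character(df$from[i])
    to_i   <- as.character(df$to[i])
    has_to <- !is.na(to_i) && nzchar(to_i)
    seen   <- c(parents, names(child_of))  # everything placed so far

    if (from_i %in% names(child_of)) {
      # `from` is already grouped as a child: a brand-new `to` stands alone.
      if (has_to && !(to_i %in% seen)) parents <- c(parents, to_i)
      next
    }
    if (!(from_i %in% parents)) parents <- c(parents, from_i)
    if (!has_to || to_i == from_i) next

    if (!(to_i %in% seen)) {
      # A new value joins the cluster of `from`.
      child_of[to_i] <- from_i
    } else if (to_i %in% parents && !(to_i %in% child_of)) {
      # `to` was standing alone with no children: move it under `from`.
      parents <- setdiff(parents, to_i)
      child_of[to_i] <- from_i
    }
  }

  parents <- parents[order(as.numeric(parents))]  # assumes numeric ids
  to_child <- vapply(
    parents,
    function(p) paste(names(child_of)[child_of == p], collapse = ","),
    character(1)
  )
  data.frame(from_parent = parents, to_child = unname(to_child),
             stringsAsFactors = FALSE)
}

# Small illustration of the "9 later appears as a to of 1" case from the question.
demo <- data.frame(from = c(1, 1, 3, 1), to = c(3, 5, 9, 9))
collapse_to_children(demo)
# One row: from_parent "1", to_child "3,5,9"
```

Under these assumptions, running the function on the question's from and to columns gives the rows 1: 3,5; 2: 7; 4: 8; 6; 9; 10; 12: 11. Other readings of the rules are possible, for example whether a parent that already has children may itself be moved.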