Conditionally create a dataframe and collapse a column while truncating the actual dataframe


Problem description

My data frame is as follows:

+------+----+----------+----------+
| from | to | priority | distance |
+------+----+----------+----------+
|    1 |  3 |        1 |       10 |
|    1 |  5 |        1 |       10 |
|    2 |  7 |        1 |       10 |
|    3 |  9 |        1 |       15 |
|    4 |  8 |        2 |       20 |
|    5 |  6 |        2 |       20 |
|    5 |  1 |        2 |       30 |
|    6 |  2 |        2 |       30 |
|    6 |  4 |        3 |       40 |
|    7 |  2 |        3 |       40 |
|    8 |  3 |        3 |       50 |
|    9 |  5 |        3 |       60 |
|   10 |    |        3 |          |
|   12 | 11 |        7 |        9 |
+------+----+----------+----------+

It is sorted by priority and distance.
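For reproducibility, the table above could be entered in R roughly as follows. This is only a sketch: the name df1 is taken from the answer further down, and representing the two blank cells as NA is an assumption, since the question does not say how they are stored.

```r
# The question's table; NA stands in for the two blank cells (an assumption).
df1 <- data.frame(
  from     = c(1, 1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12),
  to       = c(3, 5, 7, 9, 8, 6, 1, 2, 4, 2, 3, 5, NA, 11),
  priority = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 7),
  distance = c(10, 10, 10, 15, 20, 20, 30, 30, 40, 40, 50, 60, NA, 9)
)

# Sort by priority, then distance; the table is already in this order.
sorted <- df1[order(df1$priority, df1$distance), ]
```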

I want to collapse the to column based on the following criteria:


  1. Every unique value in to will be grouped with its corresponding from (for example, row 1 in the table below):

+--------------+----------+
| from_parent  | to_child |
+--------------+----------+
|            1 |      3,5 |
+--------------+----------+

If a value has already been grouped in to_child (in our case the number 3), and it also appears in the from column of the main table, and its corresponding to is a new value (one that has never appeared in from_parent or to_child), then that new value should appear independently in from_parent. For instance, from our table,

+------+----+----------+----------+
| from | to | priority | distance |
+------+----+----------+----------+
|    3 |  9 |        1 |       15 |
+------+----+----------+----------+

the value 9 should appear independently in the new table, as shown below:

+--------------+----------+
| from_parent  | to_child |
+--------------+----------+
|            9 |          |
+--------------+----------+


But if the value 9 were to appear later in the to column, it should be added to that to_child cluster and its previous entry should be removed. That is, if 9 later appeared as the to for a from of 1, the corresponding to_child value for 1 should be 3,5,9.

So the final table should be:

+--------------+----------+
| from_parent  | to_child |
+--------------+----------+
|            1 |      3,5 |
|            2 |        7 |
|            4 |        8 |
|            6 |          |
|            9 |          |
|           10 |          |
|           12 |       11 |
+--------------+----------+
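The rules above can be read as a single pass over the rows in their given order. The sketch below is one possible interpretation, not a confirmed solution. It assumes that a from value not seen before starts its own group, that a new to value joins the group of its from when that from is a parent and otherwise becomes a standalone parent, and that a childless standalone parent is moved into a group when it later appears as the to of a parent. collapse_to and edges are hypothetical names, and NA marks the blank to cell.

```r
# from/to columns of the question's table, in row order (NA = blank cell).
edges <- data.frame(
  from = c(1, 1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12),
  to   = c(3, 5, 7, 9, 8, 6, 1, 2, 4, 2, 3, 5, NA, 11)
)

# Hypothetical helper implementing one reading of the question's rules.
collapse_to <- function(df) {
  children <- list()  # parent (as character) -> numeric vector of children
  seen <- c()         # every value already placed as a parent or a child
  for (i in seq_len(nrow(df))) {
    from_val <- df$from[i]
    to_val <- df$to[i]
    fk <- as.character(from_val)
    if (!(from_val %in% seen)) {
      # A from value seen for the first time starts its own group.
      children[[fk]] <- numeric(0)
      seen <- c(seen, from_val)
    }
    if (is.na(to_val)) next
    tk <- as.character(to_val)
    from_is_parent <- fk %in% names(children)
    if (!(to_val %in% seen)) {
      if (from_is_parent) {
        # New value whose from is a parent: join that parent's group.
        children[[fk]] <- c(children[[fk]], to_val)
      } else {
        # New value whose from is only a child: standalone parent.
        children[[tk]] <- numeric(0)
      }
      seen <- c(seen, to_val)
    } else if (from_is_parent && to_val != from_val &&
               tk %in% names(children) && length(children[[tk]]) == 0) {
      # A childless standalone parent reappears as the to of a parent:
      # drop its standalone entry and move it into that parent's group.
      children[[tk]] <- NULL
      children[[fk]] <- c(children[[fk]], to_val)
    }
  }
  parents <- as.numeric(names(children))
  ord <- order(parents)
  data.frame(
    from_parent = parents[ord],
    to_child = vapply(children[ord], paste, character(1),
                      collapse = ",", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

res <- collapse_to(edges)
```

Under this reading, res has the same seven parents and children as the final table above, and appending a row with from 1 and to 9 moves 9 into the group of 1. Other readings of the rules are possible, particularly for values such as 2 and 4 that already have children of their own when they reappear in to.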


Recommended answer

I will use the igraph package to solve the problem in the question, since this is a graph theory problem.

First, construct the graph from the first two columns of the input data.frame.

library(igraph)

g <- graph_from_data_frame(df1[1:2], directed = TRUE)
plot(g, edge.curved = TRUE, edge.arrow.size = 0.5)

Now get the paths from each of the vertices in df1$from. The paths are obtained with a breadth-first search, using the function bfs.

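# One list slot per vertex of the graph
paths_list <- vector("list", length = length(V(g)))
i <- 0L
for(v in V(g)){
  i <- i + 1L
  # Distances from v along outgoing edges
  ord <- bfs(g, root = v, neimode = "out",
             unreachable = FALSE, dist = TRUE)$dist
  # Keep only the vertices with a finite distance from v
  ord <- ord[is.finite(ord)]
  paths_list[[i]] <- ord
}

from <- lapply(paths_list, function(x) names(x)[1])
# The root has distance 0, so x != 0 leaves it out of the collapsed string
to <- lapply(paths_list, function(x) paste(names(x)[x != 0], collapse = ","))
res <- data.frame(from = unlist(from), to = unlist(to), stringsAsFactors = FALSE)
# Drop rows whose vertex name is empty
res <- res[nchar(res$from) != 0, ]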

res
#   from              to
#1     1 3,5,6,9,2,4,7,8
#2     2               7
#3     3 9,5,1,6,2,4,7,8
#4     4 8,3,9,5,1,6,2,7
#5     5 1,6,2,3,4,7,8,9
#6     6 2,4,7,8,3,9,5,1
#7     7               2
#8     8 3,9,5,1,6,2,4,7
#9     9 5,1,6,2,3,4,7,8
#10   10                
#11   12              11
