For each item in a data frame, loop automatically
Problem description
I don't want to reshape it, as I have a lot of data, so I am looking for something like a loop which translates it automatically. Input (data frame 1):
Item LC ToLC
8T4121 MW92 WK14
8T4121 WK14 RM11
8T4121 WK14 RS11
8T4121 RS11 OY01
AB7651 MW92 RS11
AB7651 RS11 OY01
I want to write a loop that gives me output like this (data frame 2):
Item LC1 LC2 LC3 LC4
8T4121 MW92 WK14 RM11
8T4121 MW92 WK14 RS11 OY01
AB7651 MW92 RS11 OY01
I have tried something like this:
bodlane <- lctolc
colnames(bodlane) <- c("Item","Entry","From")
bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","From")
bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","Parent1","From")
bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","LC","ToLC","Parent1","From","To")
Recommended answer
I believe this can be solved with igraph in a similar way as in "recursive" self join in data.table, but without the calculation.
The difficulty here is that there is a separate graph for each Item. My approach is to split the data frame into a list of graphs. There might be more concise solutions which use the type vertex attribute.
However, the code below creates the expected result:
library(igraph)
library(data.table)
library(magrittr)
lapply(
  # one graph per Item, built from the LC -> ToLC edge columns
  # (graph_from_data_frame() is the current name of graph.data.frame())
  lapply(split(lctolc, lctolc$Item), function(x) graph_from_data_frame(x[, 2:3])),
  function(x) lapply(
    V(x)[degree(x, mode = "in") == 0],  # start nodes: no incoming edges
    function(s) all_simple_paths(x, from = s,
                                 to = V(x)[degree(x, mode = "out") == 0]) %>%
      lapply(
        # one path -> one-row data.table with columns LC1, LC2, ...
        function(y) as.data.table(t(names(y))) %>% setnames(paste0("LC", seq_along(.)))
      ) %>%
      rbindlist(fill = TRUE)
  ) %>% rbindlist(fill = TRUE)
) %>% rbindlist(fill = TRUE, idcol = "Item")
Item LC1 LC2 LC3 LC4
1: 8T4121 MN12 AB12 BC34 <NA>
2: 8T4121 MW92 WK14 RS11 OY01
3: 8T4121 MW92 WK14 RM11 <NA>
4: AB7651 MW92 RS11 OY01 <NA>
Explanation
The igraph package is a good choice for questions like this.
However, we need to treat the graph of each Item separately. This is achieved by splitting the data.frame and creating a list of graphs by
lg <- lapply(split(lctolc, lctolc$Item), function(x) graph_from_data_frame(x[, 2:3]))
which returns
lg
$`8T4121`
IGRAPH 8eb2bcc DN-- 8 6 --
+ attr: name (v/c)
+ edges from 8eb2bcc (vertex names):
[1] AB12->BC34 MN12->AB12 MW92->WK14 WK14->RM11 WK14->RS11 RS11->OY01
$AB7651
IGRAPH 7cd75e7 DN-- 3 2 --
+ attr: name (v/c)
+ edges from 7cd75e7 (vertex names):
[1] MW92->RS11 RS11->OY01
or, visualised by two separate plots:
lapply(seq_along(lg), function(i) plot(lg[[i]], main = names(lg)[i]))
Now, the function all_simple_paths() lists the simple paths from one source vertex to another vertex or vertices, where a path is simple if each vertex is visited at most once. To use the function, we need to determine the start nodes and all end nodes. This is achieved by
V(x)[degree(x, mode = "in") == 0] # start nodes
V(x)[degree(x, mode = "out") == 0] # end nodes
The degree() function returns the number of incoming or outgoing edges, respectively.
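As a minimal sketch of how degree() separates start nodes from end nodes, the snippet below builds a toy graph that mirrors item AB7651 from the question. The object names edges, g, in_deg and out_deg are chosen for illustration only.

```r
library(igraph)

# Toy edge list (illustrative; mirrors item AB7651 from the question)
edges <- data.frame(LC = c("MW92", "RS11"), ToLC = c("RS11", "OY01"))
g <- graph_from_data_frame(edges)

in_deg  <- degree(g, mode = "in")   # incoming edges per vertex
out_deg <- degree(g, mode = "out")  # outgoing edges per vertex

start_nodes <- names(in_deg)[in_deg == 0]    # vertices nothing points to
end_nodes   <- names(out_deg)[out_deg == 0]  # vertices that point nowhere
```

Here start_nodes holds "MW92" and end_nodes holds "OY01", because RS11 has both an incoming and an outgoing edge.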
For our example dataset we get
lapply(lg, function(x) V(x)[degree(x, mode = "in") == 0]) # start nodes
$`8T4121`
+ 2/8 vertices, named, from 8eb2bcc:
[1] MN12 MW92
$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] MW92
lapply(lg, function(x) V(x)[degree(x, mode = "out") == 0]) # end nodes
$`8T4121`
+ 3/8 vertices, named, from 8eb2bcc:
[1] BC34 RM11 OY01
$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] OY01
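To see what all_simple_paths() returns for a single start node, here is a small sketch on the MW92 branch of item 8T4121 only. The names edges, g, starts, ends, paths and path_names are illustrative and not part of the answer's code.

```r
library(igraph)

# Edges of the MW92 branch of item 8T4121 (taken from the question's data)
edges <- data.frame(
  LC   = c("MW92", "WK14", "WK14", "RS11"),
  ToLC = c("WK14", "RM11", "RS11", "OY01")
)
g <- graph_from_data_frame(edges)

starts <- V(g)[degree(g, mode = "in") == 0]   # here: MW92
ends   <- V(g)[degree(g, mode = "out") == 0]  # here: RM11 and OY01

# All simple paths from the single start node to any of the end nodes
paths <- all_simple_paths(g, from = starts[1], to = ends)

# Each path is a vertex sequence; names() extracts the vertex names
path_names <- lapply(paths, names)
```

The list path_names contains two character vectors, one per route from MW92 to a terminal location. The order in which the paths are returned should not be relied upon.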
Now, we loop through all start nodes of each graph and determine all simple paths. The result is, again, a list. For each list item, the node names are extracted and reshaped to a data.table in wide format. The columns are renamed to LC1, LC2, etc.
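The reshaping step can be tried in isolation. The sketch below starts from plain character vectors (assumed to be the node names of two paths) and turns each into a one-row data.table. The names path_names and rows are illustrative.

```r
library(data.table)

# Node names of two paths, written out by hand for illustration
path_names <- list(
  c("MW92", "WK14", "RM11"),
  c("MW92", "WK14", "RS11", "OY01")
)

rows <- lapply(path_names, function(p) {
  dt <- as.data.table(t(p))                  # 1 x n matrix -> one row, columns V1..Vn
  setnames(dt, paste0("LC", seq_along(dt)))  # rename by reference to LC1..LCn
  dt                                         # setnames() returns invisibly, so return dt
})
```

Each element of rows has as many columns as the path has nodes, which is why the next step has to cope with differing column counts.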
In each step, we get a list of data.tables which are combined by rbindlist(). The fill parameter is required because the number of columns may vary. The final call to rbindlist() uses the idcol parameter to mark each row with the Item it belongs to.
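A short sketch of what fill and idcol do, using two hand-made one-row tables with different column counts. The list per_item and its contents are illustrative.

```r
library(data.table)

# Named list: the names become the values of the id column
per_item <- list(
  "8T4121" = data.table(LC1 = "MW92", LC2 = "WK14", LC3 = "RS11", LC4 = "OY01"),
  "AB7651" = data.table(LC1 = "MW92", LC2 = "RS11", LC3 = "OY01")
)

# fill = TRUE pads missing columns with NA; idcol = "Item" adds the list names
combined <- rbindlist(per_item, fill = TRUE, idcol = "Item")
```

Without fill = TRUE the call would stop with an error, since the second table has no LC4 column.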
The sample dataset has been amended to include the cases from the OP's comments.
library(data.table)
lctolc <- fread("
Item LC ToLC
8T4121 AB12 BC34
8T4121 MN12 AB12
8T4121 MW92 WK14
8T4121 WK14 RM11
8T4121 WK14 RS11
8T4121 RS11 OY01
AB7651 MW92 RS11
AB7651 RS11 OY01",
data.table = FALSE)
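For readers who prefer named steps over one nested pipeline, the same approach can be unrolled as sketched below. This is an illustrative restructuring under the assumption that the input has the columns Item, LC and ToLC. The helper item_paths() is hypothetical and not part of igraph or data.table. The sketch uses the six rows from the original question rather than the amended dataset.

```r
library(igraph)
library(data.table)

# Input as given in the question (data frame 1)
lctolc <- data.frame(
  Item = c(rep("8T4121", 4), rep("AB7651", 2)),
  LC   = c("MW92", "WK14", "WK14", "RS11", "MW92", "RS11"),
  ToLC = c("WK14", "RM11", "RS11", "OY01", "RS11", "OY01")
)

# Hypothetical helper: all start-to-end paths of one item as a wide data.table
item_paths <- function(edges) {
  g      <- graph_from_data_frame(edges[, c("LC", "ToLC")])
  starts <- V(g)$name[degree(g, mode = "in") == 0]   # no incoming edges
  ends   <- V(g)$name[degree(g, mode = "out") == 0]  # no outgoing edges
  # One list of paths per start node, concatenated into a single flat list
  paths <- do.call(c, lapply(starts, function(s) {
    all_simple_paths(g, from = s, to = ends)
  }))
  rows <- lapply(paths, function(p) {
    dt <- as.data.table(t(names(p)))
    setnames(dt, paste0("LC", seq_along(dt)))
    dt
  })
  rbindlist(rows, fill = TRUE)
}

# Apply per Item and stack the results, labelling rows with the Item
result <- rbindlist(lapply(split(lctolc, lctolc$Item), item_paths),
                    fill = TRUE, idcol = "Item")
```

The result has one row per complete route and the columns Item, LC1 to LC4, matching the layout of data frame 2 in the question. Row order within an item depends on the order in which all_simple_paths() returns the paths.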