For each item in a data frame, want to loop automatically


Question

I don't want to reshape it, as I have a lot of data, so I am looking for something like a loop which translates it automatically. Input - Dataframe 1

Item     LC     ToLC
8T4121  MW92    WK14
8T4121  WK14    RM11
8T4121  WK14    RS11
8T4121  RS11    OY01
AB7651  MW92    RS11
AB7651  RS11    OY01

I want to make a loop that gives me an output like this: Dataframe 2

Item     LC1    LC2    LC3    LC4
8T4121  MW92    WK14   RM11  
8T4121  MW92    WK14   RS11   OY01
AB7651  MW92    RS11   OY01

I tried something like this:

bodlane <- lctolc
colnames(bodlane) <- c("Item","Entry","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","Parent1","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","LC","ToLC","Parent1","From","To")

Answer

I believe this can be solved with igraph, in a similar way as in "recursive" self join in data.table, but without the calculation.

The difficulty here is that there is a separate graph for each Item. My approach is to split the data frame into a list of graphs. There might be more concise solutions which use the type vertex attribute.

However, the code below creates the expected result:

library(igraph)
library(data.table)
library(magrittr)

lapply(
  lapply(split(lctolc, lctolc$Item), function(x) graph_from_data_frame(x[, 2:3])),
  function(x) lapply(
    V(x)[degree(x, mode = "in") == 0], 
    function(s) all_simple_paths(x, from = s, 
                                 to = V(x)[degree(x, mode = "out") == 0]) %>% 
      lapply(
        function(y) as.data.table(t(names(y))) %>% setnames(paste0("LC", seq_along(.)))
      ) %>% 
      rbindlist(fill = TRUE) 
  ) %>% rbindlist(fill = TRUE)
) %>% rbindlist(fill = TRUE, idcol = "Item")

     Item  LC1  LC2  LC3  LC4
1: 8T4121 MN12 AB12 BC34 <NA>
2: 8T4121 MW92 WK14 RS11 OY01
3: 8T4121 MW92 WK14 RM11 <NA>
4: AB7651 MW92 RS11 OY01 <NA>
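
For readers who find the nested lapply() calls hard to follow, the same idea can be sketched with a named helper and explicit loops. This is only an illustrative restatement, not the answerer's code: the helper name paths_for_item is hypothetical, and the small lctolc data frame below is retyped from the question's original input rather than the amended sample.

```r
library(igraph)
library(data.table)

# Input as given in the question (retyped here so the sketch is self-contained)
lctolc <- data.frame(
  Item = c("8T4121", "8T4121", "8T4121", "8T4121", "AB7651", "AB7651"),
  LC   = c("MW92", "WK14", "WK14", "RS11", "MW92", "RS11"),
  ToLC = c("WK14", "RM11", "RS11", "OY01", "RS11", "OY01")
)

# Hypothetical helper: all start-to-end paths of one item's edge list,
# one row per path, columns LC1, LC2, ...
paths_for_item <- function(edges) {
  g      <- graph_from_data_frame(edges)
  starts <- V(g)[degree(g, mode = "in") == 0]   # vertices without incoming edges
  ends   <- V(g)[degree(g, mode = "out") == 0]  # vertices without outgoing edges
  rows   <- list()
  for (s in names(starts)) {
    for (p in all_simple_paths(g, from = s, to = ends)) {
      row <- as.data.table(t(names(p)))            # one path becomes one wide row
      setnames(row, paste0("LC", seq_along(row)))
      rows[[length(rows) + 1L]] <- row
    }
  }
  rbindlist(rows, fill = TRUE)                     # pad shorter paths with NA
}

# One graph per Item, then stack the per-item tables and label them
per_item <- lapply(split(lctolc[, c("LC", "ToLC")], lctolc$Item), paths_for_item)
result   <- rbindlist(per_item, fill = TRUE, idcol = "Item")
print(result)
```

The explicit loops trade brevity for readability; the nested version above does the same work in a single expression.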

Explanation

The igraph package is a good choice for questions like this.

However, we need to treat the graph of each Item separately. This is achieved by splitting the data.frame and creating a list of graphs:

lg <- lapply(split(lctolc, lctolc$Item), function(x) graph_from_data_frame(x[, 2:3]))

which returns

lg

$`8T4121`
IGRAPH 8eb2bcc DN-- 8 6 -- 
+ attr: name (v/c)
+ edges from 8eb2bcc (vertex names):
[1] AB12->BC34 MN12->AB12 MW92->WK14 WK14->RM11 WK14->RS11 RS11->OY01

$AB7651
IGRAPH 7cd75e7 DN-- 3 2 -- 
+ attr: name (v/c)
+ edges from 7cd75e7 (vertex names):
[1] MW92->RS11 RS11->OY01

or, visualised as two separate plots:

lapply(seq_along(lg), function(i) plot(lg[[i]], main = names(lg)[i]))

Now, the function all_simple_paths() lists the simple paths from one source vertex to one or more target vertices. A path is simple if each vertex is visited at most once. To use the function, we need to determine the start nodes and all end nodes. This is achieved by

V(x)[degree(x, mode = "in") == 0]  # start nodes
V(x)[degree(x, mode = "out") == 0] # end nodes 

The degree() function returns the number of incoming or outgoing edges, respectively.
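
As a minimal illustration of how degree() separates start and end nodes, the sketch below builds a small graph from a toy edge list. The edges data frame is an assumption that mirrors item 8T4121 from the question; it is not the amended sample data.

```r
library(igraph)

# Toy edge list (assumed for illustration, mirrors item 8T4121 in the question)
edges <- data.frame(
  from = c("MW92", "WK14", "WK14", "RS11"),
  to   = c("WK14", "RM11", "RS11", "OY01")
)
g <- graph_from_data_frame(edges)

in_deg  <- degree(g, mode = "in")   # number of incoming edges per vertex
out_deg <- degree(g, mode = "out")  # number of outgoing edges per vertex

start_nodes <- names(in_deg)[in_deg == 0]    # nothing points to these
end_nodes   <- names(out_deg)[out_deg == 0]  # these point to nothing

print(start_nodes)
print(end_nodes)
```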

For our example dataset we get

lapply(lg, function(x) V(x)[degree(x, mode = "in") == 0]) # start nodes

$`8T4121`
+ 2/8 vertices, named, from 8eb2bcc:
[1] MN12 MW92

$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] MW92

lapply(lg, function(x) V(x)[degree(x, mode = "out") == 0]) # end nodes

$`8T4121`
+ 3/8 vertices, named, from 8eb2bcc:
[1] BC34 RM11 OY01

$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] OY01
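
With start and end nodes known, a single call to all_simple_paths() enumerates the chains for one start node. The following is a small sketch on a toy graph (the edges data frame is assumed for illustration and mirrors item 8T4121 from the question); the order in which paths are returned is not relied upon.

```r
library(igraph)

# Toy edge list (assumed for illustration)
edges <- data.frame(
  from = c("MW92", "WK14", "WK14", "RS11"),
  to   = c("WK14", "RM11", "RS11", "OY01")
)
g <- graph_from_data_frame(edges)

starts <- V(g)[degree(g, mode = "in") == 0]
ends   <- V(g)[degree(g, mode = "out") == 0]

# all_simple_paths() takes a single source vertex and a set of targets
paths <- all_simple_paths(g, from = starts[1], to = ends)

# Each path is a vertex sequence; names() extracts the vertex names
path_names <- lapply(paths, names)
print(path_names)
```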

Now, we loop through all start nodes of each graph and determine all simple paths. The result is, again, a list. For each list item, the node names are extracted and reshaped into a data.table in wide format. The columns are renamed to LC1, LC2, etc.
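
The reshaping of a single path can be sketched in isolation. The character vector path below is a stand-in (an assumption) for the vertex names that names() would return for one path.

```r
library(data.table)

# Stand-in for the vertex names of one path (assumed for illustration)
path <- c("MW92", "WK14", "RS11", "OY01")

# t() turns the vector into a 1 x n matrix, so each node becomes a column
row <- as.data.table(t(path))

# Default column names V1, V2, ... are replaced by LC1, LC2, ...
setnames(row, paste0("LC", seq_along(row)))
print(row)
```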

In each step, we get a list of data.tables, which are combined by rbindlist(). The fill parameter is required because the number of columns may vary. The final call to rbindlist() uses the idcol parameter to label each row with the Item it belongs to.
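
The effect of fill and idcol can be seen on a few hand-built tables. The data.tables below are assumptions typed in for illustration; they stand for the per-path rows produced in the previous step.

```r
library(data.table)

# Two paths of different length for the same item (assumed for illustration)
short_path <- data.table(LC1 = "MW92", LC2 = "WK14", LC3 = "RM11")
long_path  <- data.table(LC1 = "MW92", LC2 = "WK14", LC3 = "RS11", LC4 = "OY01")

# fill = TRUE pads the missing LC4 of the shorter path with NA
item_a <- rbindlist(list(short_path, long_path), fill = TRUE)
item_b <- data.table(LC1 = "MW92", LC2 = "RS11", LC3 = "OY01")

# idcol = "Item" turns the list names into a leading Item column
result <- rbindlist(list(`8T4121` = item_a, AB7651 = item_b),
                    fill = TRUE, idcol = "Item")
print(result)
```

Without fill = TRUE, rbindlist() would stop with an error here, because the tables do not all have the same number of columns.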

The sample dataset has been amended to include the cases from OP's comments.

library(data.table)
lctolc <- fread("
Item     LC     ToLC
8T4121  AB12    BC34
8T4121  MN12    AB12
8T4121  MW92    WK14
8T4121  WK14    RM11
8T4121  WK14    RS11
8T4121  RS11    OY01
AB7651  MW92    RS11
AB7651  RS11    OY01",
data.table = FALSE)
