How to prevent an infinite loop in a data frame lookup where elements are bi-directional

Problem description

I have a recursive function, kindly explained to me by @thothal here, that lets me recursively get data frames by looking up a character string in the parent data frame. With the example I provided, this works great.

However, I am now working with further tables in which elements of the child are present in the parent and vice versa. This leads to an infinite loop in the recursive function.

To repeat the original question with changes:

Numsdf1<-c("C123","C456","C789")
Textdf1<-c("Harry","Bobby","Terry")
df1<-data.frame(Numsdf1,Textdf1,stringsAsFactors=FALSE)

The second data frame is the result of looking up the string "C123":

NumsC123<-c("C123","Noo","Too")
TextC123<-c("Tim","Slim","Shim")
C123<-data.frame(NumsC123,TextC123,stringsAsFactors=FALSE)

The third data frame is the result of looking up "Coo":

NumsCoo<-c("S144","S199","S743")
TextCoo<-c("Ellie","Bellie","Tellie")
Coo<-data.frame(NumsCoo,TextCoo,stringsAsFactors=FALSE)

The fourth is the result of looking up "Noo":

NumsNoo<-c("GHS","THE","PAA")
TextNoo<-c("Front","Bunt","Shunt")
Noo<-data.frame(NumsNoo,TextNoo,stringsAsFactors=FALSE)

The original solution was:

library(tidyverse)
get_all_dfs <- function(df) {
   lapply(df[, 1], function(elem) {
      print(paste("Looking for element", elem))
      # use mget() because it supports ifnotfound, even though we request only one element
      next_df <- mget(elem, envir = .GlobalEnv, ifnotfound = NA)
      if (!is.na(next_df)) {
         # elem names another data frame: recurse into it
         unlist(get_all_dfs(next_df[[1]]), FALSE)
      } else {
         # elem is a leaf: return the data frame it was listed in
         list(setNames(df, c("col1", "col2")))
      }
   })
}

flatten_dfr(get_all_dfs(df1)) %>% unique()

This means that when I run the function, I get a loop I can't break out of. So instead of the intended result:

C123 -> Coo -> S144 -> S199 -> S743 -> Noo -> GHS -> THE -> PAA -> Too -> C456 -> C789

I get:

C123 -> Coo -> C123 -> Coo -> C123 etc.

What can I do to prevent this?

I implemented the solution from @thothal. The problem I had was that the lookup function I use returns a data frame rather than a list, so I also created a list to store the environment in. However, the loop still occurs. Here is the updated code:

get_all_dfs_rec <- function(df, my_env) {
   lapply(df$relatedIdEx, function(elem) {
      print(paste("Looking for element", elem))
      next_df <- myGIConcepts(elem)     ### This returns a data frame
      next_df <- list(next_df, my_env)  ### Environment variable kept in a list
      if (!is.na(next_df)) {
         rm(list = elem, envir = my_env)
         unlist(get_all_dfs_rec(next_df[[1]], my_env), FALSE)
      } else {
         list(setNames(df, c("col1", "col2")))
      }
   })
}

get_all_dfs <- function(df_start) {
   ## create a new environment
   my_env <- new.env()
   ## and add all 'data.frames' from the global environment to it
   walk(ls(.GlobalEnv), ~ {
      elem <- get(.x, envir = .GlobalEnv)
      if (class(elem) == "data.frame") my_env[[.x]] <- elem
   })
   flatten_dfr(get_all_dfs_rec(df_start, my_env)) %>% unique()
}

Recommended answer

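You can put all of your data frames in their own environment and remove each one from there once it has been found. After a name has been removed, any later lookup of that name falls through to the ifnotfound branch and is treated as a leaf, so each data frame is expanded at most once and the cycle is broken: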

library(tidyverse)  # for walk(), flatten_dfr() and %>%

get_all_dfs_rec <- function(df, my_env) {
   lapply(df[, 1], function(elem) {
      print(paste("Looking for element", elem))
      # use mget() because it supports ifnotfound, even though we request only one element
      next_df <- mget(elem, envir = my_env, ifnotfound = NA)
      if (!is.na(next_df)) {
         # pass the name via `list =`; rm(elem, ...) would look for an object literally named "elem"
         rm(list = elem, envir = my_env)
         unlist(get_all_dfs_rec(next_df[[1]], my_env), FALSE)
      } else {
         # not found, or already visited and removed: treat elem as a leaf
         list(setNames(df, c("col1", "col2")))
      }
   })
}

get_all_dfs <- function(df_start) {
   ## create a new environment
   my_env <- new.env()
   ## and copy every data frame from the global environment into it
   walk(ls(.GlobalEnv), ~ {
      elem <- get(.x, envir = .GlobalEnv)
      # is.data.frame() also accepts tibbles, whose class() has length > 1
      if (is.data.frame(elem)) my_env[[.x]] <- elem
   })
   flatten_dfr(get_all_dfs_rec(df_start, my_env)) %>% unique()
}
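In the question's updated code, the lookup is a function (myGIConcepts()) rather than a set of named objects. Assuming myGIConcepts() never reads my_env, removing names from my_env cannot stop the recursion. Below is a minimal base-R sketch of the same idea for that case. Instead of deleting tables, it records each ID in a "visited" environment before expanding it. The lookup argument is a hypothetical stand-in for myGIConcepts(). It is assumed to take one ID and return a two-column data frame, or NULL or a zero-row data frame when nothing is found. get_all_dfs_visited is also a hypothetical name. rbind() and unique() replace flatten_dfr() so that the sketch needs no packages.

```r
# Sketch: depth-first expansion that records every ID it has already
# expanded, so circular references are followed only once.
# `lookup` is a hypothetical stand-in for a function like myGIConcepts():
# it takes one ID and returns a two-column data frame, or NULL (or a
# zero-row data frame) when the ID has nothing behind it.
get_all_dfs_visited <- function(df_start, lookup) {
  # Environments are modified in place, so marks made in one lapply()
  # branch are visible to all later branches and recursive calls.
  visited <- new.env()

  rec <- function(df) {
    out <- lapply(as.character(df[[1]]), function(elem) {
      if (!exists(elem, envir = visited, inherits = FALSE)) {
        assign(elem, TRUE, envir = visited)  # mark before recursing
        next_df <- lookup(elem)
        if (!is.null(next_df) && nrow(next_df) > 0) {
          return(rec(next_df))
        }
      }
      # Leaf or already-expanded ID: keep the frame it was listed in,
      # mirroring what the answer does when mget() finds nothing.
      list(setNames(df, c("col1", "col2")))
    })
    do.call(c, out)  # flatten one level into a plain list of data frames
  }

  unique(do.call(rbind, rec(df_start)))
}

# Usage with the question's data, served from a named list instead of the
# global environment (tables[["Too"]] is NULL, so "Too" is a leaf).
df1 <- data.frame(Numsdf1 = c("C123", "C456", "C789"),
                  Textdf1 = c("Harry", "Bobby", "Terry"),
                  stringsAsFactors = FALSE)
tables <- list(
  C123 = data.frame(NumsC123 = c("C123", "Noo", "Too"),
                    TextC123 = c("Tim", "Slim", "Shim"),
                    stringsAsFactors = FALSE),
  Coo = data.frame(NumsCoo = c("S144", "S199", "S743"),
                   TextCoo = c("Ellie", "Bellie", "Tellie"),
                   stringsAsFactors = FALSE),
  Noo = data.frame(NumsNoo = c("GHS", "THE", "PAA"),
                   TextNoo = c("Front", "Bunt", "Shunt"),
                   stringsAsFactors = FALSE)
)
res <- get_all_dfs_visited(df1, function(id) tables[[id]])
print(res)
```

With this data, the traversal terminates even though the C123 table lists "C123" itself. On the second visit, the ID is already marked, so it is treated as a leaf. The visited set is an environment because, unlike a character vector passed down as an argument, an environment is modified in place. As a result, marks made in one lapply() branch are seen by the branches after it.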
