Nested loops through a structured list in R


Problem description

I have an example dataset, garden, as shown below. The real one has thousands of rows. I also have an example list, productFruit. I want to know the calories of every fruit, given the usage reported in garden. I basically want to loop through all the rows in my table, check whether the usage is recorded in the productFruit list, and then return either the calories or one of the following error messages:

  • "usage out of scope" if the usage is not found for that fruit in the productFruit list
  • "fruit out of scope" if the fruit is not found in the productFruit list
  • "erroneous data" if data is missing

garden:

library(data.table)

fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
# Note: the missing usage is stored as the literal string "NA", not as NA.
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)

garden <- data.table(fruit, usage, reported)

productFruit:

productFruit <- list(Basket = c('DUH'), 
                type = list (
                  Apple = list(ID = 1,
                            color = "poor",
                            usage = list(eating = list(ID = 1,
                                                       quality = "good",
                                                       calories = 500),
                                         medicine = list(ID = 2,
                                                         quality = "poor",
                                                         calories = 300))),
                  Orange = list(ID = c(1,2,3),
                            color = c(3,4,5),
                            usage = list(eating = list(ID = 1,
                                                       quality = "poor",
                                                       calories = 420),
                                         cooking = list(ID = 2,
                                                        quality = "questionable",
                                                        calories = 600),
                                         drinking = list(ID = 3,
                                                         quality = "good",
                                                         calories = 800),
                                         medicine = list(ID = 4,
                                                         quality = "good",
                                                         calories = 0))),
                  Banana = list(ID = c(1,2,3),
                           color = c(3,4,5),
                           usage = list(cooking = list(ID = 1,
                                                      quality = "good",
                                                      calories = 49),
                                          drinking = list(ID = 2,
                                                          quality = "questionable",
                                                          calories = 11),
                                          medicine = list(ID = 3,
                                                          quality = "poor",
                                                          calories = 55)))))
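
Individual entries of a nested list like this can be reached by chaining `[[`, which returns NULL for a name that does not exist in a list. The following is a minimal sketch on a trimmed-down copy of the list above; the name `pf` is only used for illustration and is not part of the original question.

```r
# Trimmed-down stand-in for productFruit; `pf` is an illustrative name.
pf <- list(type = list(
  Orange = list(usage = list(
    drinking = list(ID = 3, quality = "good", calories = 800)))))

# Chained [[ ]] walks down the levels by exact name.
orange_drinking <- pf$type[["Orange"]]$usage[["drinking"]]$calories

# A name that is not present in a list yields NULL instead of an error.
kiwi_entry     <- pf$type[["Kiwi"]]
orange_cooking <- pf$type[["Orange"]]$usage[["cooking"]]

orange_drinking          # 800
is.null(kiwi_entry)      # TRUE
is.null(orange_cooking)  # TRUE
```

Checking for NULL at each level is one way to tell a missing fruit apart from a missing usage.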

I tried to break it down into smaller steps and do this with loops, but I have very little experience with lists and was getting many errors. Any ideas on how to solve this in an efficient and readable way? Below is one of my many attempts to just match the fruits. I am aware that the fields do not match; I was just trying to get the loop to run at all...

for (i in seq_len(nrow(garden))){
  if (garden$fruit[i] == productFruit$type){
    garden$calories = productFruit$type[[i]]$ID
  } 
  garden$calories = "error"
}

The desired output looks like this:

fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)
calories = c("usage out of scope", "fruit out of scope", "erroneous data", 800, "fruit out of scope")

garden_with_calories <- cbind(fruit, usage, reported, calories)
garden_with_calories <- as.data.table(garden_with_calories)

Recommended answer

Extracting data from nested lists can be very tedious. Here is some code that works for the example you provided, but it might still struggle if you have entries that differ from the example data. You'll probably have to make it more robust and check that the data has the class you expect, etc.

library(tidyverse)

Step 1:

We create some code that extracts one fruit at a time:

# This creates a tibble with one column per usage entry (eating, medicine,
# etc.).
type_df <- as_tibble(productFruit$type[[1]]$usage)

# With map_dfr() we apply as_tibble() to each column to get a one-row data
# frame per "usage" case and bind the resulting rows together into one data
# frame. This is the line that might need to be made more robust in order
# not to fail on unexpected input.
res <- map_dfr(type_df, as_tibble, .id = "usage")

# When there is no usage entry, `res` will be empty, and we create a dummy
# data frame for that case that has `NA` for the "calories" column.
if (nrow(res) < 1) {
  tibble(calories = NA)
} else {
  res
}

Step 2:

Now we put the previous lines into a function, so we can apply it to all fruits.

extract_fruit_data <-
  function(fruit) {
    type_df <- as_tibble(fruit$usage)
    res <- map_dfr(type_df, as_tibble, .id = "usage")
    if (nrow(res) < 1) {
      tibble(calories = NA)
    } else {
      res
    }
  }

Step 3:

We apply extract_fruit_data to each fruit's entry and bind together the resulting rows using map_dfr(). Then we drop and rename some of the variables, in preparation for the next step.

fruits_df <-
  map_dfr(productFruit$type, extract_fruit_data, .id = "type") %>%
  select(-ID, -quality) %>% 
  rename(fruit = type)
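
To make the shape of the resulting lookup table concrete, here is a rough base R sketch that builds the same kind of long table (one row per fruit and usage) without the tidyverse. It is not the code from the answer: the names `pf`, `rows` and `lookup` are illustrative, and it assumes every fruit has at least one usage entry with a single numeric `calories` value.

```r
# Trimmed-down stand-in for productFruit; `pf` is an illustrative name.
pf <- list(type = list(
  Apple  = list(usage = list(eating   = list(calories = 500),
                             medicine = list(calories = 300))),
  Orange = list(usage = list(drinking = list(calories = 800)))))

# One small data frame per fruit, with one row per usage entry.
rows <- lapply(names(pf$type), function(f) {
  u <- pf$type[[f]]$usage
  data.frame(
    fruit    = f,
    usage    = names(u),
    calories = unname(vapply(u, function(x) x$calories, numeric(1))),
    stringsAsFactors = FALSE
  )
})

# Stack the per-fruit data frames into one lookup table.
lookup <- do.call(rbind, rows)
rownames(lookup) <- NULL
lookup
```

A table in this shape can then be joined to garden on the fruit and usage columns.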

Step 4:

We join the two datasets with left_join(). That way each entry in garden is kept, and those entries that are not matched in fruits_df get an NA in the calories column. With case_when() we classify each row according to your specifications.

# Join on both keys explicitly, then turn each row into calories or a message.
left_join(garden, fruits_df, by = c("fruit", "usage")) %>% 
  mutate(calories = case_when(
    usage == "NA" ~ "erroneous data",
    !fruit %in% fruits_df$fruit ~ "fruit out of scope",
    is.na(calories) ~ "usage out of scope",
    TRUE ~ as.character(calories)
  ))
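
For readers who prefer the explicit loop the question asked about, the same classification can also be sketched in base R by looking up each row directly in the nested list. This is an alternative sketch, not part of the original answer: `lookup_calories` and `products` are hypothetical names, and the data below is a small stand-in for garden and productFruit. The checks run in the same order as the case_when() call above.

```r
# Hypothetical helper: returns the calories as a string, or one of the
# three error messages from the question.
lookup_calories <- function(fruit, usage, products) {
  # Missing data is reported first (the example stores it as the string "NA").
  if (is.na(fruit) || is.na(usage) || usage == "NA") {
    return("erroneous data")
  }
  entry <- products$type[[fruit]]   # NULL if the fruit is not listed
  if (is.null(entry)) {
    return("fruit out of scope")
  }
  use <- entry$usage[[usage]]       # NULL if the usage is not listed
  if (is.null(use)) {
    return("usage out of scope")
  }
  as.character(use$calories)
}

# Small stand-ins for productFruit and garden.
products <- list(type = list(
  Apple  = list(usage = list(eating   = list(calories = 500))),
  Orange = list(usage = list(drinking = list(calories = 800)))))
garden <- data.frame(
  fruit = c("Apple", "Kiwi", "Banana", "Orange"),
  usage = c("cooking", "cooking", "NA", "drinking"),
  stringsAsFactors = FALSE)

# Pre-allocate the result column, then fill it row by row.
garden$calories <- character(nrow(garden))
for (i in seq_len(nrow(garden))) {
  garden$calories[i] <- lookup_calories(garden$fruit[i], garden$usage[i], products)
}
garden$calories
```

For thousands of rows the join-based approach is likely to scale better, but the loop makes the three checks easy to follow.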
