How to conditionally subset a list in R based on a range in another column


Question

I have imported more than 100 identically structured .xls files, with 10 sheets each, into a list in R. I am now trying to extract the information I need. The data in the files are highly unstructured.

I have created some toy data to show what I want.

# create my_list

list1 <- list(data.frame(cross = c("NA","NA","o","o","o","x","o","NA","NA"),
                         color = c("NA","NA","grey","black","white","yellow","blue","NA","NA"),
                         temperature = c("NA","NA","3","5","2","7","4","NA","NA")))

list2 <- list(data.frame(cross = c("NA","NA","o","x","o","o","o","NA","NA"),
                         color = c("NA","NA","grey","black","white","yellow","blue","NA","NA"),
                         temperature = c("NA","NA","8","6","1","6","9","NA","NA")))

my_list <- list(list1,list2)

I can easily select one value from my_list with purrr::map. The code below gives me a vector of, for example, the last given temperature in each of the imported files:

# subset a single value from the list
library(purrr)
my_list %>% map_chr(c(1,3,7))
[1] "4" "9"

The vector then has the same length as the number of files I have imported.
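For reference, an integer vector passed to map_chr() is applied as successive [[ steps, the same way purrr::pluck() works. The sketch below rebuilds the toy data so it runs on its own and shows three equivalent spellings of the call above. The names temps_map, temps_pluck and temps_base are made up for illustration, and stringsAsFactors = FALSE is an added assumption so the columns stay character in older R versions too.

```r
library(purrr)

# Toy data from the question, rebuilt so this block is self-contained
colors <- c("NA","NA","grey","black","white","yellow","blue","NA","NA")
list1 <- list(data.frame(cross = c("NA","NA","o","o","o","x","o","NA","NA"),
                         color = colors,
                         temperature = c("NA","NA","3","5","2","7","4","NA","NA"),
                         stringsAsFactors = FALSE))
list2 <- list(data.frame(cross = c("NA","NA","o","x","o","o","o","NA","NA"),
                         color = colors,
                         temperature = c("NA","NA","8","6","1","6","9","NA","NA"),
                         stringsAsFactors = FALSE))
my_list <- list(list1, list2)

# c(1, 3, 7): first element (the data frame), third column, seventh value
temps_map   <- map_chr(my_list, c(1, 3, 7))
temps_pluck <- map_chr(my_list, ~ pluck(.x, 1, 3, 7))
temps_base  <- vapply(my_list, function(f) f[[1]][[3]][[7]], character(1))
```

All three pick exactly one cell per file, which is why this approach cannot express a condition such as "the row where cross is x".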

The important thing to notice here is that the data are messy: because of the nature of the original .xls files, each column contains many unrelated things. That's why I select a single cell to extract from.

My question is: how do I select the color that has an "x" in the "cross" column, looking only at positions 3 to 7?

As before, I need a vector of the color names, so the output has to be:

"yellow", "black" (for the toy data above), and "NA" if there is no cross at all.

And remember, there are many strange things in each column, so I need to specify the range to look at in the "cross" column. In words, it could be:

"Extract the name of the color, from the color column, that has an "x" next to it in the cross column, within positions 3 to 7." Since the "x" is always next to the color name, I guess the range could be specified in either of the two columns (cross or color).

I am hoping for a purrr solution, but any approach is gratefully accepted.

Recommended answer

If I understood your question correctly, then this should do it:

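library(purrr)  # map()
library(dplyr)  # %>% and filter()

map(my_list, function(tbl){
  # tbl[[1]] is the data frame; keep rows 3 to 7, then only the rows marked "x"
  out_tbl <- tbl[[1]][3:7,] %>%
    dplyr::filter(cross == "x")
  # no cross within the range: return NA for this file
  if(nrow(out_tbl) == 0) return(NA)
  as.character(out_tbl$color)
})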
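
Note that map() returns a list, while the question asks for a vector. A minimal sketch of a variant that returns a character vector is shown below. It is not part of the original answer: pick_color is a hypothetical helper, list3 is a made-up third file with no "x" added to exercise the NA case, and taking only the first match when several rows are marked is an assumption.

```r
library(purrr)

# Toy data from the question, plus a hypothetical third file without any "x"
colors <- c("NA","NA","grey","black","white","yellow","blue","NA","NA")
make_file <- function(cross) {
  list(data.frame(cross = cross, color = colors, stringsAsFactors = FALSE))
}
list1 <- make_file(c("NA","NA","o","o","o","x","o","NA","NA"))
list2 <- make_file(c("NA","NA","o","x","o","o","o","NA","NA"))
list3 <- make_file(c("NA","NA","o","o","o","o","o","NA","NA"))  # assumption: no cross
my_list <- list(list1, list2, list3)

# Hypothetical helper: color of the first row marked "x" within `rows`
pick_color <- function(file, rows = 3:7) {
  tbl <- file[[1]][rows, ]
  # which() drops real NA values, so messy cells cannot produce spurious matches
  hit <- as.character(tbl$color[which(tbl$cross == "x")])
  if (length(hit) == 0) NA_character_ else hit[1]
}

colors_vec <- map_chr(my_list, pick_color)
```

Because map_chr() requires exactly one string per element, the helper returns NA_character_ rather than a plain NA and keeps only the first hit.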
