How to subset data.frames stored in a list?


Question

I created a list and I stored one data frame in each component. Now I would like to filter those data frames keeping only the rows that have NA in a specific column. I would like the result of this operation to be another list containing data frames with only those rows having NA in that column.

Here is some code to clarify what I mean. Assume d1 and d2 are my data frames:

set.seed(1)

d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA))
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007))

print(d1)
 a    b
 1.3011543 2006
 0.3780023 2006
-0.3101449 2006
-1.3927445 2006
-1.0726218   NA

print(d2)
a    b
1 2007
2 2007
3   NA
4   NA
5 2007

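I place these in a list:

ls <- list()

for (i in 1:2) {
  str <- paste("d", i, sep = "")  # build the object name: "d1", then "d2"
  dat <- get(str)                 # look up the data frame with that name
  ls[[str]] <- dat                # store it in the list under the same name
}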

Now I would like to filter each list component so as to keep only the rows where column b is NA. To do this I tried the following command, knowing from the start that it would fail. My problem is that I don't know whether subset() is the right function to use and, if it is, how to refer to each data frame (that is, the first argument of subset).

lsNA<-lapply(ls, subset(ls, is.na(b)))

Can you please help me get past my severe limitations?

Solution

lapply's second argument is a function (here, subset), and any extra arguments for subset are passed through lapply's ... argument. Hence:

my.ls <- list(d1 = d1, d2 = d2)
my.lsNA <- lapply(my.ls, subset, is.na(b))

(I am also showing you how to easily create the list of data.frames without using get, and recommend you don't use ls as a variable name since it is also the name of a rather common function.)
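
For completeness, here is a self-contained sketch that rebuilds the question's data and runs the call above. It also shows an alternative that is not part of the original answer: an anonymous function using standard `[` indexing. The names `na_subset` and `na_bracket` are made up for this illustration. The alternative may be worth considering because the help page for subset describes it as a convenience function intended for interactive use.

```r
# Rebuild the two example data frames from the question.
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA))
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007))

my.ls <- list(d1 = d1, d2 = d2)

# Approach from the answer: arguments after FUN are forwarded to subset().
na_subset <- lapply(my.ls, subset, is.na(b))

# Alternative sketch (an assumption, not from the answer): an explicit
# anonymous function with `[` indexing; drop = FALSE keeps a data frame.
na_bracket <- lapply(my.ls, function(df) df[is.na(df$b), , drop = FALSE])

# Number of rows kept in each list element.
sapply(na_bracket, nrow)
```

Either way the result is a list with the same names as the input (d1 and d2), where each element holds only the rows with NA in column b.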
