Getting a subset of data based on multiple column values

Question

I am trying to remove rows based on whether columns 2 and 3 contain 0s, and I keep getting very strange results. I initially tried to write it without subset because I read somewhere that subset should only be used on small amounts of data because of its memory cost. Neither attempt worked for me, however. Can someone explain what I did wrong?

df <- data.frame(val1=c(1,2,3), val2=c(4,0,5), val3=c(3,0,6))
subset(df,df>0,c(2,3))
data.frame(df[df[,c(2,3)]!=0])

Starting data frame:

   val1   val2   val3
1  1       4       3
2  2       0       0
3  3       5       6

Desired result:

   val1   val2   val3
1  1       4       3
3  3       5       6

Recommended answer

Using subset, we create a logical index based on the second and third columns.

subset(df, subset=!(val2==0|val3==0))

This works because the subset argument is evaluated on the columns and expects a logical vector with one element per row, not a logical matrix. The same filter can be written with [ instead of subset:

df[!(df[,2]==0|df[,3]==0),]

Regarding the second attempt in the OP's post:

df[,c(2,3)]!=0 #returns a matrix
#      val2  val3
#[1,]  TRUE  TRUE
#[2,] FALSE FALSE
#[3,]  TRUE  TRUE

To subset rows, we need a single logical value per row, not a matrix with one value per cell.
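As a minimal sketch of that idea, the logical matrix can be collapsed into one value per row before indexing. The names is_zero, keep, and result are illustrative and not part of the original answer; the data frame is the one from the question.

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

# Logical matrix: TRUE wherever a value in columns 2 or 3 is zero
is_zero <- df[, c(2, 3)] == 0

# Collapse to one logical value per row: keep rows that contain no zeros
keep <- rowSums(is_zero) == 0

# Index rows with the per-row logical vector
result <- df[keep, ]
print(result)
```

Counting zeros per row this way extends to any number of columns without writing out each comparison by hand.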

Another option is rowSums, if you want to remove only the rows that are 0 in both column 2 and column 3. This relies on the values being non-negative, since positive and negative values could otherwise cancel to a sum of 0.

 df[rowSums(df[2:3])!=0,]

Note that this is not the same condition as the | versions above. For example, after changing the data with

df$val3[2] <- 2

the rowSums approach returns all the rows, while the methods using | return only rows 1 and 3.

The subset call equivalent to the rowSums approach uses & instead of |:

subset(df, !(val2==0 & val3==0))
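The difference between the two conditions can be checked side by side. This is a small sketch using the modified data from the answer; the names drop_any and drop_all are illustrative only.

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))
df$val3[2] <- 2  # row 2 now has a zero in val2 only

# Drop a row if EITHER column is zero
drop_any <- subset(df, !(val2 == 0 | val3 == 0))

# Drop a row only if BOTH columns are zero
drop_all <- subset(df, !(val2 == 0 & val3 == 0))

print(drop_any)
print(drop_all)
```

With this data, the | version drops row 2 while the & version keeps it, so the choice depends on whether a single zero should disqualify a row.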
