How to filter a data frame with conditions on two columns?


Problem description

I am trying to select rows from a data frame. Why does the last query below return all 5 records rather than just the first two?

> x <- c(5,1,3,2,4)
> y <- c(1,5,3,4,2)
> data <- data.frame(x,y)
> data
  x y
1 5 1
2 1 5
3 3 3
4 2 4
5 4 2
> data[data$x > 4 || data$y > 4]
  x y
1 5 1
2 1 5
3 3 3
4 2 4
5 4 2

Solution

(1) For selecting rows (subsetting), I highly recommend the subset function. It ships with base R, so no package such as plyr needs to be loaded, and it is cleaner and easier to use:

subset(data, x > 4 | y > 4)

UPDATE:

There is a newer package called dplyr, the successor to plyr and also by Hadley Wickham, which is supposedly much faster and easier to use. If you have ever seen an operator like %.% or %>%, those are used to chain operations in dplyr.

library(dplyr)
result <- data %>%
          filter(x > 4 | y > 4)  # NOTE: filter(condition1, condition2, ...) combines the conditions with AND.
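
The note about AND can be illustrated with a small sketch. It assumes the dplyr package is installed; the thresholds and the names both_comma and both_amp are made up for illustration and do not come from the original question.

```r
library(dplyr)

# Same data as in the question
data <- data.frame(x = c(5, 1, 3, 2, 4), y = c(1, 5, 3, 4, 2))

# Comma-separated conditions are combined with AND
both_comma <- data %>% filter(x > 2, y > 2)

# Equivalent explicit form using the elementwise & operator
both_amp <- data %>% filter(x > 2 & y > 2)

# Both forms keep only the row where x is 3 and y is 3
print(both_comma)
```

For an OR condition, as in the question, the conditions have to be joined explicitly with | inside a single argument.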

(2) There indeed exist some differences between | and ||:

You can look at the help manual by doing this: ?'|'

The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

> c(1,1,0) | c(0,0,0)
[1]  TRUE  TRUE FALSE
> c(1,1,0) || c(0,0,0)
[1] TRUE
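
As a rough sketch of where the longer form does belong, the snippet below reduces each elementwise comparison to a single value with any() before combining them with || in an if clause. The variable name has_large is hypothetical and chosen only for this example.

```r
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)

# any() reduces each elementwise comparison to a single TRUE/FALSE,
# so || receives length-one operands, as control flow expects
has_large <- any(x > 4) || any(y > 4)

if (has_large) {
  message("At least one value exceeds 4")
}
```

Because || short-circuits, any(y > 4) is not evaluated here once any(x > 4) is already TRUE.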

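Per your question, || compares only the first elements (5 > 4 is TRUE), so what you did is basically data[TRUE]. A single logical index on a data frame selects columns and is recycled across all of them, so it returns the complete data frame.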
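
For completeness, here is a minimal base R sketch of the corrected row indexing, using the same data as in the question. The names keep and fixed are illustrative only. One caveat, as far as I know: in recent R versions (4.3.0 and later) passing a vector longer than one to || raises an error instead of silently using the first element, so the original query would fail outright there.

```r
# Same data as in the question
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)
data <- data.frame(x, y)

# Elementwise OR gives one logical value per row
keep <- data$x > 4 | data$y > 4

# Use the logical vector as a row index; the trailing comma keeps all columns
fixed <- data[keep, ]
print(fixed)
```

The two changes from the original query are the single | and the comma after the condition, which makes the logical vector select rows rather than columns.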
