How to filter data in R?

Views: 455 | Published: 2017/11/8 19:53:15 | Tags: r filter social-networking graph-theory


Problem description

I have a huge data set with millions of rows and some peculiar attributes. I need to filter it while retaining all the other properties of the rows that remain.

My data looks like this:

    ID        Prop1  Prop2  TotalProp
    56891940  G02    G02    2
    56892558  A61    G02    4
    56892558  A61    A61    4
    56892558  G02    A61    4
    56892558  A61    A61    4
    56892552  B61    B61    3
    56892552  B61    B61    3
    56892552  B61    A61    3
    56892559  B61    G61    3
    56892559  B61    B61    3
    56892559  B61    B61    3

and so on, for more than a million rows.
I want to remove every row of an ID when all rows for that ID have the same value in Prop1 and Prop2, as with 56891940. IDs such as 56892558, 56892552 and 56892559 should be kept in full: some of their rows have matching properties, but at least one row has different values in Prop1 and Prop2, so I want to retain all of their rows.

My final output should look like:

    ID        Prop1  Prop2  TotalProp
    56892558  A61    G02    4
    56892558  A61    A61    4
    56892558  G02    A61    4
    56892558  A61    A61    4
    56892552  B61    B61    3
    56892552  B61    B61    3
    56892552  B61    A61    3
    56892559  B61    G61    3
    56892559  B61    B61    3
    56892559  B61    B61    3
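
The question never shows how the data frame is created, while the answers refer to it as `df1`. The following is a minimal sketch for rebuilding the sample rows, assuming they are read as plain text with character property columns. The per-ID summary with `tapply` and the name `has_mismatch` are added here for illustration only and are not part of the original post.

```r
# Rebuild the sample rows from the question; df1 is the name the answers use.
df1 <- read.table(text = "
ID Prop1 Prop2 TotalProp
56891940 G02 G02 2
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
", header = TRUE, stringsAsFactors = FALSE)

# For each ID: is there at least one row where Prop1 and Prop2 differ?
# (has_mismatch is an illustrative name, not from the original post.)
has_mismatch <- tapply(df1$Prop1 != df1$Prop2, df1$ID, any)
print(has_mismatch)
```

In this sample only 56891940 has no mismatching row, so it is the only ID whose rows should be dropped.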

Solution
You may try

    library(data.table)
    setDT(df1)[, .SD[any(Prop1 != Prop2)], ID]
    #          ID Prop1 Prop2 TotalProp
    #  1: 56892558   A61   G02         4
    #  2: 56892558   A61   A61         4
    #  3: 56892558   G02   A61         4
    #  4: 56892558   A61   A61         4
    #  5: 56892552   B61   B61         3
    #  6: 56892552   B61   B61         3
    #  7: 56892552   B61   A61         3
    #  8: 56892559   B61   G61         3
    #  9: 56892559   B61   B61         3
    # 10: 56892559   B61   B61         3

Or, as @Frank suggested,

    setDT(df1)[, if (any(Prop1 != Prop2)) .SD, ID]

A similar option using dplyr:

    library(dplyr)
    df1 %>%
        group_by(ID) %>%
        filter(any(Prop1 != Prop2))

Or using ave from base R:

    df1[with(df1, ave(Prop1 != Prop2, ID, FUN = any)), ]
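
To see why the `ave` version works, it may help to look at the intermediate mask: `ave` applies `any` within each ID and writes the group result back at every row position, so the mask has one value per row. The sketch below uses a cut-down copy of the sample data; the reduced data frame and the names `keep` and `result` are assumptions made for illustration, not part of the original answer.

```r
# Cut-down sample: one ID with no mismatch, two IDs with at least one.
df1 <- data.frame(
  ID    = c(56891940, 56892552, 56892552, 56892552, 56892559, 56892559),
  Prop1 = c("G02", "B61", "B61", "B61", "B61", "B61"),
  Prop2 = c("G02", "B61", "B61", "A61", "G61", "B61"),
  stringsAsFactors = FALSE
)

# Row-level mask: TRUE for every row whose ID has at least one row
# with Prop1 != Prop2, FALSE for rows of IDs that never mismatch.
keep <- with(df1, ave(Prop1 != Prop2, ID, FUN = any))

# Subsetting with the mask keeps the original row order and all columns.
result <- df1[keep, ]
print(result)
```

One caveat worth checking on real data: if Prop1 or Prop2 contains `NA`, `any` can return `NA` for a group, and indexing with `NA` produces rows of `NA`s. Passing a function such as `function(x) any(x, na.rm = TRUE)` as `FUN` is one possible way to treat missing comparisons as non-mismatches.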
