Comparing 2 datasets in R


Problem description


I have extracted 2 data sets from a dataset called babies2009 (3 vectors: count, name, gender).

One is girls2009, containing all the girls, and the other is boys2009, containing all the boys. I want to find out which names are shared between boys and girls.

I tried this

common.names = (boys2009$name %in% girls2009$name)

When I try

babies2009[common.names, ] [1:10, ]

all I get is the girl names not the common names.

I have confirmed that both data sets do contain boys and girls respectively by taking a sample of 10 rows from each...

boys2009[1:10, ]
girls2009[1:10, ]

How else can I compare the 2 datasets and determine which values they both share? Thanks.

Solution

common.names = (boys2009$name %in% girls2009$name) gives you a logical vector of length length(boys2009$name). So when you try selecting from a much longer data.frame babies2009[common.names, ] [1:10, ], you wind up with nonsense.
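To make the "nonsense" concrete, here is a minimal sketch of what happens when a short logical vector indexes a longer data.frame. The toy data (names, counts) is made up for illustration, and the row order of babies2009 is an assumption; the real data may be ordered differently. R recycles the short logical vector across the rows, so the selected rows have nothing to do with shared names.

```r
# Toy data (hypothetical values), with the column layout described in the question
boys2009 <- data.frame(name = c("Billy", "Bob"), count = c(10, 20),
                       gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), count = c(3, 30, 40),
                        gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)   # 5 rows: 2 boys, then 3 girls

# Logical vector with one element per row of boys2009 (length 2)
common.names <- boys2009$name %in% girls2009$name   # TRUE FALSE

# Indexing the 5-row data.frame recycles the vector to TRUE FALSE TRUE FALSE TRUE,
# so rows 1, 3 and 5 are returned, including "Sue", which is not a shared name
wrong <- babies2009[common.names, ]
print(wrong$name)
```

The mismatch is silent: R raises no error or warning here, which is why the result looks plausible at first glance.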

Solution: use that logical vector on the proper data.frame!

# Toy data; the column is called name, as in the question
boys2009 <- data.frame(name = c("Billy", "Bob"), data = runif(2), gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), data = runif(3), gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# One TRUE/FALSE per row of boys2009
common.names <- (boys2009$name %in% girls2009$name)

> boys2009[common.names, ]$name
[1] "Billy"
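As for how else the two data sets could be compared, one option is base R's intersect(), which returns the unique values present in both vectors. The sketch below is not part of the original answer; it uses made-up toy data and assumes the column is called name, as in the question. It also shows how to pull the matching rows out of the combined babies2009 by building the logical vector from babies2009 itself, so the lengths agree.

```r
# Toy data (hypothetical values), same layout as in the question
boys2009 <- data.frame(name = c("Billy", "Bob"), count = c(10, 20),
                       gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), count = c(3, 30, 40),
                        gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# Unique names that appear in both data sets
shared <- intersect(boys2009$name, girls2009$name)
print(shared)

# Rows of the combined data for those names; the logical vector has
# one element per row of babies2009, so no recycling takes place
shared_rows <- babies2009[babies2009$name %in% shared, ]
print(shared_rows)
```

With this toy data, shared holds only "Billy" and shared_rows contains one boy row and one girl row for that name.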
