Matching two Columns in R

Question

I have a big dataset df (354903 rows) with two columns, df$CompleteName and df$CompleteName.1:

head(df)
       CompleteName       CompleteName.1
1   Lefebvre Arnaud Lefebvre Schuhl Anne
1.1 Lefebvre Arnaud              Abe Lyu
1.2 Lefebvre Arnaud              Abe Lyu
1.3 Lefebvre Arnaud       Louvet Nicolas
1.4 Lefebvre Arnaud   Muller Jean Michel
1.5 Lefebvre Arnaud  De Dinechin Florent

I am trying to create labels that show whether the two names are the same or not. When I try a small subset it works [1 if they are the same, 0 if not]:

> match(df$CompleteName[1], df$CompleteName.1[1], nomatch = 0)
[1] 0
> match(df$CompleteName[1:10], df$CompleteName.1[1:10], nomatch = 0)
[1] 0 0 0 0 0 0 0 0 0 0

But as soon as I pass in the complete columns, it gives me completely different values, which look like nonsense to me:

> match(df$CompleteName, df$CompleteName.1, nomatch = 0)
[1] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[23] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[45] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101

Should I use sapply? I could not figure it out; I tried this, which gives an error:

 sapply(df, function(x) match(x$CompleteName, x$CompleteName.1, nomatch = 0))

Please help!

Recommended answer

From the match help page:

‘match’ returns a vector of the positions of (first) matches of its first argument in its second.
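
To see the difference between a position lookup and an element-wise comparison, here is a small sketch with made-up vectors (x and table are illustrative names, not from the question). Each element of x is searched for anywhere in table, and the result is the position where it was first found, not whether it equals the element at the same index.

```r
# match(x, table) returns, for each element of x, the position of its
# first occurrence anywhere in table, or nomatch if it never occurs.
x     <- c("b", "a", "z")
table <- c("a", "b", "b")

pos <- match(x, table, nomatch = 0)
print(pos)  # 2 1 0: "b" is first found at position 2, "a" at 1, "z" nowhere
```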

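So your data seem to indicate that the first occurrence of "Lefebvre Arnaud" (the value at the first positions of the first argument) in the second argument is at position 101. The same mechanism explains why the 10-row subset returned only zeros: "Lefebvre Arnaud" does not occur among the first 10 entries of CompleteName.1. I believe what you intended is an element-wise comparison of the two columns, and that is just the equality operator ==.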
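The same behaviour can be reproduced on a small hypothetical pair of columns. The names a and b and their contents are invented for illustration, and here "Lefebvre Arnaud" first appears in b at position 5 instead of 101. This is a sketch of the mechanism, not the asker's real data:

```r
# Hypothetical toy columns: every value in `a` is "Lefebvre Arnaud",
# which first appears in `b` at position 5.
a <- rep("Lefebvre Arnaud", 6)
b <- c("Abe Lyu", "Abe Lyu", "Louvet Nicolas", "Muller Jean Michel",
       "Lefebvre Arnaud", "De Dinechin Florent")

# Subset: "Lefebvre Arnaud" is not among b[1:3], so every lookup fails.
sub_pos <- match(a[1:3], b[1:3], nomatch = 0)
print(sub_pos)   # 0 0 0

# Full columns: every lookup finds the same first occurrence.
full_pos <- match(a, b, nomatch = 0)
print(full_pos)  # 5 5 5 5 5 5

# Element-wise comparison, row by row, is what the labels need.
same <- a == b
print(same)      # FALSE FALSE FALSE FALSE TRUE FALSE
```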

Some example data:

> a <- rep ("Lefebvre Arnaud", 6)
> b <- c("Abe Lyu", "Abe Lyu", "Lefebvre Arnaud", rep("De Dinechin Florent", 3))
> x <- data.frame(a, b, stringsAsFactors = FALSE)
> x
            a                   b
1 Lefebvre Arnaud             Abe Lyu
2 Lefebvre Arnaud             Abe Lyu
3 Lefebvre Arnaud     Lefebvre Arnaud
4 Lefebvre Arnaud De Dinechin Florent
5 Lefebvre Arnaud De Dinechin Florent
6 Lefebvre Arnaud De Dinechin Florent
> x$a == x$b
[1] FALSE FALSE  TRUE FALSE FALSE FALSE
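To get the 1/0 labels the question asks for, the logical result can be converted with as.integer() and stored as a new column. This sketch rebuilds the example data frame so it runs on its own, and the column name same is a hypothetical choice. The sapply() attempt fails because sapply(df, f) passes each column of df to f as a plain vector, so x$CompleteName inside f is invalid. The mapply() line shows that a correct row-by-row loop gives the same result as the vectorised ==:

```r
x <- data.frame(
  a = rep("Lefebvre Arnaud", 6),
  b = c("Abe Lyu", "Abe Lyu", "Lefebvre Arnaud", rep("De Dinechin Florent", 3)),
  stringsAsFactors = FALSE
)

# == is vectorised, so a single call compares every row; as.integer()
# turns TRUE/FALSE into 1/0 labels.
x$same <- as.integer(x$a == x$b)
print(x$same)  # 0 0 1 0 0 0

# An explicit row-by-row version pairs the two columns with mapply().
# It is not needed; it only confirms that the vectorised result matches.
loop_same <- mapply(function(p, q) as.integer(p == q), x$a, x$b,
                    USE.NAMES = FALSE)
stopifnot(identical(loop_same, x$same))
```

On the real data this would be df$same <- as.integer(df$CompleteName == df$CompleteName.1), where same is again an illustrative name. Rows where either name is NA get an NA label, because == propagates NA.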

Edit: Also, make sure that you are comparing apples to apples, so double-check the data type of your columns. Use str(df) to see whether the columns are character vectors or factors. If both are factors with different level sets, == stops with the error "level sets of factors are different" instead of comparing the names. You can either construct the data frame with stringsAsFactors = FALSE or convert the columns from factor to character. There are several ways to do that; see: Convert data.frame columns from factors to characters
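Here is a minimal sketch of the factor pitfall, using two invented factor columns. When both sides of == are factors whose level sets differ, R raises an error rather than comparing the strings, and converting with as.character() first avoids it. Since R 4.0.0, data.frame() and read.table()/read.csv() no longer convert strings to factors by default, so this mainly affects older R versions or code that sets stringsAsFactors = TRUE.

```r
# Two hypothetical factor columns whose level sets differ.
f1 <- factor(c("Abe Lyu", "Lefebvre Arnaud"))
f2 <- factor(c("Abe Lyu", "Louvet Nicolas"))

# Comparing them directly raises an error; capture its message.
err <- tryCatch(f1 == f2, error = function(e) conditionMessage(e))
print(err)  # in an English locale: "level sets of factors are different"

# Converting to character first compares the underlying strings.
same <- as.integer(as.character(f1) == as.character(f2))
print(same)  # 1 0
```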
