How to specify names of columns for x and y when joining in dplyr?

Views: 21 | Published: 2021/12/17 20:50:48 | Tags: r, join, left-join, dplyr

Problem description

I have two data frames that I want to join using dplyr. One is a data frame containing first names.

test_data <- data.frame(first_name = c("john", "bill", "madison", "abby", "zzz"),
                        stringsAsFactors = FALSE)

The other data frame contains a cleaned up version of the Kantrowitz names corpus, identifying gender. Here is a minimal example:

kantrowitz <- structure(
  list(name = c("john", "bill", "madison", "abby", "thomas"),
       gender = c("M", "either", "M", "either", "M")),
  .Names = c("name", "gender"),
  row.names = c(NA, 5L),
  class = c("tbl_df", "tbl", "data.frame")
)

I essentially want to look up the gender of the name from the test_data table using the kantrowitz table. Because I'm going to abstract this into a function encode_gender, I won't know the name of the column in the data set that's going to be used, and so I can't guarantee that it will be name, as in kantrowitz$name.

In base R I would perform the merge this way:

merge(test_data, kantrowitz, by.x = "first_name", by.y = "name", all.x = TRUE)

This returns the correct output:

  first_name gender
1       abby either
2       bill either
3       john      M
4    madison      M
5        zzz   <NA>

But I want to do this in dplyr because I'm using that package for all my other data manipulation. The dplyr by option to the various *_join functions only lets me specify one column name, but I need to specify two. I'm looking for something like this:

library(dplyr)
# either
left_join(test_data, kantrowitz, by.x = "first_name", by.y = "name")
# or
left_join(test_data, kantrowitz, by = c("first_name", "name"))

What is the way to perform this kind of join using dplyr?

(Never mind that the Kantrowitz corpus is a bad way to identify gender. I'm working on a better implementation, but I want to get this working first.)
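
For reference, one way this kind of join can be written is sketched below. It relies on dplyr accepting a named character vector for `by`, where each name is a column in `x` and each value is the matching column in `y`. The `encode_gender` wrapper and its `col` parameter are only a guess at how the function mentioned in the question might look, not its actual definition, and `kantrowitz` is rebuilt here as a plain data frame to keep the example self-contained.

```r
library(dplyr)

test_data <- data.frame(
  first_name = c("john", "bill", "madison", "abby", "zzz"),
  stringsAsFactors = FALSE
)
kantrowitz <- data.frame(
  name = c("john", "bill", "madison", "abby", "thomas"),
  gender = c("M", "either", "M", "either", "M"),
  stringsAsFactors = FALSE
)

# Named vector: the name ("first_name") is the key column in x,
# the value ("name") is the key column in y.
joined <- left_join(test_data, kantrowitz, by = c("first_name" = "name"))

# Hypothetical wrapper (signature assumed): `col` is the key column of
# `data`, passed as a string, so the named vector is built with setNames().
encode_gender <- function(data, col) {
  left_join(data, kantrowitz, by = setNames("name", col))
}
encoded <- encode_gender(test_data, "first_name")
```

Unlike `merge`, `left_join` keeps the rows in the order of `test_data` rather than sorting by the key, and the key column keeps its name from `x` (`first_name`).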
