How to specify names of columns for x and y when joining in dplyr?

Views: 249 Published: 2017/7/13 20:27:47 Tags: r join left-join dplyr


Problem description

I have two data frames that I want to join using dplyr. One is a data frame containing first names.

test_data <- data.frame(first_name = c("john", "bill", "madison", "abby", "zzz"),
                        stringsAsFactors = FALSE)

The other data frame contains a cleaned up version of the Kantrowitz names corpus, identifying gender. Here is a minimal example:

kantrowitz <- structure(list(name = c("john", "bill", "madison", "abby", "thomas"), gender = c("M", "either", "M", "either", "M")), .Names = c("name", "gender"), row.names = c(NA, 5L), class = c("tbl_df", "tbl", "data.frame"))

I essentially want to look up the gender of the name from the test_data table using the kantrowitz table. Because I'm going to abstract this into a function encode_gender, I won't know the name of the column in the data set that's going to be used, and so I can't guarantee that it will be name, as in kantrowitz$name.

In base R I would perform the merge this way:

merge(test_data, kantrowitz, by.x = "first_name", by.y = "name", all.x = TRUE)

That returns the correct output:

  first_name gender
1       abby either
2       bill either
3       john      M
4    madison      M
5        zzz   <NA>

But I want to do this in dplyr because I'm using that package for all my other data manipulation. The dplyr by option to the various *_join functions only lets me specify one column name, but I need to specify two. I'm looking for something like this:

library(dplyr)
# either
left_join(test_data, kantrowitz, by.x = "first_name", by.y = "name")
# or
left_join(test_data, kantrowitz, by = c("first_name", "name"))

What is the way to perform this kind of join using dplyr?

(Never mind that the Kantrowitz corpus is a bad way to identify gender. I'm working on a better implementation, but I want to get this working first.)

Solution

This feature has been added in dplyr v0.3. You can now pass a named character vector to the by argument in left_join (and other joining functions) to specify which columns to join on in each data frame. With the example given in the original question, the code would be:

left_join(test_data, kantrowitz, by = c("first_name" = "name"))
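
Since the question mentions wrapping this in a function where the column name is not known in advance, here is a minimal sketch of how that might look. The signature of encode_gender and its arguments col and lookup are assumptions made for illustration, not part of the original answer, and the lookup table is rebuilt as a plain data frame. The idea is that setNames("name", col) builds the same named character vector as c("first_name" = "name"), but from a variable.

```r
library(dplyr)

test_data <- data.frame(first_name = c("john", "bill", "madison", "abby", "zzz"),
                        stringsAsFactors = FALSE)

# Same lookup data as in the question, built as a plain data frame
kantrowitz <- data.frame(name = c("john", "bill", "madison", "abby", "thomas"),
                         gender = c("M", "either", "M", "either", "M"),
                         stringsAsFactors = FALSE)

# Hypothetical helper: `col` is the name (a string) of the column in `data`
# that holds first names; `lookup` must have a column called "name".
encode_gender <- function(data, col, lookup = kantrowitz) {
  # setNames("name", col) yields a named vector equivalent to c(<col> = "name"),
  # meaning "match data[[col]] against lookup$name"
  left_join(data, lookup, by = setNames("name", col))
}

result <- encode_gender(test_data, "first_name")
print(result)
```

Unlike merge, left_join keeps the rows in the order of the first data frame, so the result starts with john rather than abby. The key column keeps its name from the left-hand table (first_name here).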
