dplyr join define NA values

Views: 238  Published: 2017/7/13 20:36:02  Tags: r left-join dplyr na


Question

Can I define a "fill" value for NA in a dplyr join? For example, can I specify in the join that all NA values should be 1?

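library(dplyr)
# Lookup table of exchange rates (built via cbind, as in the original question)
lookup <- data.frame(cbind(c("USD","MYR"),c(0.9,1.1)))
names(lookup) <- c("rate","value")
# Currency codes to look up; "XXX" and "YYY" have no match in lookup
fx <- data.frame(c("USD","MYR","USD","MYR","XXX","YYY"))
names(fx)[1] <- "rate"
left_join(x=fx,y=lookup,by=c("rate"))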

The code above will create NA for the values "XXX" and "YYY". In my case I am joining a large number of columns and there will be a lot of non-matches. All non-matches should have the same value. I know I can do it in several steps, but the question is: can it all be done in one? Thanks!

Recommended answer

First off, I would recommend not using the combination data.frame(cbind(...)). Here's why: cbind creates a matrix by default if you only pass atomic vectors to it, and a matrix in R can hold only one type of data (think of a matrix as a vector with a dimension attribute, i.e. the number of rows and columns). Therefore, your code

cbind(c("USD","MYR"),c(0.9,1.1))

creates a character matrix:

str(cbind(c("USD","MYR"),c(0.9,1.1)))
# chr [1:2, 1:2] "USD" "MYR" "0.9" "1.1"

although you probably expected a final data frame with a character or factor column (rate) and a numeric column (value). But what you get is:

str(data.frame(cbind(c("USD","MYR"),c(0.9,1.1))))
#'data.frame':  2 obs. of  2 variables:
# $ X1: Factor w/ 2 levels "MYR","USD": 2 1
# $ X2: Factor w/ 2 levels "0.9","1.1": 1 2

because, in R versions before 4.0.0, data.frame converts strings (characters) to factors by default. You can prevent this by specifying stringsAsFactors = FALSE in the data.frame() call; from R 4.0.0 on, that is the default.
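As a small illustration of the point above (a sketch using only base R; the object names m and df are made up for this example): turning off the factor conversion does not bring back the numbers, because the matrix produced by cbind is already character before data.frame ever sees it. The value column would still need an explicit conversion.

```r
# cbind() on atomic vectors yields a matrix, which holds a single type
m <- cbind(c("USD", "MYR"), c(0.9, 1.1))
is.matrix(m)   # TRUE
typeof(m)      # "character"

# Even without factor conversion, the second column stays character
df <- data.frame(m, stringsAsFactors = FALSE)
sapply(df, class)

# An explicit conversion is needed to get numbers back
df$X2 <- as.numeric(df$X2)
sapply(df, class)
```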

I suggest the following alternative approach to create the sample data (also note that you can easily specify the column names in the same call):

lookup <- data.frame(rate = c("USD","MYR"), 
                     value = c(0.9,1.1))

fx <- data.frame(rate = c("USD","MYR","USD","MYR","XXX","YYY"))

Now, for your actual question: if I understand correctly, you want to replace all NAs in the joined data with 1. If that's correct, here's a custom function using left_join and mutate_each to do that (note that mutate_each and funs have since been deprecated in dplyr):

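library(dplyr)
# Left join, then replace every NA in the result with 1
left_join_NA <- function(x, y, ...) {
  # Forward extra arguments (such as by = "rate") straight to left_join()
  left_join(x = x, y = y, ...) %>% 
    mutate_each(funs(replace(., which(is.na(.)), 1)))
}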

Now you can apply it to your data like this:

> left_join_NA(x = fx, y = lookup, by = "rate")
#  rate value
#1  USD   0.9
#2  MYR   1.1
#3  USD   0.9
#4  MYR   1.1
#5  XXX   1.0
#6  YYY   1.0
#Warning message:
#joining factors with different levels, coercing to character vector

Note that you end up with a character column (rate) and a numeric column (value), and all NAs are replaced by 1.

str(left_join_NA(x = fx, y = lookup, by = "rate"))
#'data.frame':  6 obs. of  2 variables:
# $ rate : chr  "USD" "MYR" "USD" "MYR" ...
# $ value: num  0.9 1.1 0.9 1.1 1 1
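On dplyr 1.0.0 or later, the same idea could be sketched with across() instead of mutate_each. This is only one possible variant, not part of the original answer: the helper name left_join_fill and its fill argument are hypothetical, introduced here so the replacement value does not have to be hard-coded. Like the function above, it replaces NAs in every column of the joined result, including NAs that were already present in x.

```r
library(dplyr)

# Sample data as in the answer; stringsAsFactors = FALSE keeps the
# behaviour the same across R versions
lookup <- data.frame(rate = c("USD", "MYR"),
                     value = c(0.9, 1.1),
                     stringsAsFactors = FALSE)
fx <- data.frame(rate = c("USD", "MYR", "USD", "MYR", "XXX", "YYY"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: left join, then replace every NA with `fill`
left_join_fill <- function(x, y, ..., fill = 1) {
  left_join(x, y, ...) %>%
    mutate(across(everything(), ~ replace(.x, is.na(.x), fill)))
}

res <- left_join_fill(fx, lookup, by = "rate")
res
```

Passing a different value, for example fill = 0, changes what non-matches receive without touching the join itself.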
