Loop or function to compare two column values and create a new variable in R


Problem description


I have two data frames, big and small (the actual dataset is very large). The following is just a working example.

big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55)

  SN names var
1  1     A  51
2  2     B  52
3  3     C  53
4  4     D  54
5  5     E  55

small <- data.frame(names = c("A", "C", "E"), type = c("New", "Old", "Old"))
  names type
1     A  New
2     C  Old
3     E  Old

Now I need to create a new variable, type, in big using the type variable in small. Where a value in big$names matches one in small$names, the corresponding type should be stored in the new type column. Where there is no match, the value should be "Unknown". The expected output is as follows:

resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55, 
              type = c("New","Unknown", "Old", "Unknown", "Old"))

resultdf 
  SN names var    type
1  1     A  51     New
2  2     B  52 Unknown
3  3     C  53     Old
4  4     D  54 Unknown
5  5     E  55     Old

I know this is a simple question for experts, but I could not figure it out.

Solution

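First use merge() with the argument all=TRUE to merge the two data.frames, keeping the rows of big that found no matching value in small$names. Then replace the elements of big$type that found no match (which merge() fills with NA) with the string "Unknown". Because all=TRUE performs a full outer join, it would also add rows for any names in small that are missing from big; all.x=TRUE restricts the result to the rows of big.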

Note that because big and small share exactly one column name, names, that column is used by default to perform the merge. For more control over which columns are used as the basis of the merge, see the function's by, by.x, and by.y arguments.
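The by.x and by.y arguments matter when the key columns have different names in the two data frames. A minimal sketch, assuming a hypothetical lookup table whose key column is called id rather than names, and which contains an extra name, F, that does not occur in big:

```r
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
# Hypothetical lookup table: key column named "id", plus an extra "F" row
lookup <- data.frame(id = c("A", "C", "E", "F"),
                     type = c("New", "Old", "Old", "Old"),
                     stringsAsFactors = FALSE)

# by.x / by.y name the key column in each data frame;
# all.x = TRUE is a left join, so the unmatched "F" row of lookup is dropped
res <- merge(big, lookup, by.x = "names", by.y = "id", all.x = TRUE)
# Rows of big with no match in lookup have NA in type
res$type[is.na(res$type)] <- "Unknown"
res
```

The result keeps exactly the five rows of big. merge() puts the key column first (named after by.x) and, with its default sort = TRUE, orders rows by the key.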

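# stringsAsFactors = FALSE keeps type as character (the default since R 4.0.0),
# so "Unknown" can be assigned without creating an invalid factor level
small <- data.frame(names = c("A", "C", "E"),
                    type = c("New", "Old", "Old"), stringsAsFactors = FALSE)
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)

# Join on the shared "names" column; rows of big with no match get NA in type
big <- merge(big, small, all = TRUE)
# Replace those NAs with "Unknown"
big$type[is.na(big$type)] <- "Unknown"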
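Since the question asks for a loop or function, a vectorized lookup with match() is another option that needs no loop. This is an alternative to the merge() answer above, not part of it; a minimal sketch using the same example data:

```r
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
small <- data.frame(names = c("A", "C", "E"), type = c("New", "Old", "Old"),
                    stringsAsFactors = FALSE)

# For each element of big$names, match() returns the position of its first
# match in small$names, or NA when there is none
idx <- match(big$names, small$names)
# Indexing with NA yields NA, so unmatched names get NA here
big$type <- small$type[idx]
big$type[is.na(big$type)] <- "Unknown"
big
```

Unlike merge(), this keeps big's original row order and column order (SN, names, var, type), which matches resultdf in the question. One difference to keep in mind: if small$names contained duplicates, match() would use only the first occurrence, whereas merge() would return one row per matching pair.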
