Loop or function to compare two column values and create a new variable in R
Problem description
I have two data frames, big and small (the actual dataset is very large). The following is just a working example.
big <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55)
  SN names var
1  1     A  51
2  2     B  52
3  3     C  53
4  4     D  54
5  5     E  55
small <- data.frame (names = c("A", "C", "E"), type = c("New", "Old", "Old") )
  names type
1     A  New
2     C  Old
3     E  Old
Now I need to create a new variable in big with the help of the type variable in small. Where the names in small and big match, the corresponding type should be stored in a new column type. Where there is no match between the names columns, the new value should be "Unknown". The expected output is as follows:
resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
type = c("New","Unknown", "Old", "Unknown", "Old"))
resultdf
  SN names var    type
1  1     A  51     New
2  2     B  52 Unknown
3  3     C  53     Old
4  4     D  54 Unknown
5  5     E  55     Old
I know this is a simple question for experts, but I could not figure it out.
First use merge() with the argument all=TRUE to merge the two data.frames, keeping the rows of big that found no matching value in small$names. (Since only unmatched rows of big need to be kept here, all.x=TRUE would also be enough.) Then replace those elements of big$type that didn't find a match (marked by merge() with NA) with the string "Unknown".
Note that because big and small share just one column name, that column is used by default to perform the merge. For more control over which columns are used as the basis of the merge, see merge()'s by, by.x, and by.y arguments.
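As a minimal sketch of the by.x and by.y arguments mentioned above, suppose the lookup table stored the key under a different column name. The data frame small2 and its column label are hypothetical and not part of the question; they are only there to show how the key columns can be named explicitly.

```r
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)

# Hypothetical lookup table whose key column is called "label" instead of "names"
small2 <- data.frame(label = c("A", "C", "E"),
                     type = c("New", "Old", "Old"), stringsAsFactors = FALSE)

# Name the key column of each data frame explicitly;
# all.x = TRUE keeps every row of big, with NA in type where no match exists
merged <- merge(big, small2, by.x = "names", by.y = "label", all.x = TRUE)
```

The merged key column keeps the name given in by.x ("names"). The code for the data in the question follows.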
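# stringsAsFactors = FALSE keeps names and type as character vectors,
# so the string "Unknown" can be assigned later
small <- data.frame(names = c("A", "C", "E"),
                    type = c("New", "Old", "Old"), stringsAsFactors = FALSE)
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
# Merge on the shared column "names"; rows without a match get NA in type
big <- merge(big, small, all = TRUE)
# Replace the NAs left by merge() with the string "Unknown"
big$type[is.na(big$type)] <- "Unknown"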
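Because merge() by default sorts the result on the key column and moves that column to the front, a lookup with match() may be worth considering when the row and column order of big should stay as they are. This is an alternative sketch, not the method given in the answer above; the variable idx is introduced here only for illustration.

```r
small <- data.frame(names = c("A", "C", "E"),
                    type = c("New", "Old", "Old"), stringsAsFactors = FALSE)
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)

# For each name in big, find the position of the first matching name in small
# (NA where there is no match)
idx <- match(big$names, small$names)

# Look up the type at those positions; unmatched rows get NA
big$type <- small$type[idx]

# Replace the NAs with the string "Unknown"
big$type[is.na(big$type)] <- "Unknown"
```

With this approach big keeps its original row order and its columns stay in the order SN, names, var, type. If a name appeared more than once in small, match() would use only the first occurrence.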