How can I create a new column in a dataframe based on permutations of other columns?
Question
Suppose I have a dataframe which looks like this:
   var1  var2  var3  var4
a  TRUE FALSE  TRUE FALSE
b  TRUE  TRUE  TRUE FALSE
c FALSE  TRUE FALSE  TRUE
d  TRUE FALSE FALSE FALSE
e  TRUE FALSE  TRUE FALSE
f FALSE  TRUE FALSE  TRUE
I want to create a new column which assigns a to f to categories based on which permutation of TRUE and FALSE each row has for the variables along the top.
In this simplified example, the result would look like:
   var1  var2  var3  var4 category
a  TRUE FALSE  TRUE FALSE        A
b  TRUE  TRUE  TRUE FALSE        B
c FALSE  TRUE FALSE  TRUE        C
d  TRUE FALSE FALSE FALSE        D
e  TRUE FALSE  TRUE FALSE        A
f FALSE  TRUE FALSE  TRUE        C
Notice that each unique permutation of TRUE and FALSE becomes a different category, and since a and e have the same permutation, they end up in the same category (A).
Is there an easy way to do this that still works when there is a large number of variables along the top, and that is not limited to TRUE and FALSE but also works if the dataframe is filled with categories or numbers?
Recommended answer
You could do the following:
## paste the rows together, creating a character vector
x <- do.call(paste, df)
## match it against itself and apply to 'LETTERS', and assign as new column
df$category <- LETTERS[match(x, x)]
df
#    var1  var2  var3  var4 category
# a  TRUE FALSE  TRUE FALSE        A
# b  TRUE  TRUE  TRUE FALSE        B
# c FALSE  TRUE FALSE  TRUE        C
# d  TRUE FALSE FALSE FALSE        D
# e  TRUE FALSE  TRUE FALSE        A
# f FALSE  TRUE FALSE  TRUE        C
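One thing to be aware of with the approach above: match(x, x) returns the position of the first occurrence of each row pattern, so the labels are consecutive only when every new pattern happens to appear right after the previous ones, as in the example data. In addition, LETTERS has only 26 entries, so any index above 26 yields NA. The following is a minimal sketch of one possible variation, not part of the original answer: it matches against unique(x) to get consecutive group numbers and builds labels with paste0. The data frame demo and the "cat" prefix are made up for illustration.

```r
# Hypothetical three-row example where the first pattern repeats immediately
demo <- data.frame(
  var1 = c(TRUE, TRUE, FALSE),
  var2 = c(FALSE, FALSE, TRUE),
  row.names = c("a", "b", "c")
)

# One string key per row, as in the answer above
x <- do.call(paste, demo)

# Index of the first occurrence of each key: the third row gets 3, not 2
first_idx <- match(x, x)

# Consecutive group numbers: position of each key among the unique keys
group_id <- match(x, unique(x))

# Labels that do not run out after 26 groups ("cat" is an arbitrary prefix)
demo$category <- paste0("cat", group_id)
demo
```

With LETTERS[first_idx] the third row would be labelled "C" even though only two distinct patterns exist; group_id numbers the patterns in order of first appearance instead.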
The above code can be written as a one-liner if we use a named list as an environment. This avoids making any new assignments to the global environment.
df$category <- LETTERS[with(list(x = do.call(paste, df)), match(x, x))]
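On the question of columns that hold categories or numbers rather than TRUE/FALSE: paste() converts every column to character, so the same idea carries over. One caveat is that the default separator is a space, so values that themselves contain spaces can produce identical keys for different rows. The sketch below is an illustration under assumptions, not part of the original answer: the data frame mixed is invented, and the separator "\x1f" is simply assumed not to occur in the data. The column names are dropped with unname() so that a column called sep or collapse would not be taken as an argument to paste().

```r
# Hypothetical mixed-type data: two character columns and one numeric column
mixed <- data.frame(
  grp   = c("a b", "a", "a b", "a"),
  label = c("c", "b c", "c", "b c"),
  n     = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Plain unnamed list of columns, passed positionally to paste()
cols <- unname(as.list(mixed))

# Default separator (a space): rows 1 and 2 both become "a b c 1"
key_space <- do.call(paste, cols)

# Separator assumed to be absent from the data keeps those rows apart
key_safe <- do.call(paste, c(cols, sep = "\x1f"))

# Consecutive integer category per distinct row
mixed$category <- match(key_safe, unique(key_safe))
mixed
```

Here rows 1 and 3 share a category, while rows 2 and 4 each get their own; with the space separator, rows 1, 2 and 3 would have been merged into one.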
Data:
df <- structure(list(var1 = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
var2 = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE), var3 = c(TRUE,
TRUE, FALSE, FALSE, TRUE, FALSE), var4 = c(FALSE, FALSE,
TRUE, FALSE, FALSE, TRUE)), .Names = c("var1", "var2", "var3",
"var4"), row.names = c("a", "b", "c", "d", "e", "f"), class = "data.frame")