Apply a function by row on a dataframe independently of the number of columns
Question
I'd like to apply a function by row on a data.frame to concatenate the column names depending on the values in each row.
df
      A     B
1  TRUE  TRUE
2 FALSE  TRUE
3 FALSE FALSE

Expected result:

      A     B Result
1  TRUE  TRUE  A / B
2 FALSE  TRUE      B
3 FALSE FALSE     NA
I read about dplyr's mutate() and rowwise(), but I don't know how to apply them since the set of columns isn't fixed.
For a row "i" I would do something like:
paste(names(df)[as.logical(df[i,])], collapse = ' / ')
Any help is welcome.
Thanks.
Recommended answer
If the dataset is not very big (i.e. not millions or billions of rows), we can use apply with MARGIN = 1 to loop over the rows, subset the names of each row vector using the logical vector as an index, and paste them together. This is easy to write in a single line:
df$Result <- apply(df, 1, FUN = function(x) paste(names(x)[x], collapse=" / "))
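One detail worth checking with the one-liner above: paste() with collapse returns an empty string "" when no names are selected, so an all-FALSE row yields "" rather than the NA shown in the expected result. A minimal sketch that adds the conversion, assuming every column of df is logical (the variable name res is just for illustration):

```r
# Example data from the question.
df <- data.frame(A = c(TRUE, FALSE, FALSE),
                 B = c(TRUE, TRUE, FALSE))

# Row-wise labels; works for any number of logical columns.
res <- unname(apply(df, 1, function(x) paste(names(x)[x], collapse = " / ")))

# paste() on an empty selection with `collapse` gives "",
# so all-FALSE rows are converted to NA here.
res[res == ""] <- NA
df$Result <- res
```

If the data frame also holds non-logical columns, apply() would coerce each row to character, so it is probably safer to pass only the logical columns.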
However, for a big dataset, another option is to build a key/value lookup and replace the values by matching, which is faster than the solution above.
v1 <- do.call(paste, df)
unname(setNames(c("A / B", "B", "A", NA), do.call(paste,
expand.grid(rep(list(c(TRUE, FALSE)), 2))))[v1])
#[1] "A / B" "B" NA
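To make the key/value idea easier to follow, here is the same lookup split into named steps. This is only an illustrative sketch; the names keys, lookup and result are not from the original answer, and df is assumed to contain just the two logical columns A and B.

```r
df <- data.frame(A = c(TRUE, FALSE, FALSE),
                 B = c(TRUE, TRUE, FALSE))

# Keys: every TRUE/FALSE combination of the two columns, pasted into one string.
keys <- do.call(paste, expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE)))

# Values: the label each combination should map to, in the same order as keys.
lookup <- setNames(c("A / B", "B", "A", NA), keys)

# A single vectorized lookup by name replaces the per-row loop.
result <- unname(lookup[do.call(paste, df)])
```

The speed-up comes from pasting whole columns at once instead of calling a function once per row.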
Or we can use an arithmetic operation to do this:
c(NA, "A", "B", "A / B")[1 + df[,1] + 2 * df[,2]]
#[1] "A / B" "B" NA
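The arithmetic version works because each row is read as a binary number (A counts 1, B counts 2), which indexes a table of labels. It hardcodes two columns, though. A rough sketch of how the same idea might extend to any number of logical columns follows; label_rows is a hypothetical helper, not part of the original answer, and since it builds 2^k labels it is only sensible for a small number of columns.

```r
# Hypothetical helper: generalizes the arithmetic lookup to k logical columns.
label_rows <- function(df, sep = " / ") {
  k <- ncol(df)
  m <- as.matrix(df) + 0  # logical matrix converted to 0/1
  # Column j contributes 2^(j - 1), so each TRUE/FALSE pattern maps to a
  # unique index between 1 and 2^k.
  idx <- 1 + as.vector(m %*% 2^(seq_len(k) - 1))
  # Enumerate all patterns in matching order (first column varies fastest).
  grid <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), k)))
  labels <- apply(grid, 1, function(x) paste(names(df)[x], collapse = sep))
  labels[labels == ""] <- NA  # the all-FALSE pattern has no label
  unname(labels[idx])
}

df <- data.frame(A = c(TRUE, FALSE, FALSE),
                 B = c(TRUE, TRUE, FALSE))
out <- label_rows(df)
```

For the two-column example this reproduces the lookup vector c(NA, "A", "B", "A / B") used above.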
Benchmarks
Using @DavidArenburg's dataset and including the two solutions posted here (the column names of 'df' were changed to 'A' and 'B'):
newPaste <- function(df) {
  v1 <- do.call(paste, df)
  unname(setNames(c("A / B", "B", "A", NA), do.call(paste,
    expand.grid(rep(list(c(TRUE, FALSE)), 2))))[v1])
}
arith <- function(df) {
  c(NA, "A", "B", "A / B")[1 + df[,1] + 2 * df[,2]]
}
microbenchmark::microbenchmark(Rowwise(df), Colwise(df), newPaste(df),arith(df))
#Unit: milliseconds
#         expr        min        lq      mean     median         uq       max neval
#  Rowwise(df) 398.024791 453.68129 488.07312 481.051431 523.466771 688.36084   100
#  Colwise(df)  25.361609  28.10300  34.20972  30.952365  35.885061  95.92575   100
# newPaste(df)  65.777304  69.07432  82.08602  71.606890  82.232980 176.66516   100
#    arith(df)   1.790622   1.88339   4.74913   2.027674   4.753279  58.50942   100