R function categorize by column?
Problem description
I would like to write a function that takes a data frame, checks which columns have non-zero values in each row, and then assigns each row a "Category" built from the names of those columns.
Taking this df as an example:
df <- data.frame(k1 = c(0,0,3,4,5,1),
k2 = c(1,0,0,4,5,0),
k3 = c(0,0,0,8,0,0),
k4 = c(2,5,0,3,4,5))
I'd like the output to look like this:
df.final <- data.frame(k1 = c(0,0,3,4,5,1),
k2 = c(1,0,0,4,5,0),
k3 = c(0,0,0,8,0,0),
k4 = c(2,5,0,3,4,5),
Category = c("k2_k4","k4","k1","k1_k2_k3_k4","k1_k2_k4","k1_k4"))
Of course, my actual data has many more rows, and I'm hoping this function can be used on data frames with any number of columns. I'm just not sure how to write the function. I'm a function-writing newbie!
Recommended answer
You can use the data.table::transpose() function to turn each row into a vector, then use sapply to loop through the resulting list and paste together the names of the columns whose values are not zero:
df$category = sapply(data.table::transpose(df),
                     function(r) paste0(names(df)[r != 0], collapse = "_"))
df
# k1 k2 k3 k4 category
#1 0 1 0 2 k2_k4
#2 0 0 0 5 k4
#3 3 0 0 0 k1
#4 4 4 8 3 k1_k2_k3_k4
#5 5 5 0 4 k1_k2_k4
#6 1 0 0 5 k1_k4
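Since the question asks for a reusable function, the same idea can be wrapped up along these lines. This is only a sketch in base R, not the answerer's code: the name add_category and its cols and sep arguments are made up for illustration, and it swaps data.table::transpose() for as.matrix() plus apply(), assuming the columns of interest are all numeric.

```r
# Hypothetical helper: add_category, cols and sep are illustrative names,
# not part of the original answer.
add_category <- function(df, cols = names(df), sep = "_") {
  # Logical matrix: TRUE wherever a value in the chosen columns is non-zero
  nonzero <- as.matrix(df[cols]) != 0
  # For each row, join the names of the non-zero columns;
  # which() drops NA entries, so missing values are treated as "not present"
  labels <- apply(nonzero, 1, function(r) paste(cols[which(r)], collapse = sep))
  df$Category <- unname(labels)
  df
}

df <- data.frame(k1 = c(0,0,3,4,5,1),
                 k2 = c(1,0,0,4,5,0),
                 k3 = c(0,0,0,8,0,0),
                 k4 = c(2,5,0,3,4,5))

df.final <- add_category(df)
df.final$Category
```

Passing cols explicitly (for example cols = c("k1", "k2")) limits the check to a subset of columns, and a row whose selected values are all zero gets an empty string as its Category.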