Change values from categorical to nominal in R
Problem description
I want to replace all the values in the categorical columns with their rank. The rank is the index of the value among the sorted unique elements of the column.
For example,
> data[1:5,1]
[1] "B2" "C4" "C5" "C1" "B5"
then I want the entries in the column replaced by these rank values:
> data[1:5,1]
[1] "1" "4" "5" "3" "2"
Another column:
> data[1:5,3]
[1] "Verified" "Source Verified" "Not Verified" "Source Verified" "Source Verified"
Then the updated column:
> data[1:5,3]
[1] "3" "2" "1" "2" "2"
I used the following code for this task, but it takes a lot of time.
for(i in 1:ncol(data)){
  if(is.character(data[,i])){
    temp <- sort(unique(data[,i]))
    for(j in 1:nrow(data)){
      for(k in 1:length(temp)){
        if(data[j,i] == temp[k]){
          data[j,i] <- k
        }
      }
    }
  }
}
Please suggest a more efficient way to do this, if possible. Thanks.
Recommended answer
Here is a solution in base R. I create a helper function that converts each column to a factor, using the column's sorted unique values as levels. This is similar to what you did, except that I use as.integer to get the rank values.
rank_fac <- function(col1)
  # levels must be sorted so that the integer codes match the rank
  as.integer(factor(col1, levels = sort(unique(col1))))
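As a quick check of the idea on a single vector, here is a minimal sketch. It assumes the levels are the sorted unique values, as the text describes. `rank_match` is a hypothetical name introduced only for illustration, not part of the original answer; it shows that base R's `match` should give the same ranks without building a factor.

```r
# Sample values taken from the question
x <- c("B2", "C4", "C5", "C1", "B5")

# Factor-based ranking: the levels are the sorted unique values,
# so the integer codes are the ranks
via_factor <- as.integer(factor(x, levels = sort(unique(x))))

# Hypothetical alternative (not from the original answer):
# look up each value's position in the sorted unique values
rank_match <- function(v) match(v, sort(unique(v)))
via_match <- rank_match(x)

print(via_factor)
print(via_match)
```

Both approaches are vectorized, so they avoid the nested row-by-row loops from the question.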
Some example data:
dx <- data.frame(
col1= c("B2" ,"C4" ,"C5", "C1", "B5"),
col2=c("Verified" , "Source Verified", "Not Verified" , "Source Verified", "Source Verified")
)
Apply it without using a for loop. It is better to use lapply here to avoid side effects.
data.frame(lapply(dx, rank_fac))
Result:
# col1 col2
# [1,] 1 3
# [2,] 4 2
# [3,] 5 1
# [4,] 3 2
# [5,] 2 2
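The question only converts columns for which `is.character` is true. A minimal sketch of that variant follows, assuming the columns are stored as character (hence `stringsAsFactors = FALSE`). The `amount` column and the `is_chr` helper vector are hypothetical additions for illustration, and the helper is redefined here with sorted levels so the block runs on its own.

```r
# Helper with sorted levels, redefined so this block is self-contained
rank_fac <- function(col1) {
  as.integer(factor(col1, levels = sort(unique(col1))))
}

dx <- data.frame(
  col1 = c("B2", "C4", "C5", "C1", "B5"),
  col2 = c("Verified", "Source Verified", "Not Verified",
           "Source Verified", "Source Verified"),
  amount = c(10.5, 20, 7.25, 3, 8),  # hypothetical numeric column, left untouched
  stringsAsFactors = FALSE
)

# Logical vector marking the character columns
is_chr <- vapply(dx, is.character, logical(1))

# Replace only the character columns with their ranks
dx[is_chr] <- lapply(dx[is_chr], rank_fac)
print(dx)
```

Selecting the columns first keeps numeric columns unchanged, which mirrors the `is.character` check in the original loop.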
Using data.table syntactic sugar:
library(data.table)
setDT(dx)[,lapply(.SD,rank_fac)]
# col1 col2
# 1: 1 3
# 2: 4 2
# 3: 5 1
# 4: 3 2
# 5: 2 2
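A simpler solution:
Use only as.integer. Note that this relies on the columns being factors (which data.frame() creates from strings by default before R 4.0.0); on character columns as.integer returns NA.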
setDT(dx)[,lapply(.SD,as.integer)]
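Whether plain `as.integer` gives ranks depends on the column type. The sketch below illustrates the difference on the sample values from the question; the variable names `x_chr` and `x_fac` are hypothetical.

```r
x_chr <- c("B2", "C4", "C5", "C1", "B5")

# factor() uses the sorted unique values as levels by default,
# so the integer codes of a factor are already the ranks
x_fac <- factor(x_chr)
codes_from_factor <- as.integer(x_fac)

# On a plain character vector, as.integer tries to parse numbers
# and yields NA (with a coercion warning, suppressed here)
codes_from_chr <- suppressWarnings(as.integer(x_chr))

print(codes_from_factor)
print(codes_from_chr)
```

If the columns are character, converting them with `factor()` first (or using the helper above) should be the safer route.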