Convert type of multiple columns of a dataframe at once
Problem description
I seem to spend a lot of time creating a dataframe from a file, database or something, and then converting each column into the type I want (numeric, factor, character, etc.). Is there a way to do this in one step, possibly by giving a vector of types?
foo<-data.frame(x=c(1:10),
y=c("red", "red", "red", "blue", "blue",
"blue", "yellow", "yellow", "yellow",
"green"),
z=Sys.Date()+c(1:10))
foo$x<-as.character(foo$x)
foo$y<-as.character(foo$y)
foo$z<-as.numeric(foo$z)
Instead of the last three commands, I'd like to do something like:
foo<-convert.magic(foo, c(character, character, numeric))
Recommended answer
Edit: See this related question for some simplifications and extensions on this basic idea.
My comment to Brandon's answer, using switch:
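convert.magic <- function(obj, types){
  # types: character vector with one entry per column of obj
  for (i in seq_along(obj)){
    # pick the coercion function that matches the requested type
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[[i]] <- FUN(obj[[i]])
  }
  obj
}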
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:
convert.magic1 <- function(obj, types){
  # build the converted columns as a list, one element per column
  out <- lapply(seq_along(obj), FUN = function(i){
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric = as.numeric,
                   factor = as.factor)
    FUN1(obj[[i]])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}
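As a rough sketch of the same idea, Map can walk the columns and the type vector in parallel, and assigning into obj[] keeps the result a data frame. The names convert_cols and converters are made up for this illustration and are not part of the original answer; the check for unknown type names is also an addition.

```r
# Hypothetical helper (not from the original answer): look up one converter
# per column and apply it with Map().
convert_cols <- function(obj, types) {
  converters <- list(character = as.character,
                     numeric   = as.numeric,
                     factor    = as.factor)
  stopifnot(length(types) == length(obj))
  unknown <- setdiff(types, names(converters))
  if (length(unknown) > 0) {
    stop("unsupported type(s): ", paste(unknown, collapse = ", "))
  }
  # obj[] <- ... replaces every column but keeps column names and row names
  obj[] <- Map(function(col, type) converters[[type]](col), obj, types)
  obj
}

foo <- data.frame(x = 1:10,
                  y = rep(c("red", "blue"), 5),
                  z = Sys.Date() + 1:10)
out <- convert_cols(foo, c("character", "character", "numeric"))
str(out)
```

Failing early on an unsupported type name avoids the less obvious error that appears when switch finds no match and returns NULL.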
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric often involves as.numeric(as.character(...)). Also, be aware of the default behavior of data.frame() and as.data.frame(), which converted character to factor in R versions before 4.0.0.
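A small illustration of both pitfalls, using made-up values: calling as.numeric directly on a factor returns the internal level codes, while going through as.character recovers the numbers that the labels spell out.

```r
# A factor whose labels look like numbers (example data, chosen for illustration)
f <- factor(c("10", "20", "10", "30"))

as.numeric(f)                 # level codes: 1 2 1 3
as.numeric(as.character(f))   # label values: 10 20 10 30

# Passing stringsAsFactors = FALSE explicitly keeps strings as character
# regardless of the R version's default
df <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
class(df$id)                  # "character"
```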