Assign data types to each column of a data frame in R
Question
Say I have two large data frames: one in which all the data is stored as character, and one in which I specify the data type of each column of the first.
For example:
my.df = data.frame(
  id = c('122','345', '43'),
  name = c('john','matt','roger'),
  race = c('1','2','1'),
  age = c('20','23','34'),
  height = c('6.4', '5.7', '4.9')
)
cols.of.my.df.type.df = data.frame(
  col.name.in.my.df = c('id','name', 'race', 'age', 'height'),
  type = c('string', 'string', 'integer, encoded value', 'integer', 'decimal')
)
The types listed in cols.of.my.df.type.df are not R types, so I am also looking for recommendations on which R data type each column should be given.
Is there a fast way to convert the columns of my.df to the data types specified in cols.of.my.df.type.df?
Recommended answer
Using your data:
df <- data.frame(
  id = c('122','345', '43'),
  name = c('john','matt','roger'),
  race = c('1','2','1'),
  age = c('20','23','34'),
  height = c('6.4', '5.7', '4.9'),
  stringsAsFactors = FALSE
)
cols <- data.frame(
  name = c('id','name', 'race', 'age', 'height'),
  type = c('string', 'string', 'integer, encoded value', 'integer', 'decimal'),
  stringsAsFactors = FALSE
)
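The stringsAsFactors = FALSE arguments above are not incidental. As a brief illustration (the variable names here are hypothetical and not part of the original answer): in R versions before 4.0.0, data.frame() converted character columns to factors by default, and coercing a factor straight to integer returns its internal level codes rather than the values it displays.

```r
# Hypothetical illustration: a factor whose labels look like numbers
age_factor <- factor(c("20", "23", "34"))

# Coercing the factor directly yields the internal level codes
codes <- as.integer(age_factor)

# Going through character first recovers the intended numbers
values <- as.integer(as.character(age_factor))

print(codes)
print(values)
```

Keeping the columns as plain character vectors avoids this detour, since character data can be coerced to integer or double directly.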
Here is one way to do what you want, assuming the set-up above. Note that stringsAsFactors = FALSE is important in the above definitions for this to work (it is the default from R 4.0.0 onward, but has to be set explicitly in earlier versions).
foo <- function(i, data, colInfo) {
  ## mapping your types to R's types
  RTypes <- c(string = "character", `integer, encoded value` = "factor",
              integer = "integer", decimal = "double")
  ## get current type
  TYPE <- colInfo$type[i]
  ## match this against the mapping vector
  RTYPE <- RTypes[TYPE]
  ## if a factor coerce via as.factor
  if (RTYPE == "factor") {
    out <- as.factor(data[, i])
  } else { ## otherwise convert via storage.mode()
    out <- data[, i]
    storage.mode(out) <- RTYPE
  }
  out # return
}
tmp <- lapply(seq_len(nrow(cols)), foo, df, cols)
names(tmp) <- names(df)
tmp <- data.frame(tmp, stringsAsFactors = FALSE)
tmp
str(tmp)
Which gives:
> tmp
   id  name race age height
1 122  john    1  20    6.4
2 345  matt    2  23    5.7
3  43 roger    1  34    4.9
> str(tmp)
'data.frame':	3 obs. of  5 variables:
 $ id    : chr  "122" "345" "43"
 $ name  : chr  "john" "matt" "roger"
 $ race  : Factor w/ 2 levels "1","2": 1 2 1
 $ age   : int  20 23 34
 $ height: num  6.4 5.7 4.9
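The function above looks columns up by position, so it relies on the rows of cols being in the same order as the columns of df. A minimal sketch of a name-based variant follows, assuming the same set-up; the names converters and convert_columns are hypothetical and not part of the original answer, and the mapping from type labels to base R coercion functions is one possible choice rather than the only one.

```r
# Same set-up as in the answer above
df <- data.frame(
  id = c("122", "345", "43"),
  name = c("john", "matt", "roger"),
  race = c("1", "2", "1"),
  age = c("20", "23", "34"),
  height = c("6.4", "5.7", "4.9"),
  stringsAsFactors = FALSE
)
cols <- data.frame(
  name = c("id", "name", "race", "age", "height"),
  type = c("string", "string", "integer, encoded value", "integer", "decimal"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one base R coercion function per type label
converters <- list(
  "string" = as.character,
  "integer, encoded value" = as.factor,
  "integer" = as.integer,
  "decimal" = as.numeric
)

# Hypothetical helper: convert each column listed in colInfo$name,
# matching by column name rather than by position
convert_columns <- function(data, colInfo) {
  unknown <- setdiff(colInfo$type, names(converters))
  if (length(unknown) > 0) {
    stop("No converter for type(s): ", paste(unknown, collapse = ", "))
  }
  missing_cols <- setdiff(colInfo$name, names(data))
  if (length(missing_cols) > 0) {
    stop("Column(s) not in data: ", paste(missing_cols, collapse = ", "))
  }
  for (i in seq_len(nrow(colInfo))) {
    col <- colInfo$name[i]
    data[[col]] <- converters[[colInfo$type[i]]](data[[col]])
  }
  data
}

converted <- convert_columns(df, cols)
str(converted)
```

Matching by name means cols may list the columns in any order, or cover only a subset of them; columns not mentioned are left as they are.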