Assign data types to each column of a data frame in R

Question

Suppose I have two large data frames: one stores all of its data as character, and the other specifies the intended data type of each column of the first.

For example:

my.df = data.frame(
    id = c('122','345', '43'), 
    name = c('john','matt','roger'), 
    race = c('1','2','1'), 
    age = c('20','23','34'), 
    height = c('6.4', '5.7', '4.9')
) 

cols.of.my.df.type.df = data.frame(
    col.name.in.my.df = c('id','name', 'race', 'age', 'height'),
    type = c('string', 'string', 'integer, encoded value', 'integer', 'decimal')
)

The types in cols.of.my.df.type are not the same as R's types, so I am also looking for recommendations on which R data type each column should be given.

Is there a fast way to convert the column types of my.df to the ones specified in cols.of.my.df.type?

Recommended answer

Using your data:

df <- data.frame(
    id = c('122','345', '43'), 
    name = c('john','matt','roger'), 
    race = c('1','2','1'), 
    age = c('20','23','34'), 
    height = c('6.4', '5.7', '4.9'),
    stringsAsFactors = FALSE
) 

cols <- data.frame(
    name = c('id','name', 'race', 'age', 'height'),
    type = c('string', 'string', 'integer, encoded value', 'integer', 'decimal'),
    stringsAsFactors = FALSE)

Here is one way to do what you want, assuming the set-up above. Note that stringsAsFactors = FALSE is important in both definitions for this to work.
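To illustrate why stringsAsFactors = FALSE matters here, the following minimal sketch uses toy vectors (age_chr, type_chr and the shortened RTypes are illustrative names, not part of the data above). Converting a factor to integer returns the internal level codes rather than the values, and indexing a named vector with a factor uses those codes rather than the labels. Since R 4.0.0 the default for data.frame() is already stringsAsFactors = FALSE, so this mainly affects older R versions.

```r
# Assumption: toy vectors for illustration only.
age_chr <- c("20", "23", "34")
age_fac <- factor(age_chr)

from_chr <- as.integer(age_chr)                # the values themselves
from_fac <- as.integer(age_fac)                # the internal level codes
via_chr  <- as.integer(as.character(age_fac))  # values recovered via character

# Indexing a named vector with a factor uses the codes, not the labels.
RTypes   <- c(string = "character", integer = "integer")
type_chr <- c("string", "integer")
type_fac <- factor(type_chr)                   # levels sort as "integer", "string"

by_name <- RTypes[type_chr]                    # looked up by name, as intended
by_code <- RTypes[type_fac]                    # looked up by position 2, then 1

print(from_chr)
print(from_fac)
print(by_name)
print(by_code)
```

With factor inputs, both the type lookup and the conversion itself can silently go wrong, which is why the character set-up is assumed.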

foo <- function(i, data, colInfo) {
  ## mapping your types to R's types
  RTypes <- c(string = "character", `integer, encoded value` = "factor",
              integer = "integer", decimal = "double")
  ## get current type
  TYPE <- colInfo$type[i]
  ## match this against the mapping vector
  RTYPE <- RTypes[TYPE]
  ## if a factor coerce via as.factor
  if (RTYPE == "factor") {
    out <- as.factor(data[, i])
  } else { ## otherwise convert via storage.mode()
    out <- data[,i]
    storage.mode(out) <- RTYPE
  }
  out # return
}

tmp <- lapply(seq_len(nrow(cols)), foo, df, cols)
names(tmp) <- names(df)
tmp <- data.frame(tmp, stringsAsFactors = FALSE)

tmp
str(tmp)

Which gives:

> tmp
   id  name race age height
1 122  john    1  20    6.4
2 345  matt    2  23    5.7
3  43 roger    1  34    4.9
> str(tmp)
'data.frame':   3 obs. of  5 variables:
 $ id    : chr  "122" "345" "43"
 $ name  : chr  "john" "matt" "roger"
 $ race  : Factor w/ 2 levels "1","2": 1 2 1
 $ age   : int  20 23 34
 $ height: num  6.4 5.7 4.9
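As a possible variation on the same idea, the mapping can be expressed as a named list of conversion functions and applied with Map(). This is only a sketch under the same set-up: the names converters, types and converted are hypothetical, and the column types are matched to the data by column name instead of by position.

```r
df <- data.frame(
  id     = c("122", "345", "43"),
  name   = c("john", "matt", "roger"),
  race   = c("1", "2", "1"),
  age    = c("20", "23", "34"),
  height = c("6.4", "5.7", "4.9"),
  stringsAsFactors = FALSE
)

cols <- data.frame(
  name = c("id", "name", "race", "age", "height"),
  type = c("string", "string", "integer, encoded value", "integer", "decimal"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one converter function per custom type label.
converters <- list(
  "string"                 = as.character,
  "integer, encoded value" = as.factor,
  "integer"                = as.integer,
  "decimal"                = as.numeric
)

# Match types to columns by name, so the row order of `cols` does not matter.
types <- cols$type[match(names(df), cols$name)]
stopifnot(!anyNA(types), all(types %in% names(converters)))

# Apply the matching converter to each column; `[]` keeps the data frame shape.
converted <- df
converted[] <- Map(function(column, type) converters[[type]](column), df, types)

str(converted)
```

The stopifnot() check fails early if a column has no entry in cols or uses a type label without a converter, which may be preferable to a silent NA lookup on large data frames.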
