Handling field types in database interaction with R

Question

I use RMySQL and a MySQL database to store my datasets. Sometimes the data gets revised, and I also store results back to the database. Long story short, there is quite a lot of interaction between R and the database in my use case.

Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore both R data types and MySQL field types. I would expect MySQL date fields to end up in a Date or POSIX class. Conversely, I would expect these R classes to be stored as a roughly corresponding MySQL field type. In particular, a date should not come back as character. (I do not expect a distinction between float and double here...)
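To see why a real date class matters, here is a minimal base-R illustration (not specific to RMySQL): a date that arrives as a character string supports no date arithmetic, while the same value converted with `as.Date` does.

```r
# DATE values as they come back when the driver returns them as text
d_chr <- c("2011-02-01", "2011-03-01")

# As character, R only sees strings; subtraction would be an error
print(class(d_chr))            # "character"

# Converted to Date, the values support differences, comparison and sequences
d <- as.Date(d_chr)            # ISO "YYYY-MM-DD" is parsed by default
gap <- d[2] - d[1]             # a "difftime" measured in days
print(gap)                     # Time difference of 28 days
print(seq(d[1], by = "month", length.out = 3))
```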

I also tried dbGetQuery, with the same result. Have I completely missed something in the manual, or is this simply not possible (yet) with these packages? What would be a good workaround?

@mdsummer I tried to find more in the documentation, but found only these disappointing lines:

MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly while exporting data.frames, factors are exported as character vectors.

Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).

Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation.
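Since the quoted documentation says time values travel as character in both directions, the conversion has two halves. A minimal sketch of the round trip in base R, assuming DATE text of the form `YYYY-MM-DD`, DATETIME text of the form `YYYY-MM-DD HH:MM:SS`, and timestamps interpreted as UTC (the time zone is an assumption, not something the driver reports):

```r
# MySQL DATE / DATETIME values as text, as the driver hands them over
date_txt     <- "2011-03-01"
datetime_txt <- "2011-03-01 17:45:10"

# Import: text -> R classes (UTC is an assumed time zone)
d  <- as.Date(date_txt, format = "%Y-%m-%d")
ts <- as.POSIXct(datetime_txt, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Export: R classes -> text in the layout MySQL accepts for DATE / DATETIME
d_out  <- format(d, "%Y-%m-%d")
ts_out <- format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(d_out)
print(ts_out)
```

Strings that do not match the format, such as MySQL's zero date `0000-00-00`, should come out as `NA` rather than as a date.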

Answer

OK, I have a working solution now. Here's a function that maps MySQL field types to R classes. It is particularly helpful for handling the MySQL date field type...

library(DBI)  # dbGetQuery() and dbReadTable() generics (RMySQL provides the methods)

dbReadMap <- function(con, table) {
  # DESCRIBE returns one row per column: Field, Type, Null, Key, Default, Extra
  desc <- dbGetQuery(con, paste0("DESCRIBE ", table))[, 1:2]

  # drop row_names if it exists: dbReadTable() turns it into row names
  # rather than a real column, so it must not be mapped
  desc <- desc[desc$Field != "row_names", ]

  # keep only the base type: "int(10) unsigned" -> "int", "varchar(255)" -> "varchar"
  desc$Type <- sub("^([a-z]+).*$", "\\1", tolower(desc$Type))

  # dictionary: MySQL base type -> name of the R conversion function
  fieldtype_to_rclass <- data.frame(
    fieldtypes = c("int", "tinyint", "bigint", "float", "double",
                   "date", "char", "varchar", "text"),
    rclasses   = c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric",
                   "as.Date", "as.character", "as.character", "as.character"),
    stringsAsFactors = FALSE
  )

  # inner join: columns whose type is not in the dictionary are left as read
  map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")

  # get data
  res <- dbReadTable(con, table)

  # apply each column's conversion function; seq_len() is safe for zero rows
  for (i in seq_len(nrow(map))) {
    field   <- as.character(map$Field[i])  # never index by factor codes
    convert <- match.fun(map$rclasses[i])
    res[[field]] <- convert(res[[field]])
  }

  res
}
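The fragile step in a mapping like this is cleaning the type string. DESCRIBE reports types such as `int(10) unsigned`; stripping every non-letter glues the modifier onto the base type, so it no longer matches an `int` dictionary entry. Keeping only the leading run of letters avoids that. A small base-R comparison (the sample type strings are illustrative):

```r
# Type strings as MySQL's DESCRIBE typically reports them
types <- c("int(10) unsigned", "varchar(255)", "date", "double", "tinyint(1)")

# Stripping every non-letter glues modifiers onto the base type
glued <- gsub("[^a-z]", "", types)
print(glued)  # "intunsigned" "varchar" "date" "double" "tinyint"

# Keeping only the leading run of letters gives the bare base type
base <- sub("^([a-z]+).*$", "\\1", tolower(types))
print(base)   # "int" "varchar" "date" "double" "tinyint"
```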

Maybe this is not good programming practice; I just don't know any better. So use it at your own risk, or help me improve it... And of course it only covers half of the job: reading. Hopefully I'll find some time to write a writing function soon.
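For the writing half, one possible starting point is to derive a MySQL column type from each column's R class. The helper `r_to_mysql_types` below is hypothetical (not part of RMySQL), and the class-to-type mapping is an assumption you may want to adjust:

```r
# Hypothetical helper: one MySQL type per data frame column, chosen from the
# column's R class. Date/POSIXct are checked before is.double() because both
# are stored as doubles internally.
r_to_mysql_types <- function(df) {
  vapply(df, function(x) {
    if (inherits(x, "Date")) "DATE"
    else if (inherits(x, "POSIXct")) "DATETIME"
    else if (is.factor(x) || is.character(x)) "TEXT"
    else if (is.logical(x)) "TINYINT(1)"
    else if (is.integer(x)) "INT"
    else if (is.double(x)) "DOUBLE"
    else "TEXT"
  }, character(1))
}

df <- data.frame(
  id    = 1:2,
  value = c(1.5, 2.25),
  day   = as.Date(c("2011-02-01", "2011-03-01")),
  label = factor(c("a", "b")),
  flag  = c(TRUE, FALSE)
)
types <- r_to_mysql_types(df)
print(types)
```

DBI documents a `field.types` argument for `dbWriteTable`, and RMySQL's method accepts one; if yours does (check `?dbWriteTable`), the named vector could be passed as `field.types = types`.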

If you have suggestions for the mapping dictionary, let me know :)
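One way to grow the dictionary without `merge()` and `eval()` is a named list of converter functions, looked up by base type. This is a sketch under assumptions: the set of types, the UTC time zone for DATETIME/TIMESTAMP, and the helper name `apply_converters` are all illustrative, and DECIMAL values lose their exact decimal representation once stored as doubles.

```r
num <- function(x) as.numeric(x)
chr <- function(x) as.character(x)
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# MySQL base type -> converter function (assumed mapping)
mysql_converters <- list(
  tinyint = num, smallint = num, mediumint = num, int = num, bigint = num,
  float = num, double = num, decimal = num,
  date = function(x) as.Date(x, format = "%Y-%m-%d"),
  datetime = to_time, timestamp = to_time,
  char = chr, varchar = chr, text = chr, mediumtext = chr, longtext = chr
)

# Hypothetical helper: convert each column given its base MySQL type;
# types missing from the list (lookup returns NULL) are left unchanged
apply_converters <- function(df, base_types) {
  for (col in names(base_types)) {
    convert <- mysql_converters[[base_types[[col]]]]
    if (!is.null(convert)) df[[col]] <- convert(df[[col]])
  }
  df
}

raw <- data.frame(
  price = c("9.99", "12.50"),
  sold  = c("2011-02-01 08:30:00", "2011-03-01 17:45:10"),
  note  = c("x", "y"),
  stringsAsFactors = FALSE
)
typed <- apply_converters(raw, c(price = "decimal", sold = "datetime", note = "enum"))
str(typed)
```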
