Handling field types in database interaction with R
Problem description
I use RMySQL and a MySQL database to store my datasets. Sometimes data gets revised or I store results back to the database as well. Long story short, there is quite some interaction between R and the database in my use case.
Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore R data types and MySQL field types. I would expect MySQL date fields to end up in a Date or POSIX class. The other way around, I'd expect these R classes to be stored as a roughly corresponding MySQL field type. That means a date should not be character. I do not expect a distinction between float and double here...
I also tried dbGetQuery, with the same result. Is there something I completely missed when reading the manual, or is it simply not possible (yet) in these packages? What would be a nice workaround?
@mdsummer I tried to find something more in the documentation, but found only these disappointing lines:
"MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly while exporting data.frames, factors are exported as character vectors.
Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).
Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation."
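The conversion that the documentation leaves to the user can be sketched without a database. The data frame below is a made-up stand-in for what dbReadTable returns, and the column names and the "UTC" time zone are assumptions for illustration only.

```r
# Mock of a table as it arrives from the database: temporal columns are character.
res <- data.frame(
  id = 1:2,
  day = c("2011-03-01", "2011-03-02"),
  stamp = c("2011-03-01 08:30:00", "2011-03-02 17:45:10"),
  stringsAsFactors = FALSE
)

# DATE columns use the ISO format that as.Date() parses by default.
res$day <- as.Date(res$day)

# DATETIME/TIMESTAMP columns: state the format and time zone explicitly.
# "UTC" is an assumption; use the zone your server actually stores.
res$stamp <- as.POSIXct(res$stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(res)
```

Doing this by hand for every table is what makes an automatic mapping attractive.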
Recommended answer
Ok, I got a working solution now. Here's a function that maps MySQL field types to R classes. This helps in particular handling the MySQL field type date...
library(RMySQL)

dbReadMap <- function(con, table) {
  # DESCRIBE returns one row per column; keep only the field names and types
  desc <- dbGetQuery(con, paste("DESCRIBE", table))[, c("Field", "Type")]
  # drop row_names if it exists: dbReadTable turns it into row names,
  # so it is not a real column of the result
  desc <- desc[desc$Field != "row_names", ]
  # reduce types such as "int(11) unsigned" or "varchar(255)" to the bare name
  desc$Type <- sub("[^a-z].*$", "", tolower(desc$Type))
  # dictionary: MySQL field type -> name of the R conversion function
  # ("char" is what DESCRIBE reports for CHAR columns)
  converters <- c(int = "as.numeric", tinyint = "as.numeric",
                  bigint = "as.numeric", float = "as.numeric",
                  double = "as.numeric", date = "as.Date",
                  char = "as.character", varchar = "as.character",
                  text = "as.character")
  # fields whose type is not in the dictionary are left untouched
  desc <- desc[desc$Type %in% names(converters), ]
  # get the data
  res <- dbReadTable(con, table)
  # convert each mapped column with its conversion function
  for (i in seq_len(nrow(desc))) {
    convert <- match.fun(converters[[desc$Type[i]]])
    res[[desc$Field[i]]] <- convert(res[[desc$Field[i]]])
  }
  return(res)
}
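To see what the lookup step does without a running server, here is a small sketch on a mocked DESCRIBE result. The table layout is hypothetical, and the sketch trims each type to its leading name (so "int(11) unsigned" becomes "int"), which is one possible way to normalize the types rather than the only one.

```r
# Mock of the first two columns of a DESCRIBE result (hypothetical table).
desc <- data.frame(
  Field = c("row_names", "id", "price", "sold_on", "label"),
  Type = c("text", "int(11) unsigned", "double", "date", "varchar(255)"),
  stringsAsFactors = FALSE
)

# Drop the row_names helper column and strip lengths/modifiers from the types.
desc <- desc[desc$Field != "row_names", ]
desc$Type <- sub("[^a-z].*$", "", tolower(desc$Type))

# Dictionary: MySQL field type -> name of the R conversion function.
converters <- c(int = "as.numeric", double = "as.numeric",
                date = "as.Date", varchar = "as.character")

# Name of the conversion function chosen for each field.
mapping <- setNames(converters[desc$Type], desc$Field)
print(mapping)
```

Printing the mapping before touching the data is a cheap way to check that every field got the converter you expected.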
Maybe this is not good programming practice; I just don't know any better. So use it at your own risk or help me improve it... And of course it's only half of it: reading. Hopefully I'll find some time to write a writing function soon.
If you have suggestions for the mapping dictionary let me know :)
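For the writing half, one possible starting point is to derive a MySQL field type from each column's R class and pass the result to dbWriteTable. This is only a sketch and not the function promised above: rclass_to_fieldtype is a hypothetical helper, the chosen type names are assumptions, and the commented-out call assumes that your RMySQL version accepts a field.types argument.

```r
# Hypothetical helper: choose a MySQL field type for each column of a data.frame.
rclass_to_fieldtype <- function(df) {
  vapply(df, function(col) {
    if (inherits(col, "Date")) "date"
    else if (inherits(col, "POSIXt")) "datetime"
    else if (is.integer(col)) "int"
    else if (is.numeric(col)) "double"
    else if (is.logical(col)) "tinyint"
    else "text"
  }, character(1))
}

df <- data.frame(id = 1:2,
                 day = as.Date(c("2011-03-01", "2011-03-02")),
                 price = c(9.99, 12.5),
                 stringsAsFactors = FALSE)
types <- rclass_to_fieldtype(df)
print(types)

# With an open connection `con`, the types could then be handed over like this
# (not run here; the table name is hypothetical):
# dbWriteTable(con, "mytable", df, field.types = types, row.names = FALSE)
```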