RSQLite 类型转换问题 [英] RSQLite typecasting issue

查看:15
本文介绍了RSQLite 类型转换问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将 SQLite 数据库中的表写入 R 数据框中,但遇到了一个让我感到困惑的问题.这是我要导入的 SQLite 表中的前三个条目:

I'm trying to write a table from a SQLite database into an R data frame and have hit upon a problem that has me stumped. Here are the three first entries in the SQLite table I would like to import:

1|10|0|0|0|0|10|10|0|0|0|6|8|6|20000|30000|2012-02-29 21:27:07.239091|2012-02-29 21:28:24.815385|6|80.67.28.161|||||||||||||||||||||||||||||||33|13.4936||t|t|f||||||||||||||||||4|0|0|7|7|2
2|10|0|0|0|0|0|0|0|2|2|4|5|4|20000|30000|2012-02-29 22:00:30.618726|2012-02-29 22:04:09.629942|5|80.67.28.161|3|7||0|1|3|0|||4|3|4|5|5|5|5|4|5|4|4|0|0|0|0|0|9|9|9|9|9|||1|f|t|f|||||||||||||k|text|l|||-13|0|3|10||2
3|13|2|4|4|4|4|1|1|2|5|6|3|2|40000|10000|2012-03-01 09:07:52.310033|2012-03-01 09:21:13.097303|6|80.67.28.161|2|2||30|1|1|0|||4|2|1|6|8|3|5|6|6|7|6|||||||||||26|13.6336|4|f|t|f|t|f|f|f|f|||||||||some text||||10|1|1|3|2|3

我感兴趣的是第 53 到 60 列,为了省去您在上面计算的麻烦,如下所示:

What I'm interested in are columns 53 through 60, which, to save you the trouble of counting in the above, look like this:

|t|t|f||||||
|f|t|f||||||
|f|t|f|t|f|f|f|f|

如您所见,对于前两个条目,只有前三列不是 NULL,而对于第三个条目,所有八列都分配了值.

As you can see for the first two entries only the first three of those columns are not NULL while for the third entry all eight columns have values assigned to them.

这是这些列的 SQLite 表信息

Here's the SQLite table info for those columns

sqlite> PRAGMA table_info(observations);
0|id|INTEGER|1||1
** snip **
53|understanding1|boolean|0||0
54|understanding2|boolean|0||0
55|understanding3|boolean|0||0
56|understanding4|boolean|0||0
57|understanding5|boolean|0||0
58|understanding6|boolean|0||0
59|understanding7|boolean|0||0
60|understanding8|boolean|0||0
** snip **

现在,当我尝试将其读入 R 时,这些相同的列最终会变成这样:

Now, when I try to read this into R here's what those same columns end up becoming:

> library('RSQLite')
> con <- dbConnect("SQLite", dbname = 'db.sqlite3))
> obs <- dbReadTable(con,'observations')
> obs[1:3,names(obs) %in% paste0('understanding',1:8)]
  understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
1              t              t              f             NA             NA             NA             NA
2              f              t              f             NA             NA             NA             NA
3              f              t              f              0              0              0              0
  understanding8
1             NA
2             NA
3              0

如您所见,前三列包含的值为 't''f' 其他列是 NA> 其中 SQLite 表中的相应值是 NULL 并且 0 不是 - 无论 SQLite 表中的相应值是否为 tf.不用说,这不是我所期望的行为.我认为问题是这些列的类型转换不正确:

As you can see, while the first three columns contain values that are either 't' or 'f' the other columns are NA where the corresponding values in the SQLite table are NULL and 0 where they are not - irrespective of whether the corresponding values in the SQLite table are t or f. Needless to say this is not the behavior I expected. The problem is, I think, that these columns are typecasted incorrectly:

> sapply(obs[1:3,names(obs) %in% paste0('understanding',1:8)], class)
understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7 
   "character"    "character"    "character"      "numeric"      "numeric"      "numeric"      "numeric" 
understanding8 
     "numeric" 

会不会是RSQLite看到tf为对应列中的值后,将前三列设置为character类型第一个条目但带有 numeric 因为在这些列中第一个条目恰好是 NULL?

Could it be that RSQLite sets the first three columns to the character type upon seeing t and f as values in the corresponding columns in the first entry but goes with numeric because in these columns the first entry just happens to be NULL?

如果确实发生了这种情况,有没有办法解决这个问题并将所有这些列转换为 character(或者更好的是 logical)?

If this is indeed what's happening is there any way of working around this and casting all these columns into character (or, even better, logical)?

推荐答案

以下内容有点笨拙,但有效:

The following is hacky, but it works:

# first make a copy of the DB and work with it instead of changing
# data in the original
original_file <- "db.sqlite3"
copy_file <- "db_copy.sqlite3"
file.copy(original_file, copy_file) # duplicate the file
con <- dbConnect("SQLite", dbname = copy_file) # establish a connection to the copied DB

# put together a query to replace all NULLs by 'NA' and run it
columns <- c(paste0('understanding',1:15))
columns_query <- paste(paste0(columns,' = IfNull(',columns,",'NA')"),collapse=",")
query <- paste0("UPDATE observations SET ",columns_query)
dbSendQuery(con, query)

# Now that all columns have string values RSQLite will infer the 
# column type to be `character`
df <- dbReadTable(con,'observations') # read the table
file.remove(copy_file) # delete the copy

# replace all 'NA' strings with proper NAs
df[names(df) %in% paste0('understanding',1:15)][df[names(df) %in% paste0('understanding',1:15)] == 'NA'] <- NA
# convert 't' to boolean TRUE and 'f' to boolean FALSE
df[ ,names(df) %in% paste0('understanding',1:15)] <- sapply( df[ ,names(df) %in% paste0('understanding',1:15)], function(x) {x=="t"} )

这篇关于RSQLite 类型转换问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆