Bracket-escaped table names with dplyr


Problem description


I'm programmatically fetching a bunch of datasets, many of which have silly names that begin with numbers and contain special characters such as minus signs. Because none of the datasets is particularly large, and I wanted the benefit of R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.

I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:

library(dplyr)
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")

This results in the error message:

Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3

This works if I choose a sensible name. However, due to a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from sqlite:

sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3

Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.

EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:

library(dplyr)

data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")

Error in sqliteSendQuery(conn, statement, bind.data) : 
  error in statement: unrecognized token: "14M3"

Solution

This is a bug in dplyr, and it is still present in the current GitHub master. As @hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The problem you're having arises from a lack of escaping in two functions. Table creation works fine when the table name is provided unescaped (it is done with dplyr::db_create_table). However, the insertion of data into the table is done using DBI::dbWriteTable, which doesn't support odd table names. If the table name is provided to this function escaped, it fails to find the table in the list of tables (the first error you report). If it is provided unescaped, the SQL that does the insertion is not syntactically valid (the second error you report).

The second issue comes when the table is updated. The code to get the field names, this time actually in dplyr, again fails to escape the table name because it uses paste0 rather than build_sql.
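To illustrate the difference between pasting a raw table name into a statement and escaping it first, here is a minimal sketch in base R. The helper quote_ident() is hypothetical (it is not a dplyr or DBI function), and the SELECT statement is only an illustrative query, not necessarily the one dplyr issues.

```r
# Hypothetical helper, not part of dplyr or DBI: quote an identifier the
# SQL-standard way by doubling embedded double quotes and wrapping the
# result in double quotes.
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

table_name <- "14m3-n4m3"

# Naive concatenation: the name is pasted in raw, so the database has to
# tokenize 14m3-n4m3 itself.
naive_sql <- paste0("SELECT * FROM ", table_name, " LIMIT 0")

# Escaped concatenation: the name goes through quote_ident() first.
quoted_sql <- paste0("SELECT * FROM ", quote_ident(table_name), " LIMIT 0")

print(naive_sql)
print(quoted_sql)
```

The first string leaves the odd name exposed to the SQL parser, while the second wraps it in double quotes so it is read as a single identifier.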

I've fixed both errors in a fork of dplyr. I've also submitted a pull request to @hadley and added a note on the issue at https://github.com/hadley/dplyr/issues/926. In the meantime, you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it has been fixed.

Incidentally, the correct SQL-99 way to escape table names (and other identifiers) is with double quotes (see the question "SQL standard to escape column names?"). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.

Finally, the proposal from @RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but is not a syntactically valid table name in SQL. The same would be true for other reserved words.
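The points above about double-quoted identifiers and reserved words can be checked directly against SQLite. This is a minimal sketch that assumes the DBI and RSQLite packages are installed. It sends raw SQL rather than going through dplyr, and try_sql() is a hypothetical helper written for this example.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical helper: run a statement and report "ok" or "error"
# instead of stopping the script.
try_sql <- function(con, sql) {
  tryCatch({
    dbExecute(con, sql)
    "ok"
  }, error = function(e) "error")
}

# Unquoted odd name: SQLite cannot tokenize 14m3-n4m3.
unquoted_result <- try_sql(con, "CREATE TABLE 14m3-n4m3 (foo, bar, baz)")

# Double-quoted odd name: accepted as a single identifier.
quoted_result <- try_sql(con, 'CREATE TABLE "14m3-n4m3" (foo, bar, baz)')

# A reserved word is a fine R name but needs quoting as a table name.
reserved_unquoted <- try_sql(con, "CREATE TABLE select (x)")
reserved_quoted <- try_sql(con, 'CREATE TABLE "select" (x)')

tables <- dbListTables(con)
print(tables)

dbDisconnect(con)
```

Only the two quoted statements should succeed, so the table list should contain 14m3-n4m3 and select.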
