Fetching UTF-8 text from MySQL in R returns "????"
I'm stuck trying to fetch UTF-8 text in a MySQL database from R. I'm running R on OS X (tried both via the GUI and command line), where the default locale is en_US.UTF-8, and no matter what I try, the query result shows "?" for all non-ASCII characters.
I've tried setting options(encoding='UTF-8'), DBMSencoding='UTF-8' when connecting via ODBC, setting Encoding(res$str) <- 'UTF-8' after fetching the results, as well as 'utf8' variants of each of those, all to no avail. Running the query from the command-line mysql client shows the results correctly.
I'm totally stumped. Any ideas why it's not working, or other things I should try?
Here's a fairly minimal test case:
$ mysql -u root
mysql> CREATE DATABASE rtest;
mysql> USE rtest;
mysql> CREATE TABLE test (str VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO test (str) VALUES ('こんにちは');
Query OK, 1 row affected (0.00 sec)
mysql> select * from test;
+-----------------+
| str             |
+-----------------+
| こんにちは |
+-----------------+
1 row in set (0.00 sec)
Querying the table in R using both RODBC and RMySQL shows "?????" for the str column:
> con <- odbcDriverConnect('DRIVER=mysql;user=root', DBMSencoding='UTF-8')
> sqlQuery(con, 'SELECT * FROM rtest.test')
str
1 ?????
> library(RMySQL)
Loading required package: DBI
> con <- dbConnect(MySQL(), user='root')
> dbGetQuery(con, 'SELECT * FROM rtest.test')
str
1 ?????
For completeness, here's my sessionInfo:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RMySQL_0.9-3 DBI_0.2-5 RODBC_1.3-6
Thanks to @chooban I found out the connection session was using latin1 instead of utf8. Here are two solutions I found:
- For RMySQL, after connecting run the query SET NAMES utf8 to change the connection character set.
- For RODBC, connect using CharSet=utf8 in the DSN string. I was not able to run SET NAMES via ODBC.
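The two fixes above can be sketched in R roughly as follows. This is only an illustration under assumptions: a local MySQL server, the root user, and the rtest.test table from the test case. The helper names fetch_with_rmysql and fetch_with_rodbc are made up for this example. The SHOW VARIABLES query is one way to check which character set the session is using before changing it.

```r
# Sketch only: assumes a local MySQL server, user 'root', and the
# rtest.test table from the question. Both helper names are hypothetical.

# RMySQL: change the session character set right after connecting.
fetch_with_rmysql <- function(user = "root") {
  con <- DBI::dbConnect(RMySQL::MySQL(), user = user)
  on.exit(DBI::dbDisconnect(con))
  # Show the session character sets before the change (diagnostic only).
  print(DBI::dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set_%'"))
  # Ask the server to send and receive utf8 on this connection.
  DBI::dbGetQuery(con, "SET NAMES utf8")
  DBI::dbGetQuery(con, "SELECT * FROM rtest.test")
}

# RODBC: request utf8 through the CharSet option in the connection string.
fetch_with_rodbc <- function() {
  con <- RODBC::odbcDriverConnect("DRIVER=mysql;user=root;CharSet=utf8")
  on.exit(RODBC::odbcClose(con))
  RODBC::sqlQuery(con, "SELECT * FROM rtest.test", stringsAsFactors = FALSE)
}
```

Calling fetch_with_rmysql() or fetch_with_rodbc() should then return the str column as readable text, provided the driver name and credentials match your setup. Newer DBI versions also provide dbExecute() for statements such as SET NAMES that return no rows.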
This question pointed me in the right direction.
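As a rough illustration of why Encoding(res$str) <- 'UTF-8' could not help: with a latin1 session, MySQL converts the utf8 column to latin1 before sending it, and characters that latin1 cannot represent are replaced by a literal "?". The sketch below imitates that conversion locally in R, with no database involved. The helper name to_latin1_lossy is hypothetical, and the per-character substitution is an approximation of the server's behaviour, not its actual implementation.

```r
# Imitate a lossy utf8 -> latin1 conversion: every character that latin1
# cannot represent becomes one "?". The helper name is hypothetical.
to_latin1_lossy <- function(x) {
  chars <- strsplit(x, "")[[1]]
  converted <- vapply(chars, function(ch) {
    out <- iconv(ch, from = "UTF-8", to = "latin1")
    if (is.na(out)) "?" else out
  }, character(1), USE.NAMES = FALSE)
  paste(converted, collapse = "")
}

# The test string from the question, written with Unicode escapes.
received <- to_latin1_lossy("\u3053\u3093\u306b\u3061\u306f")
print(received)

# The bytes are plain ASCII question marks; the original text is gone.
print(charToRaw(received))

# Re-labelling the encoding afterwards changes nothing about those bytes.
Encoding(received) <- "UTF-8"
print(received)
```

Because the substitution happens before the data reaches R, the fix has to be made on the connection (SET NAMES utf8 or CharSet=utf8) rather than on the fetched strings.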