Fetching UTF-8 text from MySQL in R returns "????"


Problem description

I'm stuck trying to fetch UTF-8 text in a MySQL database from R. I'm running R on OS X (tried both via the GUI and command line), where the default locale is en_US.UTF-8, and no matter what I try, the query result shows "?" for all non-ASCII characters.

I've tried setting options(encoding='UTF-8'), DBMSencoding='UTF-8' when connecting via ODBC, setting Encoding(res$str) <- 'UTF-8' after fetching the results, as well as 'utf8' variants of each of those, all to no avail. Running the query from the command line mysql client shows the results correctly.

I'm totally stumped. Any ideas why it's not working, or other things I should try?

Here's a fairly minimal test case:

$ mysql -u root
mysql> CREATE DATABASE rtest;
mysql> USE rtest;
mysql> CREATE TABLE test (str VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO test (str) VALUES ('こんにちは');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test;
+-----------------+
| str             |
+-----------------+
| こんにちは      |
+-----------------+
1 row in set (0.00 sec)

Querying the table in R using both RODBC and RMySQL shows "?????" for the str column:

> con <- odbcDriverConnect('DRIVER=mysql;user=root', DBMSencoding='UTF-8')
> sqlQuery(con, 'SELECT * FROM rtest.test')
    str
1 ?????
> library(RMySQL)
Loading required package: DBI
> con <- dbConnect(MySQL(), user='root')
> dbGetQuery(con, 'SELECT * FROM rtest.test')
    str
1 ?????

For completeness, here's my sessionInfo:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RMySQL_0.9-3 DBI_0.2-5    RODBC_1.3-6 

Solution

Thanks to @chooban I found out the connection session was using latin1 instead of utf8. Here are two solutions I found:

  • For RMySQL, after connecting run the query SET NAMES utf8 to change the connection character set.
  • For RODBC, connect using CharSet=utf8 in the DSN string. I was not able to run SET NAMES via ODBC.
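
The two fixes above might look roughly like this in R. This is only a sketch under the question's setup (a local server, user root, the rtest.test table); the function names fetch_with_rmysql, fetch_with_rodbc and show_charsets are made up for illustration, and the SHOW VARIABLES check is an optional extra for seeing which character sets the session is using.

```r
# Sketch only: assumes the rtest.test table from the question and a local
# MySQL server that accepts user "root" without a password.
# All three function names are hypothetical helpers, not package functions.

# Optional check: list the session's character_set_* variables.
# In the asker's case the connection was reportedly using latin1.
show_charsets <- function(con) {
  DBI::dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set_%'")
}

# RMySQL: switch the connection character set right after connecting.
fetch_with_rmysql <- function(user = "root") {
  con <- DBI::dbConnect(RMySQL::MySQL(), user = user)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, "SET NAMES utf8")
  DBI::dbGetQuery(con, "SELECT * FROM rtest.test")
}

# RODBC: request utf8 through the CharSet option in the connection string.
fetch_with_rodbc <- function() {
  con <- RODBC::odbcDriverConnect("DRIVER=mysql;user=root;CharSet=utf8")
  on.exit(RODBC::odbcClose(con))
  RODBC::sqlQuery(con, "SELECT * FROM rtest.test")
}
```

Calling fetch_with_rmysql() or fetch_with_rodbc() needs a running server and the respective package installed; calling show_charsets(con) before and after SET NAMES utf8 should show whether the session character sets actually changed.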

This question pointed me in the right direction.
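
As a side note on why Encoding(res$str) <- 'UTF-8' could not help: when the session character set is latin1, MySQL substitutes a literal "?" for each character it cannot represent before sending the result, so the original bytes most likely never reach R. The small check below does not touch a database; it simply assumes the fetched value is the five-character string shown in the question's output.

```r
# Assumption: this is the value R received for the str column.
res <- data.frame(str = "?????", stringsAsFactors = FALSE)

# Declaring an encoding only changes how existing bytes are interpreted.
Encoding(res$str) <- "UTF-8"

# The bytes are still five ASCII question marks (0x3f each).
bytes <- charToRaw(res$str)
print(bytes)
```

Since the data is already plain ASCII at this point, the fix has to happen on the connection, as in the two solutions above.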
