R RMySQL query deforms Japanese characters
Problem description
I am using RMySQL to connect to an AWS MySQL server. It works, except that character values come back deformed. This question has been asked before, but the fixes don't seem to work for me. Here's what I'm doing:
Make sure there are no open connections:
dbListConnections(MySQL())
list()
Make sure my connection is set to use UTF-8:
dbGetQuery(credentials, "show variables like 'character_set%'")
Variable_name Value
1 character_set_client utf8
2 character_set_connection utf8
3 character_set_database utf8
4 character_set_filesystem utf8
5 character_set_results utf8
6 character_set_server utf8
7 character_set_system utf8
8 character_sets_dir /rdsdbbin/mysql-5.5.40.R1/share/charsets/
Fetch the data:
data <- dbGetQuery(credentials, Query)
head(data)
  keyword_ja
1 \036
2 \036蜀ャ
3 \036螟\x8f
4 \036譌・譛ャ莠コ
5 \037繧、繝ゥ繧ケ繝\x88
6 \037蜿守ゥォ
When I write this data to disk, Excel shows the same deformed characters, but Notepad++ somehow displays the Japanese as intended:
"keyword_ja"
"冬"
"夏"
"日本人"
"イラスト"
"収穫"
I've been trying functions like Encoding() and enc2utf8() in R to get it to display the characters correctly, as Notepad++ does, with no success.
Encoding(head(data$keyword_ja))
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
enc2utf8(head(data$keyword_ja))
[1] "\036" "\036蜀ャ" "\036螟<8f>" "\036譌・譛ャ莠コ" "\037繧、繝ゥ繧ケ繝<88>" "\037蜿守ゥォ"
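The output above is consistent with UTF-8 bytes being displayed in the session's native code page: "冬" is E5 86 AC in UTF-8, and those bytes read as CP932 give "蜀ャ". Because the strings are marked "unknown", enc2utf8() treats them as native-encoded and converts the already misread text. A minimal sketch of the alternative, which declares the existing bytes to be UTF-8 instead of converting them, follows. The helper name mark_utf8 is hypothetical, and rawToChar() only stands in for what the driver returns; whether this resolves the original case has not been verified against the poster's server.

```r
# Sketch: bytes that are really UTF-8 but carry no encoding mark.
# rawToChar() stands in for a value handed back by the database driver.
winter_bytes <- rawToChar(as.raw(c(0xE5, 0x86, 0xAC)))  # UTF-8 bytes of "冬"
Encoding(winter_bytes)  # "unknown": R assumes the native encoding

# Hypothetical helper: declare (not convert) character data as UTF-8.
# Works on a character vector or on every character column of a data frame.
mark_utf8 <- function(x) {
  if (is.data.frame(x)) {
    is_chr <- vapply(x, is.character, logical(1))
    for (nm in names(x)[is_chr]) {
      x[[nm]] <- mark_utf8(x[[nm]])
    }
    return(x)
  }
  Encoding(x) <- "UTF-8"
  x
}

winter <- mark_utf8(winter_bytes)
Encoding(winter)             # "UTF-8"
identical(winter, "\u51ac")  # TRUE: the same string as "冬"
```

Applied to the query result, this would be data <- mark_utf8(data). The bytes themselves are untouched; only the encoding mark changes, which is why it differs from enc2utf8().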
I can normally type Japanese characters, and R has no problem displaying them:
Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"
mystring <- "日本語入力できる"
mystring
[1] "日本語入力できる"
Encoding(mystring)
[1] "unknown"
I'm pretty desperate to figure this out, so any help is very much appreciated. Please let me know if I can provide additional information.
Recommended answer
Based on this SO article, you might have to write your data to disk with UTF-8 encoding. Try this:
data <- dbGetQuery(credentials, Query)
# Open a file connection that encodes its output as UTF-8
con <- file("output.csv", encoding = "UTF-8")
write.csv(data, file = con)
Then try opening output.csv in both Excel and Notepad++ and let us know the results. When you read this file back into R, it should hopefully behave as expected:
library(data.table)
fread("output.csv", encoding="UTF-8")
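To check a write-and-read-back round trip without depending on the session locale, a base R sketch like the one below may help. It swaps in writeLines() with useBytes = TRUE in place of the answer's write.csv() connection, so the UTF-8 bytes are written exactly as they are held in memory; the keyword values and the temporary file path are illustrative assumptions, not data from the question.

```r
# Sketch: byte-exact round trip of UTF-8 text, independent of the locale.
keywords <- c("\u51ac", "\u590f", "\u65e5\u672c\u4eba")  # 冬, 夏, 日本人
path <- tempfile(fileext = ".txt")  # hypothetical output location

# Binary mode plus useBytes = TRUE writes the strings without re-encoding.
con <- file(path, open = "wb")
writeLines(enc2utf8(keywords), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the lines as UTF-8 as they are read back in.
restored <- readLines(path, encoding = "UTF-8")
identical(restored, keywords)
```

If the final comparison fails, the text was re-encoded somewhere between writing and reading, which narrows down where the characters are being damaged.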