Reading a UTF-8 text file (in Hebrew) shows gibberish in RStudio's console but displays fine in Rgui

Posted: 2017/2/25 0:09:28 | Tags: r, csv, utf-8, rstudio, hebrew


Problem description

I am trying to understand whether this is a bug in RStudio or whether I am missing something.

I am reading a CSV file into R. When I print it to the console in RStudio I get gibberish (unless I look at a specific vector), while in Rgui it displays fine.

The code I will run is this:
Sys.setlocale("LC_ALL", "Hebrew")
x <- read.csv("https://raw.githubusercontent.com/talgalili/temp2/gh-pages/Hebrew_UTF8.txt", encoding="UTF-8")  
x # shows gibberish
x[,2]
colnames(x)
Here is the output from RStudio (gibberish):
> x <- read.csv("https://raw.githubusercontent.com/talgalili/temp2/gh-pages/Hebrew_UTF8.txt", encoding="UTF-8")
> x
   âéì..áùðéí. îéâãø
1         23.0   æëø
2         24.0  ð÷áä
3         23.0  ð÷áä
4         24.0  ð÷áä
5         25.0   æëø
6         18.0   æëø
7         26.0   æëø
8         21.5  ð÷áä
9         24.0   æëø
10        26.0   æëø
11        24.0   æëø
12        19.0  ð÷áä
13        19.0  ð÷áä
14        24.5   æëø
15        21.0  ð÷áä
> x[,2]
 [1] זכר  נקבה נקבה נקבה זכר  זכר  זכר  נקבה זכר  זכר  זכר  נקבה נקבה זכר  נקבה
Levels: זכר נקבה
> colnames(x)
[1] "âéì..áùðéí." "îéâãø"      
> 
And here it is in Rgui (here it is fine):
>     x <- read.csv("https://raw.githubusercontent.com/talgalili/temp2/gh-pages/Hebrew_UTF8.txt", encoding="UTF-8")  
>     x # shows gibrish
   גיל..בשנים. מיגדר
1         23.0   זכר
2         24.0  נקבה
3         23.0  נקבה
4         24.0  נקבה
5         25.0   זכר
6         18.0   זכר
7         26.0   זכר
8         21.5  נקבה
9         24.0   זכר
10        26.0   זכר
11        24.0   זכר
12        19.0  נקבה
13        19.0  נקבה
14        24.5   זכר
15        21.0  נקבה
>     x[,2]
 [1] זכר  נקבה נקבה נקבה זכר  זכר  זכר  נקבה זכר  זכר  זכר  נקבה נקבה זכר  נקבה
Levels: זכר נקבה
>     colnames(x)
[1] "גיל..בשנים." "מיגדר"      
> 
In both sessions, my sessionInfo() is:
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255   
[3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                  
[5] LC_TIME=Hebrew_Israel.1255    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] installr_0.17.0
I'm using the latest RStudio version, 0.99.892.

Thanks.
Solution
This is a bug in RStudio, and not the only one. I've seen that you have received a general answer about the problems RStudio currently has with non-English locale support on Windows. As far as I know, this is not the first time (or version) with similar problems. You may also run into some newer problems that I think are related to Windows 10. Note that since I am having the second type of problem as well, I am using an English locale to print Hebrew.

So I tried some debugging on your problem and came up with a workaround, plus some new insights (I think) into where the problem lies. I think it could be debugged further, to the point of writing a complete function that fixes it, but due to time (and late-hour) restrictions I've decided to stop here.

I've created this data:
x <- data.frame("x"= c("דור","dor"))
As already mentioned, with the Hebrew locale I also get gibberish:
Sys.setlocale("LC_ALL", "Hebrew")
[1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

"דור"
[1] "ãåø"

x
   x
1 ãåø
2 dor
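The gibberish is not random. As a small sketch (assuming an R build whose iconv() recognises the "CP1255" and "latin1" encoding names, as the usual Windows, Linux and macOS builds do; the variable names are illustrative), one can check that "ãåø" is exactly the Windows-1255 encoding of "דור" read back as if it were Latin-1/Windows-1252. That is consistent with the console decoding native Hebrew strings with a Western code page. The same mapping turns "זכר" into the "æëø" seen in the question.

```r
# The Hebrew word "dor" (U+05D3 U+05D5 U+05E8), written with escapes so the
# script itself stays ASCII-only
heb <- "\u05d3\u05d5\u05e8"

# Encode it in Windows-1255, the code page behind the Hebrew_Israel.1255 locale
bytes_1255 <- iconv(heb, from = "UTF-8", to = "CP1255", toRaw = TRUE)[[1]]
print(bytes_1255)  # e3 e5 f8

# Decode the very same bytes as Latin-1 (identical to Windows-1252 for these
# three bytes) and convert the result to UTF-8 so it can be displayed
mojibake <- iconv(rawToChar(bytes_1255), from = "latin1", to = "UTF-8")
print(mojibake)    # "ãåø", the string shown in the RStudio console
```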
Using the English locale, I get this output:
Sys.setlocale("LC_ALL", "English")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

 "דור"
[1] "דור"

x
                         x
1 <U+05D3><U+05D5><U+05E8>
2                      dor
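The <U+05D3><U+05D5><U+05E8> form is R's escape for characters that cannot be represented in the current native encoding. A short base-R sketch (variable names are illustrative) rebuilds that escape text from the letters' Unicode code points and shows that Windows-1252, the code page of the English locale above, has no slot for Hebrew letters, which is consistent with R falling back to escapes there.

```r
heb <- "\u05d3\u05d5\u05e8"  # "dor" in Hebrew letters

# The escapes printed above are the Unicode code points of each letter
escaped <- paste0(sprintf("<U+%04X>", utf8ToInt(heb)), collapse = "")
print(escaped)  # "<U+05D3><U+05D5><U+05E8>"

# Hebrew letters cannot be converted to Windows-1252; with the default
# sub = NA, iconv() signals the failure by returning NA
print(iconv(heb, from = "UTF-8", to = "CP1252"))  # NA
```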
Note that output that is not a data.frame prints fine. The problem also occurs with the data.table class, while list and matrix objects print fine.

Checking both the print.data.frame and print.table methods reveals the main suspect: format().

Further investigation confirms this suspicion:
as.matrix(x)
     x    
[1,] "דור"
[2,] "dor"

format(as.matrix(x))
     x                         
[1,] "<U+05D3><U+05D5><U+05E8>"
[2,] "dor                     "
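Because print.data.frame() runs the whole frame through format.data.frame() before printing, one way to sidestep the suspect is to convert to a matrix first and print that, as the workflow below does by hand. A minimal sketch wrapping that step in a helper; the name print_via_matrix() is hypothetical, and the only evidence that this avoids the gibberish is the session shown in this answer.

```r
# Hypothetical helper (not part of base R or RStudio): print a data frame via
# as.matrix() so that format.data.frame(), which print.data.frame() would
# call, is skipped for the text columns
print_via_matrix <- function(df) {
  m <- as.matrix(df)  # factor and character columns become a character matrix
  print(m)            # printed by the matrix method, as in the workaround
  invisible(df)       # return the input so the helper can be used in a chain
}

df <- data.frame(x = c("\u05d3\u05d5\u05e8", "dor"), stringsAsFactors = FALSE)
print_via_matrix(df)
```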
So in your case I suggest the following workflow:
Sys.setlocale("LC_ALL", "Hebrew")
x <- read.csv("https://raw.githubusercontent.com/talgalili/temp2/gh-pages/Hebrew_UTF8.txt", encoding="UTF-8")  
as.matrix(x) 
      âéì..áùðéí. îéâãø 
 [1,] "23.0"      "זכר" 
 [2,] "24.0"      "נקבה"
 [3,] "23.0"      "נקבה"
 [4,] "24.0"      "נקבה"
 [5,] "25.0"      "זכר" 
 [6,] "18.0"      "זכר" 
 [7,] "26.0"      "זכר" 
 [8,] "21.5"      "נקבה"
 [9,] "24.0"      "זכר" 
[10,] "26.0"      "זכר" 
[11,] "24.0"      "זכר" 
[12,] "19.0"      "נקבה"
[13,] "19.0"      "נקבה"
[14,] "24.5"      "זכר" 
[15,] "21.0"      "נקבה"
Both locales, Hebrew and English, worked on my machine, but the column names (col.names) did not display correctly in either.
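The column names are the part that remains broken. In the question, x[,2] (whose factor levels read.csv() marked as UTF-8 because of encoding = "UTF-8") prints correctly while colnames(x) does not, which suggests, as an unverified assumption, that the names ended up in the native Windows-1255 encoding instead. A sketch of how one might check this with Encoding(), plus an untested attempt to convert and mark the names as UTF-8 with enc2utf8(); both helper names are hypothetical and the fix has not been tried against RStudio 0.99.892.

```r
# Hypothetical helper: tabulate the declared encodings of a data frame's
# column names and of its text values ("unknown" means native or ASCII)
encoding_report <- function(df) {
  is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                    logical(1))
  values <- unlist(lapply(df[is_text], as.character), use.names = FALSE)
  list(names  = table(Encoding(names(df))),
       values = table(Encoding(values)))
}

# Untested idea: convert the names to UTF-8 and mark them as such, like the
# factor levels that print fine
names_to_utf8 <- function(df) {
  names(df) <- enc2utf8(names(df))
  df
}

# Illustrative data: one Hebrew column name ("gil", age) and one text column
df <- data.frame(a = 23, b = c("\u05d6\u05db\u05e8"), stringsAsFactors = FALSE)
names(df) <- c("\u05d2\u05d9\u05dc", "b")
encoding_report(df)
df <- names_to_utf8(df)
```

On the real data, one would run encoding_report(x) and then x <- names_to_utf8(x) before calling colnames(x) again.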

To conclude, this is far from a complete solution; it is just a small, partial workaround for the printing (or rather, formatting) problem. It also sheds some more light on this Hebrew/non-English issue in RStudio, on which better solutions may be built. One example of a solution to a similar problem of writing Hebrew on Windows can be seen in this SO thread.
