How to read.table with "Hebrew" column names (in R)?


Problem description

I am trying to read a .txt file with Hebrew column names, but without success.

I uploaded an example file to: http://www.talgalili.com/files/aa.txt

I am trying the command:

read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t")

This returns:

  X.....ª X...ª...... X...œ....
1      12          97         6
2     123         354        44
3       6           1         3

instead of:

אחת שתיים   שלוש
12  97  6
123 354 44
6   1   3

My output for:

l10n_info()

is:

$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

And for:

Sys.getlocale()

it is:

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Can you suggest what I should try or change so that the file loads correctly?

Update: trying

read.table("http://www.talgalili.com/files/aa.txt",fileEncoding ="iso8859-8")

results in:

 V1
1  ?
Warning messages:
1: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  invalid input found on input connection 'http://www.talgalili.com/files/aa.txt'
2: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  incomplete final line found by readTableHeader on 'http://www.talgalili.com/files/aa.txt'

Also, trying this:

Sys.setlocale("LC_ALL", "en_US.UTF-8")

or this:

Sys.setlocale("LC_ALL", "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")

gives me this:

[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored

Finally, here is the output of sessionInfo():

R version 2.10.1 (2009-12-14) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1255  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.10.1

Any suggestion or clarification will be appreciated.

Best, Tal

Recommended answer

I would try passing the fileEncoding parameter to read.table with a value of iso8859-8.
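As a rough illustration of how fileEncoding is passed, here is a minimal sketch that does not touch the original aa.txt. It writes its own small tab-separated file with the same Hebrew header and reads it back. The temporary file, the variable names (col_names, tmp, dat), and the choice of UTF-8 are assumptions made for the sketch; for a real file the value has to match whatever encoding the file was actually saved in (the answer suggests iso8859-8). The sketch also assumes the R session runs in a locale that can represent Hebrew, such as a UTF-8 locale, because fileEncoding re-encodes the input into the session's native encoding.

```r
# Hypothetical stand-in for the real file: the same Hebrew header,
# built from Unicode escapes so this script itself stays ASCII-only.
col_names <- c("\u05d0\u05d7\u05ea",              # first column name
               "\u05e9\u05ea\u05d9\u05d9\u05dd",  # second column name
               "\u05e9\u05dc\u05d5\u05e9")        # third column name
header <- paste(col_names, collapse = "\t")
rows <- c(header, "12\t97\t6", "123\t354\t44", "6\t1\t3")

# Write the lines as raw UTF-8 bytes, so the file encoding is known.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeLines(enc2utf8(rows), con, useBytes = TRUE)
close(con)

# Declare the file's encoding when reading. check.names = FALSE keeps the
# header as written instead of passing it through make.names().
# Assumption: the session locale can represent Hebrew (e.g. UTF-8).
dat <- read.table(tmp, header = TRUE, sep = "\t",
                  fileEncoding = "UTF-8", check.names = FALSE)
print(dat)
```

If the declared encoding does not match the file, or the session's native encoding cannot hold the characters, read.table tends to stop early with an "invalid input found on input connection" warning like the one shown in the question's update.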

Use iconvlist() to get an alphabetical list of the supported encodings. From what I have read, Hebrew is covered by part 8 of ISO 8859 (ISO 8859-8).
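Because encoding names differ between platforms, it may help to search the iconvlist() result rather than guess the spelling. The sketch below is only an illustration: the search pattern and the variable names (encs, hebrew_candidates) are assumptions, and the entries returned depend on the iconv implementation R was built with. The pattern looks for ISO 8859-8, for code page 1255 (the Windows Hebrew code page), and for UTF-8.

```r
# List every encoding name this R installation knows about.
encs <- iconvlist()

# Keep only names that look relevant for Hebrew text.
# The pattern is an illustrative assumption, not an exhaustive list.
hebrew_candidates <- grep("8859-8|1255|^UTF-?8$", encs,
                          value = TRUE, ignore.case = TRUE)
print(hebrew_candidates)
```

Any name printed here can be tried as the fileEncoding value in read.table.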
