Cannot read unicode .csv into R

Problem description

I have a .csv file, which contains the following data:

"Ա","Բ"
1,10
2,20

I cannot read it into R so that the column names are displayed like they are in the file.

d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
head(d)

Produces the following:

> d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection './Data/1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on './Data/1.csv'
> head(d)
[1] X.
<0 rows> (or 0-length row.names)

Meanwhile, doing the same without specifying the fileEncoding produces this:

> d <- read.csv("./Data/1.csv")
> head(d)
  Ô. Ô²
1  1 10
2  2 20

When I run the "file" utility to find out the encoding of the file, it says it is UTF-8:

Data\1.csv: UTF-8 Unicode text, with CRLF line terminators

I am using RStudio, Windows 7, R version 2.15.2, 32-bit.

Thanks in advance.

Solution

I wrote a longer answer on the same issue here: R on Windows: character encoding hell.

Quick answer: using the argument encoding instead of fileEncoding should fix your first issue. The column names may still not display readably in the console or in the table view in RStudio, but you will be able to use them in formulas.

d <- read.csv("./Data/1.csv", encoding="UTF-8")
head(d)
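
To make the call above reproducible without the original file, here is a minimal sketch that writes the same three lines to a temporary UTF-8 file and reads them back. The temporary path and the check.names = FALSE argument are assumptions added for illustration; they are not part of the original answer. check.names = FALSE is used so that read.csv does not rewrite the header into syntactic names.

```r
# Build a small UTF-8 CSV in a temporary file (the path is illustrative).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
# "\u0531" and "\u0532" are the two Armenian letters from the question;
# useBytes = TRUE writes their UTF-8 bytes unchanged.
writeLines(c("\"\u0531\",\"\u0532\"", "1,10", "2,20"), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings read from the file as UTF-8
# instead of re-encoding the input the way fileEncoding does.
# check.names = FALSE (an assumption, see above) keeps the header as-is.
d <- read.csv(path, encoding = "UTF-8", check.names = FALSE)

print(Encoding(names(d)))  # encoding marks of the column names
print(dim(d))              # number of rows and columns read
```

Whether the names then print as letters or as <U+0531> escapes depends on the locale of the R session, so the sketch only inspects the encoding marks and the dimensions.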

Having saved your table into a UTF-8 file:

> test2 <- read.csv("test2.csv", header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", encoding = "UTF-8")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'test2.csv'

This is how it looks in the console and in the RStudio view:

> test2
        V1       V2
1 <U+0531> <U+0532>
2        1       10
3        2       20

Importantly, however, you can still manipulate the data within R. In my case, the Ա typed into the script window has UTF-8 encoding, and grep correctly finds it in your table.

> Encoding("Ա")
[1] "UTF-8"
> grep("Ա", as.character(test2[1,1]))
[1] 1
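
The same check can be sketched without depending on the encoding of the script file by writing the letter as a Unicode escape. The variable cell below is a hypothetical stand-in for the header cell test2[1, 1], not the original data.

```r
# Hypothetical stand-in for test2[1, 1]: the letter written as an escape.
cell <- "\u0531"

# A string created from a \u escape is marked as UTF-8.
mark <- Encoding(cell)

# The code point is the same no matter how the console prints the string.
code_point <- utf8ToInt(cell)

# Fixed (non-regex) matching finds the letter in the cell.
found <- grepl("\u0531", cell, fixed = TRUE)

print(mark)
print(code_point == 0x0531)
print(found)
```

If these checks pass, the <U+0531> shown when printing is a display limitation of the session rather than a sign that the data was read incorrectly.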

You may need to find encoding settings that work with your setup, or possibly change that setup. Unfortunately, I am not sure where that is done.
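
As a starting point for that search, the session's current settings can at least be inspected from within R. This is only a sketch of how to look at them, not a recipe for changing them; the variable names are illustrative.

```r
# The character-type locale that R uses for display and conversion.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the session runs in a UTF-8 locale.
info <- l10n_info()
is_utf8_session <- isTRUE(info[["UTF-8"]])

print(ctype)
print(is_utf8_session)
```

When is_utf8_session is FALSE, characters outside the native code page are typically the ones that print as <U+xxxx> escapes.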

You might not be able to make it pretty at every stage, but it is definitely possible to get it to work in a Windows 7 environment as well.
