Cannot read unicode .csv into R

Question

I have a .csv file that contains the following data:

"Ա","Բ"
1,10
2,20

I cannot read it into R in a way that displays the column names as they appear in the file.

d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
head(d)

This produces the following:

> d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection './Data/1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on './Data/1.csv'
> head(d)
[1] X.
<0 rows> (or 0-length row.names)

Meanwhile, doing the same without specifying fileEncoding produces this:

> d <- read.csv("./Data/1.csv")
> head(d)
  Ô. Ô²
1  1 10
2  2 20

When I run the "file" utility to find out the encoding of the file, it reports UTF-8:

Data1.csv: UTF-8 Unicode text, with CRLF line terminators

I am using RStudio on Windows 7 with R version 2.15.2, 32-bit.

Thanks in advance.

Recommended answer

I wrote a longer answer on the same issue here: R on Windows: character encoding hell.

Quick answer: using the parameter encoding instead of fileEncoding should fix your first issue. You may not be able to read the names in either the console or the table view in RStudio, but you will be able to use them in formulas.

d <- read.csv("./Data/1.csv", encoding="UTF-8")
head(d)
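As a rough illustration of why this can work: encoding only marks the strings that are read as UTF-8 and does not re-encode the input, whereas fileEncoding asks R to convert the file while reading it. The sketch below is an assumption-laden test case, not part of the original answer: it writes the sample data to a temporary file (a hypothetical stand-in for ./Data/1.csv) and also passes check.names = FALSE, which is an extra suggestion that may help keep the header from being rewritten into syntactic names.

```r
# Write the sample data as raw UTF-8 bytes to a temporary file.
# `path` is a hypothetical stand-in for ./Data/1.csv.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(c('"\u0531","\u0532"', "1,10", "2,20"), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings as UTF-8 without re-encoding them.
# check.names = FALSE (an assumption, not in the original answer) stops
# make.names() from altering the column names.
d <- read.csv(path, encoding = "UTF-8", check.names = FALSE)

names(d)            # the two Armenian letters, possibly shown as <U+0531> <U+0532>
Encoding(names(d))  # how R has marked the column names
d[["\u0531"]]       # access the first column by its name
```

Even when the console prints the names as escapes, indexing by name as in the last line should still work.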

Having saved your table into a UTF-8 file:

> test2 <- read.csv("test2.csv", header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", encoding = "UTF-8")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'test2.csv'

This is how it looks in the console and in the RStudio view:

> test2
        V1       V2
1 <U+0531> <U+0532>
2        1       10
3        2       20

Importantly, however, you can still manipulate the data within R. In my case, the Ա typed into the script window has UTF-8 encoding, and grep correctly finds it in your table.

> Encoding("Ա")
[1] "UTF-8"
> grep("Ա", as.character(test2[1,1]))
[1] 1
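If the console only shows escapes such as <U+0531>, the code points can be checked directly. This is a minimal sketch using base R functions; the vector `cells` is a made-up stand-in for the first column of test2, not data from the original session.

```r
x <- "\u0531"                    # Armenian capital letter Ayb, written as an escape

Encoding(x)                      # "UTF-8"
utf8ToInt(x)                     # 1329
sprintf("U+%04X", utf8ToInt(x))  # "U+0531"

# Hypothetical stand-in for the first column of test2.
cells <- c("\u0531", "1", "2")
hits <- grepl(x, cells, fixed = TRUE)
hits                             # TRUE FALSE FALSE
```

Using fixed = TRUE matches the character literally, which avoids any dependence on regular expression handling of non-ASCII input.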

You may need to find encoding variants that work with your settings, or possibly change the settings themselves. Unfortunately, I am not sure where that is done.
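One possible starting point is to inspect what the current session reports about its locale and the encodings it knows. The sketch below only reads settings and changes nothing; whether any of these values explains the behaviour on a particular Windows installation is an assumption to verify.

```r
# Character-type locale of the current session.
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# Localization details; the "UTF-8" element says whether the
# session is running in a UTF-8 locale.
info <- l10n_info()
info[["UTF-8"]]

# Encoding names known to iconv on this platform that mention UTF.
utf_names <- grep("UTF", iconvlist(), value = TRUE)
head(utf_names)
```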

You might not be able to make it look pretty at every stage, but it is definitely possible to get it working in a Windows 7 environment as well.
