Read CSV file with embedded double quotes and commas

Views: 233 | Posted: 2020/7/12 0:54:57 | Tags: r, csv, data.table


Problem description

I am trying to read a dirty CSV file using the fread() function from the data.table package, but I have a problem with embedded double quotes and commas in the string values, that is, unescaped double quotes inside a quoted field. The following example data illustrates the problem. It consists of 3 lines and 6 columns, with the first line containing the column names:

"SA","SU","CC","CN","POC","PAC"
"NE","R","000","H "B", O","1","8"
"A","A","000","P","E,5","8"

The first problem is in line 2, which has an embedded pair of double quotes and a comma: "H "B", O". The second problem is in line 3, which has a comma within the double quotes: "E,5". I have tried the following:

Attempt 1

library(data.table)
x1 <- fread(file = "example.csv", quote = "\"")

Output:

> x1
     V1 "SA" "SU"   "CC" "CN" "POC" "PAC"
1: "NE"  "R"    0 "H "B"   O"   "1"     8
2:  "A"  "A"    0    "P"   "E    5"     8

Message:

Found and resolved improper quoting in first 100 rows. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning. Detected 6 column names but the data has 7 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.

Conclusion: the result is incorrect because it adds a new column V1.

Attempt 2

x2 <- fread(file = "example.csv", quote = "")

Output:

> x2
     V1 "SA"  "SU"   "CC" "CN" "POC" "PAC"
1: "NE"  "R" "000" "H "B"   O"   "1"   "8"
2:  "A"  "A" "000"    "P"   "E    5"   "8"

Message:

Detected 6 column names but the data has 7 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.

Conclusion: the result is incorrect because it adds a new column V1.
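The "6 column names but the data has 7 columns" message can be reproduced without fread(). With quote = "" the quote characters are treated as ordinary text, so every comma acts as a field separator. The following base R sketch counts the comma-separated pieces on each line of the sample data (the variable names lines and n_fields are only illustrative):

```r
# Sample data from the question, one string per line of the file
lines <- c(
  '"SA","SU","CC","CN","POC","PAC"',
  '"NE","R","000","H "B", O","1","8"',
  '"A","A","000","P","E,5","8"'
)

# Split on every literal comma, ignoring quotes, and count the pieces per line
n_fields <- lengths(strsplit(lines, ",", fixed = TRUE))
print(n_fields)  # 6 7 7: the header has 6 pieces, both data lines have 7
```

Both data lines contain one comma inside a quoted value, which is where the extra seventh column comes from.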

Solution?

What I am looking for is a way to get an output similar to

> x3
   SA SU CC       CN POC PAC
1: NE  R  0 H 'B', O   1   8
2:  A  A  0        P E,5   8

preferably using fread(), but other suggestions are welcome.
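One possible workaround, offered only as a sketch and not as a confirmed answer, is to bypass the CSV quoting rules altogether. It assumes that every field in the file is wrapped in double quotes and that the exact three-character sequence "," never occurs inside a value; both hold for the sample data but may not hold for a real file. Under those assumptions the sequence "," can serve as the field delimiter. The names path, raw, inner and fields are illustrative.

```r
library(data.table)

# Write the sample data from the question to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"SA","SU","CC","CN","POC","PAC"',
  '"NE","R","000","H "B", O","1","8"',
  '"A","A","000","P","E,5","8"'
), path)

raw <- readLines(path)

# Drop the opening and closing quote of each line
inner <- sub('"$', '', sub('^"', '', raw))

# Split on the literal sequence "," (assumed never to appear inside a value)
fields <- strsplit(inner, '","', fixed = TRUE)

# Stop early if any line does not have as many fields as the header
stopifnot(all(lengths(fields) == length(fields[[1]])))

# Build the table from the data lines and take the names from the header line
x3 <- as.data.table(do.call(rbind, fields[-1]))
setnames(x3, fields[[1]])
print(x3)
```

All columns come back as character, so CC stays "000" until it is converted (for example with as.integer()), and the embedded double quotes in CN are kept as they are and not replaced by single quotes.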
