Reading a csv file with embedded quotes into R

Problem description

I have to work with a .csv file that looks like this:

"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""
"56144,""Net Present Value PLUS (NPV+)"",1"
"56144,""Net Present Value PLUS (NPV+)"",1"

If I use read.csv, I get a data frame with a single variable. What I need is a data frame with three columns, where the columns are separated by commas. How can I handle the quotes at the beginning and end of each line?

Recommended answer

I suggest both removing the initial/terminal quotes and turning the back-to-back double quotes into single double quotes. The latter is crucial in case some of the strings contain commas themselves, as in

"1,""A mostly harmless string"",11"
"2,""Another mostly harmless string"",12"
"3,""These, commas, cause, trouble"",13"

Removing only the initial/terminal quotes while keeping the back-to-back quotes leads the read.csv() function to produce 6 variables, as it interprets all commas in the last row as value separators. So the complete code might look like this:

data.text <- readLines("fullofquotes.csv")  # Reads data from file into a character vector.
data.text <- gsub("^\"|\"$", "", data.text) # Removes initial/terminal quotes.
data.text <- gsub("\"\"", "\"", data.text)  # Replaces "" by ".
data <- read.csv(text=data.text, header=FALSE)
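
To try these steps without a file on disk, here is a minimal sketch that feeds the three sample lines above in as a character vector, which is what readLines() would return. The names raw_lines, cleaned and parsed are made up for this illustration, and fixed = TRUE is an optional addition that treats the doubled quote as a literal string rather than a regular expression.

```r
# Sample lines, written with single-quoted R strings so the embedded
# double quotes need no escaping. This stands in for readLines(<file>).
raw_lines <- c(
  '"1,""A mostly harmless string"",11"',
  '"2,""Another mostly harmless string"",12"',
  '"3,""These, commas, cause, trouble"",13"'
)

# Step 1: drop the quote that opens and the quote that closes each line.
cleaned <- gsub('^"|"$', "", raw_lines)

# Step 2: collapse every doubled quote "" into a single quote ".
cleaned <- gsub('""', '"', cleaned, fixed = TRUE)

# Step 3: parse the cleaned lines as ordinary CSV (no header in this sample).
parsed <- read.csv(text = cleaned, header = FALSE, stringsAsFactors = FALSE)

str(parsed)
```

After the second step the third line reads 3,"These, commas, cause, trouble",13, so the commas inside the quoted string stay within one field and the result should have three columns.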

Or, of course, the same three steps can be written in a single line:

data <- read.csv(text=gsub("\"\"", "\"", gsub("^\"|\"$", "", readLines("fullofquotes.csv"))), header=FALSE)
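
The sample in the question starts with a header row, so for that data header = TRUE is probably the better choice. The sketch below applies the same cleaning to the three lines from the question; question_lines and ideas are illustrative names, not part of the original answer.

```r
# The three lines from the question, as readLines() would return them.
question_lines <- c(
  '"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""',
  '"56144,""Net Present Value PLUS (NPV+)"",1"',
  '"56144,""Net Present Value PLUS (NPV+)"",1"'
)

# Strip the outer quotes, then turn "" into ".
cleaned <- gsub('""', '"', gsub('^"|"$', "", question_lines), fixed = TRUE)

# The first line holds the column names, so header = TRUE here.
# With the default check.names = TRUE, read.csv() rewrites names that
# contain spaces into syntactic ones (spaces become dots).
ideas <- read.csv(text = cleaned, header = TRUE, stringsAsFactors = FALSE)

names(ideas)
```

Passing check.names = FALSE would keep the original column names with their spaces, at the cost of needing backticks to refer to them.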
