How to read quoted text containing escaped quotes


Problem description

Consider the following comma-separated file. For simplicity, let it contain a single line:


'I am quoted','so, can use comma inside - it is not separator here','but can\'t use escaped quote :=('


If you try to read it with the command

table <- read.csv(filename, header=FALSE)

the line is split into 4 fields, because it contains 3 commas. In fact, I want only 3 fields, one of which contains a comma itself. This is where the quote argument comes in. I tried:

table <- read.csv(filename, header=FALSE, quote="'")

but that fails with the error "incomplete final line found by readTableHeader on table". This happens because the line contains an odd number (seven) of quote characters.
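
Both counts mentioned above can be checked directly. The following is a minimal sketch, assuming the sample line is written to a temporary file (`path` is a hypothetical stand-in for the real `filename`). It should report 4 fields and 7 quote characters, matching the numbers in the question:

```r
# The sample line from the question; "\\'" is a literal backslash followed by a quote.
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"

# Hypothetical stand-in for the real file.
path <- tempfile(fileext = ".csv")
writeLines(line, path)

# Fields seen when only " counts as a quote character (read.csv's default).
n_fields <- count.fields(path, sep = ",", quote = "\"")

# Number of single-quote characters on the line, the escaped one included.
n_quotes <- lengths(regmatches(line, gregexpr("'", line, fixed = TRUE)))

print(n_fields)
print(n_quotes)
```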

read.table() and scan() both have an allowEscapes parameter, but setting it to TRUE doesn't help. That is expected, because help(scan) says:

The escapes which are interpreted are the control characters ‘\a, \b, \f, \n, \r, \t, \v’, ... ... Any other escaped character is treated as itself, including backslash

How would you read such quoted CSV files containing escaped \' quotes?

Solution

One possibility is to use readLines() to read everything in as is, and then replace the quote character with something else, e.g.:

tt <- readLines("F:/temp/test.txt")
tt <- gsub("([^\\]|^)'", "\\1\"", tt)     # replace each unescaped ' with "
tt <- gsub("\\'", "'", tt, fixed = TRUE)  # readLines keeps the backslash as is; turn \' into a plain '

This allows you to read the vector tt in using a textConnection:

zz <- textConnection(tt)
read.csv(zz, header = FALSE, quote = "\"")  # parse the rewritten text
close(zz)

Not the most beautiful solution, but it works (provided you don't have a " character somewhere in the file, of course).
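
For a self-contained way to try the idea, here is a sketch of a variation rather than the exact code from the answer. It assumes a temporary file (`path`, hypothetical) in place of the real one, swaps the capture-group pattern for a PCRE lookbehind (`perl = TRUE`) so that empty fields such as `''` are also converted, and passes the lines through the `text` argument of read.csv instead of an explicit textConnection. The same caveat applies: it assumes the file contains no literal " characters, and it does not treat a doubled backslash before a quote specially.

```r
# Hypothetical stand-in for the real file, holding the sample line.
path <- tempfile(fileext = ".csv")
writeLines(
  "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('",
  path
)

tt <- readLines(path)

# Turn every ' that is NOT preceded by a backslash into ".
tt <- gsub("(?<!\\\\)'", "\"", tt, perl = TRUE)

# Unescape the remaining \' into a plain '.
tt <- gsub("\\'", "'", tt, fixed = TRUE)

# Parse the rewritten lines, with " as the only quote character.
res <- read.csv(text = tt, header = FALSE, quote = "\"",
                stringsAsFactors = FALSE)

print(ncol(res))
print(res[[3]])
```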
