How to read \" double-quote escaped values with read.table in R


Question

I am having trouble reading a file in R that contains lines like the one below.

"_:b5507F4C7x59005","Fabiana D\"atri"

Any ideas? How can I make read.table understand that \" is an escaped quote?

Cheers, Alexandre

Recommended answer

It seems to me that read.table/read.csv cannot handle backslash-escaped quotes.

...But I think I have an (ugly) workaround, inspired by @nullglob:

  • First, read the file WITHOUT a quote character. (This won't handle embedded commas, as @Ben Bolker noted.)
  • Then go through the string columns and remove the quotes:

The test file looks like this (I added a non-string column for good measure):

13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"

Here is the code:

# Generate test file
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")

# Read ignoring quotes
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)

# Go through the character columns and clean up
for (i in seq_len(NCOL(tbl))) {
    if (is.character(tbl[[i]])) {
        x <- tbl[[i]]
        x <- substr(x, 2, nchar(x)-1) # Remove surrounding quotes
        tbl[[i]] <- gsub('\\\\"', '"', x) # Unescape quotes
    }
}

The output is correct:

> tbl
  V1   V2          V3   V4
1 13  foo  Fab D"atri  bar
2 21 foo2 Fab D"atri2 bar2
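
As noted, the workaround above breaks when a quoted field contains a comma. One possible alternative, which is not part of the original answer, is to rewrite each \" as a doubled quote ("") before parsing, since read.csv does accept doubled quotes inside quoted fields. The sketch below assumes the file never contains an escaped backslash directly before a quote (\\"), which this simple substitution would mishandle. The temporary file and the names raw_lines, fixed and tbl2 are made up for illustration.

```r
# Write a small test file; the second row has a comma inside a quoted field
path <- tempfile(fileext = ".txt")
writeLines(c('13,"foo","Fab D\\"atri","bar"',
             '21,"foo2","Fab, D\\"atri2","bar2"'), path)

# Read the raw lines and turn every backslash-escaped quote into a doubled quote
raw_lines <- readLines(path)
fixed <- gsub('\\"', '""', raw_lines, fixed = TRUE)

# Parse the rewritten text with the usual CSV quoting rules
tbl2 <- read.csv(text = fixed, header = FALSE, stringsAsFactors = FALSE)

unlink(path)  # remove the temporary file
```

Here tbl2$V3 should keep both the embedded quote and the embedded comma, because the quote character is left active while parsing.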
