How to handle errors while reading XML files in R


Question

I have a list of XML files that all share the same structure. Some of them contain structural errors and cannot be read, and there are too many files to check by hand. I know I need to use the try or tryCatch functions; I have tried to understand them, but I do not see how to apply them properly in my case. To keep the example simple, I just want to convert every file to CSV.

library(XML)
k <- 1
initial.files <- list.files("/My/Initial/Folder")
for (i in initial.files) {
  data <- dataTable(xmlToDataFrame(xmlParse(i)))
  write.csv(data, file = paste0("data", k, ".csv"))
  k <- k + 1
}

The error I get usually looks like this:

Start tag expected, '<' not found
Error in xmlToDataFrame(xmlParse(i)) :
error in evaluation the argument 'doc' in selecting a method for function 'xmlToDataFrame': Error 1: Start tag expected, '<' not found

To handle this, I think I have to rewrite the fifth line of my code along these lines (I know this is wrong):

data<- if(try(dataTable(xmlToDataFrame(xmlParse(i)))!= "try-error")
else{ haven't looked close to this because i didn't got that far...}...

I would like it to read the files and give me a list of the paths of the files that could not be read.

The structure of the XML files looks like this:

<ROWSET>
  <ROW>
    <line1>asdf</line1>
    <line2>ghjk</line2>
  </ROW>
</ROWSET>

Recommended answer

Here is an example of tryCatch. You can, of course, replace read.table with your own functions, and it should still work.

This first version catches any error and simply returns the file path for the files that failed. (I created two test files: one that read.table can read and one that it will complain about.)

f <- function(path = "~/desktop/test", ...) {
  lf <- list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             error = function(e) x)
  })
  setNames(l, basename(lf))
}

f(full.names = TRUE)

# $cool_test.txt
#   cool test file
# 1    1    2    3
# 
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"
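Applied to the XML calls from the question, the same pattern could look like the sketch below. It is only an illustration under stated assumptions: read_xml_files and its reader argument are hypothetical names, the demo files are invented, and the demo runs only if the XML package is installed. Note that the paths are listed with full.names = TRUE so that xmlParse receives a path it can actually open.

```r
# Hypothetical helper: try to read every file, keep the results that worked,
# and collect the paths of the files that raised an error.
# `reader` defaults to the XML calls used in the question.
read_xml_files <- function(files,
                           reader = function(p) XML::xmlToDataFrame(XML::xmlParse(p))) {
  results <- lapply(files, function(p) {
    # on failure, return the condition object instead of stopping the loop
    tryCatch(reader(p), error = function(e) e)
  })
  names(results) <- files
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(data = results[!failed], failed = files[failed])
}

# Small demo with one well-formed and one broken file (invented for illustration).
if (requireNamespace("XML", quietly = TRUE)) {
  demo_dir <- tempfile("xml_demo")
  dir.create(demo_dir)
  writeLines("<ROWSET><ROW><line1>asdf</line1><line2>ghjk</line2></ROW></ROWSET>",
             file.path(demo_dir, "good.xml"))
  writeLines("this is not XML", file.path(demo_dir, "bad.xml"))

  res <- read_xml_files(list.files(demo_dir, full.names = TRUE))
  print(res$failed)  # paths of the files that could not be parsed
}
```

Returning the condition object rather than the path keeps the original error message available for later inspection, while res$failed gives the plain list of paths the question asks for.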

tryCatch can do much more than this, and it can save you a lot of time.

If you want particular errors or warnings handled differently, you can grep the condition for specific text. Here, for example, I want a message when the file I am trying to read does not exist, and I want the file path back for files that exist but cannot be read for some reason.

f2 <- function(path = "~/desktop/test", ..., lf) {
  lf <- if (!missing(lf)) lf else list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             warning = function(w) if (grepl('No such file', w)) {
               sprintf('%s does not exist', x)
             } else sprintf('Some other warning for %s', x),
             error = function(e) if (grepl('Error in scan', e)) {
               message(sprintf('Check format of %s', x))
               x
              } else message(sprintf('Some other error for %s', x)))
  })
  setNames(l, basename(lf))
}

I added a new argument, lf, so that I can pass a vector of file paths directly and show how the function handles files that do not exist:

lf <- c("/Users/rawr/desktop/test/cool_test.txt",
        "/Users/rawr/desktop/test/notcool_test.txt",
        "/Users/rawr/desktop/test/file_does_not_exist.txt")

(out <- f2(lf = lf))

# Check format of /Users/rawr/desktop/test/notcool_test.txt
# $cool_test.txt
#   cool test file
# 1    1    2    3
# 
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"
# 
# $file_does_not_exist.txt
# [1] "/Users/rawr/desktop/test/file_does_not_exist.txt does not exist"
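As a variation on the handlers in f2, the text of a condition can be pulled out explicitly with conditionMessage() before matching on it. The following is a minimal sketch, not the answer's own code: classify_read is a hypothetical name, and the "No such file" text is an assumption, since the exact wording of the warning depends on the platform and locale.

```r
# Hypothetical variant of the handlers in f2: match on conditionMessage(w),
# i.e. on the message text only, rather than on the whole condition object.
classify_read <- function(path) {
  tryCatch(
    read.table(path, header = TRUE),
    warning = function(w) {
      # assumption: the OS reports a missing file with this wording
      if (grepl("No such file", conditionMessage(w), fixed = TRUE)) {
        sprintf("%s does not exist", path)
      } else {
        sprintf("Some other warning for %s", path)
      }
    },
    # any error: hand back the path so it can be collected later
    error = function(e) path
  )
}

missing_path <- tempfile(fileext = ".txt")  # a path that does not exist
result <- classify_read(missing_path)
print(result)
```

Because the warning handler is reached first for a missing file, the function returns a descriptive string there and never gets to the error handler.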

So now you have a list that can contain data frames, file paths, or other messages. You can pick out the data frames and write them in many ways; here are two:

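# keep only the data frames and apply a function to each;
# summary() is a placeholder for whatever you want to do
lapply(Filter(is.data.frame, out), function(x) summary(x))

for (ii in out)
  if (is.data.frame(ii)) write.csv(ii) else print('not a data frame')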
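To write each data frame to its own CSV file and keep the unreadable paths separately, something like the following sketch could be used. The out list here is a small stand-in built by hand with invented names, and the output directory is a temporary folder; both are assumptions for illustration.

```r
# Stand-in for the list returned by f2: data frames mixed with file paths.
out <- list(a.txt = data.frame(x = 1:2),
            b.txt = "/path/to/b.txt",
            c.txt = data.frame(y = 3))

dfs <- Filter(is.data.frame, out)                       # the files that were read
unreadable <- unlist(Filter(Negate(is.data.frame), out)) # everything else

# one CSV per data frame, named after the source file
out_dir <- tempfile("csv_out")
dir.create(out_dir)
csv_paths <- file.path(out_dir,
                       paste0(tools::file_path_sans_ext(names(dfs)), ".csv"))
invisible(Map(function(d, p) write.csv(d, p, row.names = FALSE), dfs, csv_paths))

print(unreadable)  # paths or messages for the files that were not read
```

Naming the output after the source file, rather than using a running counter, makes it easy to see which input produced which CSV.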
