readxl::read_xls returns "libxls error: Unable to open file"

Problem description

I have multiple .xls files (~100 MB) from which I would like to load multiple sheets (from each file) into R as data frames. I have tried various functions, such as xlsx::read.xlsx2 and XLConnect::readWorksheetFromFile, both of which run for a very long time (>15 minutes) without ever finishing, so I have to force-quit RStudio to keep working.

I also tried gdata::read.xls, which does finish, but it takes more than 3 minutes per sheet and, unlike XLConnect::loadWorkbook, it cannot extract multiple sheets at once (which would be very helpful for speeding up my pipeline).

The time it takes these functions to execute (and I am not even sure the first two would ever finish if I let them go longer) is way too long for my pipeline, where I need to work with many files at once. Is there a way to get these to go/finish faster?

In several places, I have seen a recommendation to use the function readxl::read_xls, which seems to be widely recommended for this task and should be faster per sheet. This one, however, gives me an error:

> # Minimal reproducible example:
> setwd("/Users/USER/Desktop")
> library(readxl)
> data <- read_xls(path="test_file.xls")
Error: 
  filepath: /Users/USER/Desktop/test_file.xls
  libxls error: Unable to open file

I also did some elementary testing to make sure the file exists and is in the correct format:

> # Testing existence & format of the file
> file.exists("test_file.xls")
[1] TRUE
> format_from_ext("test_file.xls")
[1] "xls"
> format_from_signature("test_file.xls")
[1] "xls"

The test_file.xls used above is available here. Any advice would be appreciated in terms of making the first functions run faster or the read_xls run at all - thank you!

UPDATE:

It seems that some users are able to open the file above using the readxl::read_xls function while others are not, on both Mac and Windows, using the most up-to-date versions of R, RStudio, and readxl. The issue has been posted on the readxl GitHub and has not been resolved yet.

Solution

I downloaded your dataset and read each Excel sheet this way (for example, the sheets "Overall" and "Area"):

# install.packages("readxl")  # run once if readxl is not installed yet
library(readxl)
library(data.table)

# Read one sheet at a time and convert each result to a data.table
dt_overall <- as.data.table(read_excel("test_file.xls", sheet = "Overall"))
dt_area <- as.data.table(read_excel("test_file.xls", sheet = "Area"))

Finally, I get dt like this (for example, only part of the dataset for the "Area" sheet):

Equally, you can use the read_xls function instead of read_excel.

I checked: it also works correctly and is even slightly faster, since read_excel is a wrapper around the read_xls and read_xlsx functions from the readxl package.

Also, you can use the excel_sheets function from the readxl package to list the sheet names of your Excel file, and then read every sheet in turn.
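
The excel_sheets approach can be sketched roughly as follows. This is a minimal sketch, not the answerer's own code: read_all_sheets is a hypothetical helper name, and the example uses the datasets.xls workbook bundled with readxl (via readxl_example) in place of test_file.xls.

```r
library(readxl)

# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a named list of data frames, one element per sheet.
read_all_sheets <- function(path) {
  sheet_names <- excel_sheets(path)  # character vector of sheet names
  sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
  names(sheets) <- sheet_names       # name each element after its sheet
  sheets
}

# Example workbook shipped with readxl; swap in your own path as needed
sheets <- read_all_sheets(readxl_example("datasets.xls"))
names(sheets)
```

Each element of the list can then be converted with as.data.table if a data.table is preferred, as in the code above.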

UPDATE

Benchmarking was done with the microbenchmark package for the following functions: gdata::read.xls, XLConnect::readWorksheetFromFile and readxl::read_excel.

Note that XLConnect is a Java-based solution, so it requires a lot of RAM.
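
A comparison along these lines might look like the sketch below. It is only an illustration under stated assumptions: it times read_excel against read_xls on the small datasets.xls example bundled with readxl rather than on test_file.xls, and it leaves out gdata::read.xls and XLConnect::readWorksheetFromFile because they need Perl and Java respectively. The choice of times = 10 is arbitrary.

```r
library(readxl)
library(microbenchmark)

# Example workbook shipped with readxl (assumption: stands in for test_file.xls)
path <- readxl_example("datasets.xls")

# Time each reader on the first sheet, 10 runs per expression
bm <- microbenchmark(
  read_excel = read_excel(path, sheet = 1),
  read_xls   = read_xls(path, sheet = 1),
  times = 10
)
print(bm)
```

Timings on a small bundled file will not reflect the behaviour on ~100 MB workbooks, so the same calls should be repeated on a real file before drawing conclusions.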
