R: read Excel by column names

Problem description

I have a bunch of Excel files that I want to loop through, reading specific, non-contiguous columns into a data frame. readxl works for the basic case, like this:

library(readxl)
library(plyr)
# list.files() expects a regular expression, not a shell glob
wb <- list.files(pattern = "\\.xlsx?$")
dflist <- list()

for (i in wb){
  dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}

# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)

This works (barely), but my Excel files have about 114 columns and I only want specific ones. I also don't want R to guess the col_types, because it gets some of them wrong (e.g. for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric and crashes). So my question is: how do I specify specific, non-contiguous columns to read? The range argument uses the cellranger package, which does not allow reading non-contiguous columns. Is there an alternative?

Recommended answer

.xlsx >>> you can use the openxlsx package

The read.xlsx function from openxlsx has an optional cols argument that takes a numeric index specifying which columns to read.

It seems to read all columns as character if at least one column contains characters.

openxlsx::read.xlsx("test.xlsx", cols = c(2,3,6))
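Since cols takes numeric indices, selecting by column name needs one extra step: read the header row, then match the wanted names against it. The following is a minimal sketch, not part of the original answer. The demo workbook, the column names (id, x, note, y) and the variable names are made up for illustration, and it assumes the header sits in row 1, starts in column A, and has no blank cells.

```r
library(openxlsx)

# Build a small demo workbook so the example is self-contained (hypothetical data).
demo_path <- tempfile(fileext = ".xlsx")
write.xlsx(
  data.frame(id = c("1a", "2b"), x = 1:2, note = c("u", "v"), y = c(0.5, 1.5)),
  demo_path
)

wanted <- c("id", "y")

# Read only row 1 and treat it as data, which yields the header cells as values.
header <- unlist(
  read.xlsx(demo_path, sheet = 1, rows = 1, colNames = FALSE),
  use.names = FALSE
)

# Translate the wanted names into column positions; fail early if one is missing.
col_idx <- match(wanted, header)
stopifnot(!anyNA(col_idx))

# Read just those columns.
selected <- read.xlsx(demo_path, sheet = 1, cols = col_idx)
```

If the header is not in the first row, the rows and startRow arguments of read.xlsx would need adjusting accordingly.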

.xls >>> you can use the XLConnect package

A potential problem is that XLConnect requires rJava, which can be tricky to install on some systems. If you can get it running, the keep and drop arguments of readWorksheet() accept both column names and indices, and the colTypes argument controls column types. This works for me:

options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\") #path to jdk
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))

readxl works well for both .xls and .xlsx if you want to read a range (a rectangle) from your Excel file, e.g.:

readxl::read_xls("test.xls", range = "B3:D8")
# cell_cols() is re-exported by readxl; qualify it when readxl is not attached
readxl::read_xls("test.xls", sheet = "Sheet1", range = readxl::cell_cols("B:E"))
readxl::read_xlsx("test.xlsx", sheet = 2, range = readxl::cell_cols(2:5))
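For non-contiguous columns, readxl itself offers another route that the answer above does not mention: the col_types argument accepts "skip", which drops a column, and "text", which avoids type guessing. Below is a minimal sketch under those assumptions. The helper name read_named_cols, the demo workbook and its column names are hypothetical, and openxlsx is used only to create the demo file.

```r
library(readxl)

# Hypothetical helper: read only the named columns of one sheet, all as text.
read_named_cols <- function(path, wanted, sheet = 1, skip = 0) {
  # n_max = 0 reads no data rows, so only the column names come back.
  header <- names(read_excel(path, sheet = sheet, skip = skip, n_max = 0))

  not_found <- setdiff(wanted, header)
  if (length(not_found) > 0) {
    stop("Columns not found: ", paste(not_found, collapse = ", "))
  }

  # One entry per sheet column: keep wanted ones as text, skip the rest.
  types <- ifelse(header %in% wanted, "text", "skip")
  as.data.frame(
    read_excel(path, sheet = sheet, skip = skip, col_types = types)
  )
}

# Demo workbook (hypothetical data), written with openxlsx for convenience.
demo_path <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(
  data.frame(id = c("1a", "2b"), x = 1:2, note = c("u", "v"), y = c(0.5, 1.5)),
  demo_path
)

result <- read_named_cols(demo_path, wanted = c("id", "y"))
```

The returned columns follow the order in the sheet, not the order of wanted. For the loop in the question, the same helper could be called per file with sheet = "SheetName" and skip = 3, and numeric columns converted afterwards with as.numeric() where needed.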
