Excel worksheet to Numpy array

Question

I'm trying to do an unbelievably simple thing: load parts of an Excel worksheet into a Numpy array. I've found a kludge that works, but it is embarrassingly unpythonic: say my worksheet was loaded as "ws", the code:

import numpy as np

A = np.zeros((37, 3))
for i in range(2, 39):        # worksheet rows 2..38
    for j in range(1, 4):     # worksheet columns 1..3
        A[i - 2, j - 1] = ws.cell(row=i, column=j).value

loads the contents of "ws" into array A.

There MUST be a more elegant way to do this. For instance, csvread lets you do this much more naturally, and while I could well convert the .xlsx file into a csv one, the whole purpose of working with openpyxl was to avoid that conversion. So there we are, Collective Wisdom of the Mighty Intertubes: what's a more pythonic way to perform this conceptually trivial operation?

Thanks in advance for your answers.

PS: I run Python 2.7.5 on a Mac via Spyder, and yes, I did read the openpyxl tutorial, which is the only reason I got this far.

Recommended answer

You could do:

A = np.array([[i.value for i in j] for j in ws['C1':'E38']])

EDIT - further explanation. (Firstly, thanks for introducing me to openpyxl; I suspect I will use it quite a bit from time to time.)

  1. The method of getting multiple cells from the worksheet object produces a generator. This is probably much more efficient if you want to work your way through a large sheet, as you can start straight away without waiting for it all to load into a list.
  2. To force a generator to make a list, you can either use list(ws['C1':'E38']) or a list comprehension as above.
  3. Each row is a tuple (even if only one column wide) of
  4. Cell objects. These carry a lot more than just a number, but if you want the number for your array you can use the .value attribute. This is really the crux of your question: csv files don't contain the structured info of an Excel spreadsheet.
  5. There isn't (as far as I can tell) a built-in method for extracting values from a range of cells, so you will have to do something much like what you have sketched out.
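
The points above can be tried without an .xlsx file. The following is a minimal sketch, assuming a recent openpyxl and an in-memory workbook filled with made-up numbers (the cell values and the range 'A2':'C4' are illustrative, not from the question). Whether the range comes back as a generator or a tuple may depend on the openpyxl version, so it is wrapped in list() either way.

```python
import numpy as np
from openpyxl import Workbook

# Build a small in-memory worksheet (illustrative data only).
wb = Workbook()
ws = wb.active
for r in range(1, 5):
    for c in range(1, 4):
        ws.cell(row=r, column=c, value=10 * r + c)

# Indexing by a range yields rows; list() forces them into a list.
rows = list(ws['A2':'C4'])

# Each row is a tuple of Cell objects, not of plain numbers.
first_row = rows[0]
print(type(first_row).__name__, type(first_row[0]).__name__)

# The numbers are reached through the .value attribute of each Cell.
A = np.array([[cell.value for cell in row] for row in rows])
print(A.shape)
```

No empty array has to be allocated first; the shape follows from the range.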

The advantages of doing it my way are: no need to work out the dimensions of the array and create an empty one to start with, no need to work out the corrected index numbers for the np array, and list comprehensions are faster. The disadvantage is that it needs the "corners" defined in "A1" format. If the range isn't known, you would have to use iter_rows, rows or columns:

A = np.array([[i.value for i in j[2:5]] for j in ws.rows])
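
When the number of rows is not known in advance, iter_rows can do the same job. This is only a sketch, assuming an openpyxl release in which iter_rows accepts values_only=True (as far as I know, 2.6 or later); the header row and the data are made up for illustration.

```python
import numpy as np
from openpyxl import Workbook

# In-memory worksheet with a header row followed by five data rows
# (illustrative data only).
wb = Workbook()
ws = wb.active
ws.append(["x", "y", "z"])
for r in range(1, 6):
    ws.append([r, r * 2.0, r * 3.0])

# Skip the header with min_row=2 and leave max_row unset, so openpyxl
# iterates down to the last used row. values_only=True yields plain
# values instead of Cell objects, so no .value lookup is needed.
values = ws.iter_rows(min_row=2, min_col=1, max_col=3, values_only=True)
A = np.array(list(values), dtype=float)
print(A.shape)
```

Only the column bounds are fixed here; the row count is taken from the sheet.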

If you don't know how many columns there are, then you will have to loop and check values, more like your original idea.
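
One way such a loop could look is sketched below. It assumes the data starts at cell A1, is rectangular, and ends at the first empty cell in each direction; those assumptions, the variable names, and the sample data are mine, not part of the original answer.

```python
import numpy as np
from openpyxl import Workbook

# In-memory worksheet with 3 rows and 4 columns (illustrative data only).
wb = Workbook()
ws = wb.active
for r in range(1, 4):
    ws.append([r, r + 0.5, r + 0.25, r * 2])

# Count columns by walking along row 1 until the first empty cell.
n_cols = 0
while ws.cell(row=1, column=n_cols + 1).value is not None:
    n_cols += 1

# Collect rows until the first cell of a row is empty.
data = []
row = 1
while ws.cell(row=row, column=1).value is not None:
    data.append([ws.cell(row=row, column=c).value
                 for c in range(1, n_cols + 1)])
    row += 1

A = np.array(data, dtype=float)
print(A.shape)
```

The worksheet attributes max_row and max_column may make the explicit scan unnecessary when the sheet holds nothing but the data block.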
