XLSX to XML with schema map

Question

I have built a couple of basic workflows using XML tools on top of XLSX workbooks that are mapped to an XML schema. You enter data into the spreadsheet and export the XML, and I have some scripts that then work with the data.

Now I'm trying to eliminate the export step and build a more integrated and portable tool that others can use easily, by moving from XSLT/XQuery to Python. I would still like to use Excel for data entry, but have the Python script read the XLSX file directly.

I found a bunch of easy-to-use libraries for reading from Excel, but they need to be told explicitly which cells the data is in, like range('A1:C2'). The useful thing about the XML maps was that users could resize or even move tables to fit a different number of rows, and could rename sheets. Is there a library that would let me select tables as units?

Another approach I tried was to uncompress the XLSX and parse the XML directly. The problem is that our data is quite complex (taking up 30-50 sheets), and parsing it in the uncompressed XLSX structure is really daunting. I did find my XML schema inside the uncompressed XLSX, so is there any way to reformat the data into this schema outside of Excel? (Basically, what Excel does when I save a workbook as an .xml file.)

Recommended answer

The Excel format is pretty complicated, with dependencies between components. For example, you can't be sure that the order of the files in the worksheets folder has any bearing on the order of the sheets when the file is opened in Excel.
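To illustrate that dependency, here is a minimal sketch that recovers the sheet order by reading the workbook part and its relationships file rather than trusting file names. It uses only the standard library. The function name `sheets_in_workbook_order` is made up for this example, and it assumes the workbook part lives at `xl/workbook.xml`, which is where Excel normally writes it.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheets_in_workbook_order(xlsx):
    """Return [(sheet name, part path)] in the order listed in the workbook part.

    `xlsx` is a path or a binary file-like object.
    Assumption: the workbook part is stored at xl/workbook.xml.
    """
    with zipfile.ZipFile(xlsx) as package:
        workbook = ET.fromstring(package.read("xl/workbook.xml"))
        rels = ET.fromstring(package.read("xl/_rels/workbook.xml.rels"))

    # Map each relationship id to the part it points at.
    targets = {}
    for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
        target = rel.get("Target")
        if target.startswith("/"):
            # Absolute target: relative to the package root.
            path = target.lstrip("/")
        else:
            # Relative target: relative to the folder holding workbook.xml.
            path = posixpath.normpath(posixpath.join("xl", target))
        targets[rel.get("Id")] = path

    # The <sheet> elements appear in tab order; each refers to a relationship id.
    ordered = []
    for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet"):
        rel_id = sheet.get(f"{{{DOC_REL_NS}}}id")
        ordered.append((sheet.get("name"), targets[rel_id]))
    return ordered
```

Calling `sheets_in_workbook_order("workbook.xlsx")` (the file name is a placeholder) returns the sheet names paired with the parts that hold their data, so a renamed or reordered sheet is still found by name.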

I don't understand exactly what you're trying to do, but the existing libraries present an interface for client code that hides the XML layer. If you don't want that, you'll have to root around for the parts that you find useful. In openpyxl, look at the code in openpyxl/reader, specifically worksheet.py.

However, you might have better luck using lxml, which (using libxml2 in the background) lets you load a single XML part into Python and manipulate it directly through the lxml.objectify module. We don't do this in openpyxl because XML trees consume a lot of memory (and many people have very large worksheets), but the library for working with PowerPoint shows just how easy this can be.
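As a rough sketch of loading a single part and working with it directly, the example below reads table definitions from the package. It uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies. The helper names `read_table_part` and `tables_in_package` are made up for this example, and it assumes table parts are stored under `xl/tables/`, which is the usual location but not one guaranteed by the format.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def read_table_part(xml_bytes):
    """Parse one table part and return its name, cell range and column names."""
    root = ET.fromstring(xml_bytes)
    columns = [
        column.get("name")
        for column in root.iter(f"{{{MAIN_NS}}}tableColumn")
    ]
    return {
        "name": root.get("name"),
        "ref": root.get("ref"),  # current cell range, e.g. "B3:D10"
        "columns": columns,
    }


def tables_in_package(xlsx):
    """Return the parsed table parts found in an XLSX file.

    `xlsx` is a path or a binary file-like object.
    Assumption: table parts are stored as xl/tables/*.xml.
    """
    with zipfile.ZipFile(xlsx) as package:
        return [
            read_table_part(package.read(part_name))
            for part_name in sorted(package.namelist())
            if part_name.startswith("xl/tables/") and part_name.endswith(".xml")
        ]
```

Because the `ref` attribute holds the range the table currently occupies, looking a table up by name this way keeps working after a user resizes or moves it. Working out which worksheet a table belongs to takes one more step that is not shown here: following that worksheet's own relationships file.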
