Load an XLSX spreadsheet whose XML uses a namespace prefix
Question
I have a set of XLSX files that PhpSpreadsheet cannot load, because simplexml_load_string returns an empty SimpleXMLElement from (for instance) the workbook XML file.
The file has the following format. SimpleXML can load it once every occurrence of the x: namespace prefix, and the declaration itself, has been removed (that is, for instance, the <x:workbook> tag has been converted to <workbook>).
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<x:workbook xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:fileVersion appName="xl" lastEdited="7" lowestEdited="4" rupBuild="23801" />
<x:workbookPr codeName="ThisWorkbook" />
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice Requires="x15">
<x15ac:absPath xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" url=".........." />
</mc:Choice>
</mc:AlternateContent>
<xr:revisionPtr revIDLastSave="0" documentId=".........." xr6:coauthVersionLast="46" xr6:coauthVersionMax="46" xr10:uidLastSave="{00000000-0000-0000-0000-000000000000}" />
<x:bookViews>
<x:workbookView xWindow="-120" yWindow="-120" windowWidth="29040" windowHeight="15840" xr2:uid="{00000000-000D-0000-FFFF-FFFF00000000}" />
</x:bookViews>
<x:sheets>
<x:sheet name="......" sheetId="1" r:id="rId1" />
</x:sheets>
<x:calcPr calcId="191029" />
</x:workbook>
I'm not sure the XML file is wrong, since the XLSX files can be opened with, for instance, LibreOffice. Anyway, I have managed to load the files by hacking a simple-minded function, cleanup_xml(), into Xlsx.php:
//~ http://schemas.openxmlformats.org/spreadsheetml/2006/main"
$xmlWorkbook = simplexml_load_string(
cleanup_xml($this->securityScanner->scan($this->getFromZipArchive($zip, "{$rel['Target']}"))),
'SimpleXMLElement',
Settings::getLibXmlLoaderOptions()
);
Maybe there is a proper, clean way to force the SimpleXML API to load such files?
Edit:
I was wrong in thinking all problems were gone after the cleanup_xml hack. It seems the data rows XML file also has problems, probably the same as above...
Edit:
Indeed, I moved cleanup_xml() into XmlScanner::scan so that it applies to every loaded XML file, and now it seems to work...
Edit:
It seems the namespace declaration is correct, at least judging from this simple example...
So I wonder why simplexml_load_string doesn't accept the format:
<x:workbook ... xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</x:workbook>
while it apparently accepts
<workbook ... xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</workbook>
Edit:
Having dug into the SimpleXML API, I found that this answer helped me understand the problem. Now I can try to rewrite my hackish cleanup_xml to account for namespaces... I'm just wondering whether PhpSpreadsheet offers a better way... It seems strange that this problem has gone unnoticed before...
Edit:
OK, now I've found the bug report...
Recommended answer
This appears to be a bug in PhpSpreadsheet.
In an XLSX file I created this week with a real copy of Microsoft Excel, "workbook.xml" starts like this:
<workbook
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
This declares eight different namespaces that will be used in the document. One happens to be defined as the "default namespace", and the other seven are assigned prefixes - but all of that is just local to this specific file.
If we look at your XML document, we can see all the same namespaces in use, plus an extra one:
<x:workbook
xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
The only difference is that the namespace "http://schemas.openxmlformats.org/spreadsheetml/2006/main" has been assigned the prefix "x" rather than being set as the default namespace, but that makes no difference to its meaning. A different library might label the namespaces completely differently, just because of the way it generates the XML:
<ns0:workbook
xmlns:ns0="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:ns1="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:ns2="http://schemas.openxmlformats.org/markup-compatibility/2006"
ns2:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:ns3="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:ns4="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:ns5="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:ns6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:ns7="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
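To illustrate that the prefix is only a local label, here is a minimal sketch that reads the sheet names from three made-up workbook fragments differing only in prefix choice. The helper name sheetNames and the sample documents are assumptions for illustration, not PhpSpreadsheet code.

```php
<?php
// Namespace URI copied from the workbook XML above.
const MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

// Hypothetical helper (not part of PhpSpreadsheet): list the sheet names in a workbook XML string.
function sheetNames(string $xml): array
{
    $workbook = simplexml_load_string($xml);
    $names = [];
    // Select elements by namespace URI, so the prefix used in the file does not matter.
    foreach ($workbook->children(MAIN_NS)->sheets->sheet as $sheet) {
        // Unprefixed attributes belong to no namespace; attributes() with no argument selects those.
        $names[] = (string) $sheet->attributes()->name;
    }
    return $names;
}

// Three made-up documents with identical content and different prefix choices.
$asDefault = '<workbook xmlns="' . MAIN_NS . '"><sheets><sheet name="Data"/></sheets></workbook>';
$asX = '<x:workbook xmlns:x="' . MAIN_NS . '"><x:sheets><x:sheet name="Data"/></x:sheets></x:workbook>';
$asNs0 = '<ns0:workbook xmlns:ns0="' . MAIN_NS . '"><ns0:sheets><ns0:sheet name="Data"/></ns0:sheets></ns0:workbook>';

$fromDefault = sheetNames($asDefault);
$fromX = sheetNames($asX);
$fromNs0 = sheetNames($asNs0);
```

All three calls go through the same code path, because the lookup is keyed on the URI and never on the prefix.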
As explained in this reference answer, SimpleXML's namespace handling is based around using the ->children() method to select the namespace you want to work with. The correct way to use this is to always specify the namespace URI you want, e.g. "http://schemas.openxmlformats.org/spreadsheetml/2006/main" or "http://schemas.microsoft.com/office/spreadsheetml/2016/revision10".
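As a rough sketch of that approach, the snippet below reads one sheet entry from a cut-down version of the workbook in the question, selecting both elements and attributes by namespace URI. The constant names and the sheet name "Data" are assumptions made for the example.

```php
<?php
// Namespace URIs copied from the workbook XML in the question.
const MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';
const REL_NS  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

// Cut-down sample document; the sheet name is made up.
$xml = <<<'XML'
<x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
            xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:sheets>
    <x:sheet name="Data" sheetId="1" r:id="rId1" />
  </x:sheets>
</x:workbook>
XML;

$workbook = simplexml_load_string($xml);

// children() with a URI selects the elements in that namespace, whatever their prefix.
$sheet = $workbook->children(MAIN_NS)->sheets->sheet[0];

// name and sheetId carry no prefix, so they are in no namespace at all.
$name = (string) $sheet->attributes()->name;

// r:id lives in the relationships namespace, so it is requested by that URI.
$relId = (string) $sheet->attributes(REL_NS)->id;
```

Note that attributes need the same care as elements: an unprefixed attribute does not inherit the namespace of its element.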
However, because the same program generally creates XML documents with the same choice of prefixes, it's easy to write incorrect code which relies on:
- A particular namespace being the default, and so being selected before you first call ->children()
- A particular namespace being bound to a particular prefix, and so being selectable by looking up that prefix
The author of PhpSpreadsheet appears to have made both mistakes, meaning that when you try to load a document created by a different program, it doesn't find the namespaces it expects even though they're actually there.
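The two fragile patterns can be reproduced in a few lines. This is only a sketch using made-up documents, not the actual PhpSpreadsheet reader code; it counts the sheets elements found by each lookup style.

```php
<?php
const MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

// Made-up documents: same content, main namespace bound to "x" in one and "ns0" in the other.
$withX = '<x:workbook xmlns:x="' . MAIN_NS . '"><x:sheets><x:sheet name="Data"/></x:sheets></x:workbook>';
$withNs0 = '<ns0:workbook xmlns:ns0="' . MAIN_NS . '"><ns0:sheets><ns0:sheet name="Data"/></ns0:sheets></ns0:workbook>';

$docX = simplexml_load_string($withX);
$docNs0 = simplexml_load_string($withNs0);

// Mistake 1: assume the main namespace is the default one.
// Plain property access only sees elements without a prefix, so nothing is found.
$assumeDefault = count($docX->sheets);

// Mistake 2: select by prefix. This works only while the file happens to use "x".
$byPrefixOnX = count($docX->children('x', true)->sheets);
$byPrefixOnNs0 = count($docNs0->children('x', true)->sheets);

// Selecting by URI finds the element in both documents.
$byUriOnX = count($docX->children(MAIN_NS)->sheets);
$byUriOnNs0 = count($docNs0->children(MAIN_NS)->sheets);
```

The first lookup is why the loaded element looks empty in the question: the data is there, but it sits in a namespace that has not been selected.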