Load an XLSX spreadsheet with namespaced XML

Question

I have a set of XLSX files that PhpSpreadsheet cannot load, because simplexml_load_string returns an empty SimpleXMLElement from (for instance) the workbook XML file.

The file has the following format. SimpleXML can load it once every occurrence of the x: prefix, and the namespace declaration itself, has been removed (that is, for instance, the <x:workbook> tag has been converted to <workbook>).

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<x:workbook xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:fileVersion appName="xl" lastEdited="7" lowestEdited="4" rupBuild="23801" />
  <x:workbookPr codeName="ThisWorkbook" />
  <mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
    <mc:Choice Requires="x15">
      <x15ac:absPath xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" url=".........." />
    </mc:Choice>
  </mc:AlternateContent>
  <xr:revisionPtr revIDLastSave="0" documentId=".........." xr6:coauthVersionLast="46" xr6:coauthVersionMax="46" xr10:uidLastSave="{00000000-0000-0000-0000-000000000000}" />
  <x:bookViews>
    <x:workbookView xWindow="-120" yWindow="-120" windowWidth="29040" windowHeight="15840" xr2:uid="{00000000-000D-0000-FFFF-FFFF00000000}" />
  </x:bookViews>
  <x:sheets>
    <x:sheet name="......" sheetId="1" r:id="rId1" />
  </x:sheets>
  <x:calcPr calcId="191029" />
</x:workbook>

I'm not sure the XML file is wrong, since the XLSX files can be opened with, for instance, LibreOffice. Anyway, I have managed to load the files by hacking a simple-minded function, cleanup_xml(), into Xlsx.php:

                    //~ http://schemas.openxmlformats.org/spreadsheetml/2006/main"
                    $xmlWorkbook = simplexml_load_string(
                      cleanup_xml($this->securityScanner->scan($this->getFromZipArchive($zip, "{$rel['Target']}"))),
                        'SimpleXMLElement',
                        Settings::getLibXmlLoaderOptions()
                    );

Maybe there is a proper, clean way to make the SimpleXML API load such files?

Edit:

I was wrong in thinking all problems were gone after the cleanup_xml hack. It seems the data rows XML file also has problems, probably the same as above...

Edit:

Indeed, I moved cleanup_xml() into XmlScanner::scan so that it applies to every loaded XML file, and now it seems to work...

Edit:

It seems the namespace declaration is correct, at least judging from this simple example...

So I wonder why simplexml_load_string doesn't accept the format:

<x:workbook ... xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</x:workbook>

while it apparently accepts

<workbook ... xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</workbook>

Edit:

I have dug into the SimpleXML API, and this answer helped me understand the problem. Now I can try to rewrite my hackish cleanup_xml to account for namespaces... I just wonder whether PhpSpreadsheet offers a better way... it seems strange that this problem has gone unnoticed before...

Edit:

OK, now I've found the bug report...

Recommended answer

This looks like a bug in PhpSpreadsheet.

In an XLSX file I created this week with a real copy of Microsoft Excel, "workbook.xml" starts like this:

<workbook
 xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
 xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
 xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
 mc:Ignorable="x15 xr xr6 xr10 xr2"
 xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" 
 xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" 
 xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" 
 xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
 xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">

This declares eight different namespaces that will be used in the document. One happens to be defined as the "default namespace", and the other seven are assigned prefixes - but all of that is just local to this specific file.

If we look at your XML document, we can see all the same namespaces in use, plus an extra one:

<x:workbook
 xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac"
 xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
 xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
 xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
 xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
 xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
 xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
 xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
 mc:Ignorable="x15 xr xr6 xr10 xr2"
 xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">

The only difference is that the namespace "http://schemas.openxmlformats.org/spreadsheetml/2006/main" has been assigned prefix "x", rather than set as the default namespace, but that makes no difference to its meaning. A different library might label the namespaces completely differently, just because of the way it generates the XML:

<ns0:workbook
 xmlns:ns0="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
 xmlns:ns1="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
 xmlns:ns2="http://schemas.openxmlformats.org/markup-compatibility/2006" 
 ns2:Ignorable="x15 xr xr6 xr10 xr2"
 xmlns:ns3="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" 
 xmlns:ns4="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" 
 xmlns:ns5="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" 
 xmlns:ns6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
 xmlns:ns7="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
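
To make the point concrete, here is a minimal sketch (not taken from PhpSpreadsheet) that parses three hand-written root elements differing only in the prefix bound to the main SpreadsheetML namespace. The variable names and the shortened sample documents are assumptions made for illustration; the calls used are the standard SimpleXMLElement::getName() and SimpleXMLElement::getNamespaces().

```php
<?php
// Illustration only: three hypothetical root elements that differ solely in
// which prefix (if any) they bind to the main SpreadsheetML namespace.
$main = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

$documents = [
    'default' => "<workbook xmlns=\"$main\"><sheets/></workbook>",
    'x'       => "<x:workbook xmlns:x=\"$main\"><x:sheets/></x:workbook>",
    'ns0'     => "<ns0:workbook xmlns:ns0=\"$main\"><ns0:sheets/></ns0:workbook>",
];

$identities = [];
foreach ($documents as $label => $xml) {
    $root = simplexml_load_string($xml);
    // getName() gives the local name without any prefix; getNamespaces()
    // maps each prefix used by the element to its namespace URI.
    $uris = array_values($root->getNamespaces());
    $identities[$label] = [$root->getName(), $uris[0]];
}

// Each entry is a [local name, namespace URI] pair; the prefix plays no part.
var_dump($identities);
```

All three documents should describe the same element as far as a namespace-aware parser is concerned: local name "workbook" in the main SpreadsheetML namespace.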

As explained in this reference answer, SimpleXML's namespace handling is based around using the ->children() method to select the namespace you want to work with. The correct way to use this is to always specify the namespace URI you want, e.g. "http://schemas.openxmlformats.org/spreadsheetml/2006/main" or "http://schemas.microsoft.com/office/spreadsheetml/2016/revision10".
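
As a rough sketch of that approach, the snippet below reads the sheet list from a cut-down version of the prefixed workbook by passing namespace URIs to ->children() and ->attributes(). The constant names, the sample sheet name "Data", and the trimmed XML are assumptions for illustration; this is not how PhpSpreadsheet itself is written.

```php
<?php
// Hypothetical constant names; the URIs are the ones declared in the workbook.
const NS_MAIN = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';
const NS_REL  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

// Cut-down sample of a workbook that binds the main namespace to "x".
$xml = <<<XML
<x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:sheets>
    <x:sheet name="Data" sheetId="1" r:id="rId1" />
  </x:sheets>
</x:workbook>
XML;

$workbook = simplexml_load_string($xml);

$sheets = [];
// Select children by namespace URI at each level, never by prefix.
foreach ($workbook->children(NS_MAIN)->sheets->children(NS_MAIN)->sheet as $sheet) {
    $sheets[] = [
        // Unprefixed attributes belong to no namespace.
        'name' => (string) $sheet->attributes()->name,
        // r:id lives in the relationships namespace.
        'rId'  => (string) $sheet->attributes(NS_REL)->id,
    ];
}

print_r($sheets);
```

Because only URIs are used, the same loop should work unchanged whether the file uses the default namespace, "x", or any other prefix.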

However, because the same program generally creates XML documents with the same choice of prefixes, it's easy to write incorrect code which relies on:

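  • a particular namespace being the default, and therefore selected before you first call ->children()
  • a particular namespace being bound to a particular prefix, and therefore selectable by looking up that prefix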
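
The two assumptions listed above can be reproduced in a small sketch. The function countSheets() and its strategy names are hypothetical, written only to contrast the fragile lookups with a lookup by namespace URI; they are not PhpSpreadsheet code.

```php
<?php
const NS_MAIN = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

// Hypothetical helper: count <sheets> elements using one of three lookups.
function countSheets(SimpleXMLElement $workbook, string $strategy): int
{
    switch ($strategy) {
        case 'assume-default':
            // Relies on the main namespace being the default one.
            return count($workbook->sheets);
        case 'assume-prefix-x':
            // Relies on the main namespace being bound to the prefix "x".
            return count($workbook->children('x', true)->sheets);
        default:
            // Selects by namespace URI, independent of any prefix.
            return count($workbook->children(NS_MAIN)->sheets);
    }
}

$documents = [
    'default' => '<workbook xmlns="' . NS_MAIN . '"><sheets/></workbook>',
    'x'       => '<x:workbook xmlns:x="' . NS_MAIN . '"><x:sheets/></x:workbook>',
];

$results = [];
foreach ($documents as $label => $xml) {
    $workbook = simplexml_load_string($xml);
    foreach (['assume-default', 'assume-prefix-x', 'by-uri'] as $strategy) {
        $results[$label][$strategy] = countSheets($workbook, $strategy);
    }
}

print_r($results);
```

Each fragile strategy finds the element in only one of the two documents, while the lookup by URI finds it in both.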

The author of PhpSpreadsheet appears to have made both mistakes, meaning that when you try to load a document created by a different program, it doesn't find the namespaces it expects even though they're actually there.
