PowerShell script to unzip an xlsx file and read contents from a sheet XML file


Problem description

Background

I am trying to design a script that can run on a server without an Excel installation and without importing modules or libraries. This rules out COM Excel.Application, the ImportExcel module, and other third-party libraries. Instead, I unzip the Excel file into a collection of XML files. I need to parse these XML files in PowerShell to read a given range of cell values spanning multiple Excel sheets.

So far, I have written a script to retrieve the sheetIds:

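# Extract the workbook parts into the current directory (unzip is an external tool)
unzip myExcel.xlsx
# Load the workbook index and one of the sheet parts as XML documents
[xml]$workbookXML = Get-Content xl\workbook.xml
[xml]$sheet = Get-Content xl\worksheets\sheet10.xml

# Map each sheet name to its sheetId
$sheetDictionary = @{}
foreach($sheetChildNode in $workbookXML.workbook.sheets.sheet) {
    $sheetDictionary.add($sheetChildNode.name, $sheetChildNode.sheetId)
}

$sheetDictionary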
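
Since the goal is to rely only on what ships with PowerShell, the extraction step can also be done with the .NET System.IO.Compression.ZipFile class instead of an external unzip tool. The following is a minimal sketch under that assumption; the function name Expand-Xlsx and its parameter names are hypothetical and not part of the original script.

```powershell
# Load the .NET zip support (required on Windows PowerShell 5.1).
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: extracts an .xlsx package into a folder and returns the folder path.
function Expand-Xlsx {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [Parameter(Mandatory)] [string]$Destination
    )
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert both paths to full paths first.
    $fullPath = (Resolve-Path -LiteralPath $Path).Path
    $fullDest = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    # ExtractToDirectory ignores the file extension; it throws if a file
    # it wants to write already exists in the destination.
    [System.IO.Compression.ZipFile]::ExtractToDirectory($fullPath, $fullDest)
    $fullDest
}
```

With this in place, a call such as Expand-Xlsx -Path .\myExcel.xlsx -Destination .\myExcel_unzipped would produce the xl folder under the destination, and the Get-Content lines above would then be run against that folder.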

I can use the sheetIds to find the individual sheet files under xl\worksheets\sheet<ID>.xml. My problem is parsing and retrieving values from these individual sheet files.
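
One caveat on that mapping: the sheetId is not guaranteed to match the number in the file name. The part each sheet points to is recorded in xl\_rels\workbook.xml.rels, keyed by the sheet's r:id attribute. The following is a minimal sketch of that lookup using inline sample XML; the sheet names, ids, and targets are made up for illustration, and it assumes the targets are relative to the xl folder, which is common but not guaranteed.

```powershell
# Inline stand-in for xl\workbook.xml (values are illustrative only).
[xml]$workbookXML = @'
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
          xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="All Parameters" sheetId="4" r:id="rId1"/>
    <sheet name="Database" sheetId="12" r:id="rId2"/>
  </sheets>
</workbook>
'@

# Inline stand-in for xl\_rels\workbook.xml.rels (values are illustrative only).
[xml]$relsXML = @'
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet10.xml"/>
</Relationships>
'@

# Relationship Id -> Target path
$targets = @{}
foreach ($rel in $relsXML.Relationships.Relationship) {
    $targets[$rel.Id] = $rel.Target
}

# Sheet name -> file path inside the package
$relNs = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
$sheetFiles = @{}
foreach ($s in $workbookXML.workbook.sheets.sheet) {
    $rid = $s.GetAttribute('id', $relNs)          # the r:id attribute
    $sheetFiles[$s.GetAttribute('name')] = 'xl/' + $targets[$rid]
}

$sheetFiles
```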

Sample input

Here is a sample of xl\worksheets\sheet10.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<worksheet xr:uid="{00000000-0001-0000-0800-000000000000}"
xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" 
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" 
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" 
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" mc:Ignorable="x14ac xr xr2 
xr3" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
     <dimension ref="A1:L100"/>
     <sheetViews>
          <sheetView workbookViewId="0">
               <selection sqref="A11:B11" activeCell="A11"/>
          </sheetView>
     </sheetViews>
     <sheetFormatPr x14ac:dyDescent="0.35" defaultRowHeight="14.5"/>
     <cols>
          <col customWidth="1" style="32" width="18.81640625" max="1" min="1"/>
          <col style="32" width="8.7265625" max="2" min="2"/>
          <col customWidth="1" style="5" width="14.81640625" max="11" min="11"/>
          <col customWidth="1" style="5" width="12" max="12" min="12"/>
          </cols>
     <sheetData>
          <row r="6" x14ac:dyDescent="0.35" spans="1:12">
               <c r="A6" t="s" s="33">
                    <v>270</v>
               </c>
               <c r="B6" t="s" s="33">
                    <v>271</v>
               </c>
               <c r="K6" t="s" s="5">
                    <v>272</v>
               </c>
               <c r="L6" t="s" s="5">
                    <v>273</v>
               </c>
          </row>
          <row r="7" x14ac:dyDescent="0.35" spans="1:12">
               <c r="A7" t="str" s="32">
                    <f>'All Parameters'!K13</f>
                    <v>UnwantedValue1</v>
               </c>
               <c r="B7" t="str" s="32">
                    <f>'All Parameters'!L13</f>
                    <v>UnwantedValue2</v>
               </c>
               <c r="K7" t="str" s="5">
                    <f ref="K7:K38" t="shared" si="0">IF(AND(NOT($A7=""),NOT($B7="")),A7,CONCATENATE("ParameterNotUsed",ROW()))</f>
                    <v>db.url</v>
               </c>
               <c r="L7" t="str" s="5">
                    <f ref="L7:L38" t="shared" si="1">IF(AND(NOT($A7=""),NOT($B7="")),B7,CONCATENATE("ParameterNotUsed",ROW()))</f>
                    <v>URLValue</v>
               </c>
          </row>
          <row r="8" x14ac:dyDescent="0.35" spans="1:12">
               <c r="A8" t="str" s="32">
                    <f>'All Parameters'!O14</f>
                    <v>UnwantedValue3</v>
               </c>
               <c r="B8" t="str" s="32">
                    <f>'All Parameters'!P14</f>
                    <v>UnwantedValue4</v>
               </c>
               <c r="K8" t="str" s="5">
                    <f t="shared" si="0"/>
                    <v>db.User</v>
               </c>
               <c r="L8" t="str" s="5">
                    <f t="shared" si="1"/>
                    <v>UserName</v>
               </c>
          </row>
     </sheetData>
<pageMargins footer="0.3" header="0.3" bottom="0.75" top="0.75" right="0.7" left="0.7"/>
</worksheet>

I would like to extract K7, L7 (db.url and URLValue) and K8, L8 (db.User and UserName) from this XML file. The cell location is given in the r attribute of each c element, and the value in its v child element.

Attempt

Unfortunately, I am unable to retrieve any values from the sheet XML files. Using this site, I tried

[xml]$sheet = Get-Content xl\worksheets\sheet10.xml

$data = (Select-Xml -xpath "/worksheet/sheetData/row/c[r = '[K-L][7-9]$|[K-L][1-9][0-9]$|[K-L]100']/v" $sheet |
  % {$_.Node.'#text'})
$data

which uses a regex to cover K7:L100, but there is no output. I have tried various other methods as well, such as dotting through the XML file, but I could not get them to work. I am open to any approach that uses only what comes pre-installed with PowerShell to retrieve these values.

Thanks very much.

Recommended answer

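Two things. First, your XPath expression has to take into account the namespaces in this XML: the worksheet declares a default namespace, so unprefixed names such as /worksheet match nothing. Second, it's never a good idea to use regex with XML, and XPath 1.0 (which Select-Xml uses) has no regex matching anyway, so the pattern is compared as a literal string. Note also that r is an attribute of c, so it has to be written as @r.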

So try something along these lines:

$ns = @{ns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

$items = Select-Xml -Xml $sheet -XPath '//ns:c[(@r="K7" or @r="L7" or @r="K8" or @r="L8")]//ns:v' -Namespace $ns
$items | Foreach {$_.Node.InnerXml}

Output:

db.url
URLValue
db.User
UserName
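
Listing every cell reference by hand does not scale to a range such as K7:L100. One possible way to cover a range, sketched below, is to select every c element with a namespace-aware XPath and then filter on the column letters and row number in PowerShell. The inline XML is a trimmed stand-in for sheet10.xml, and the variable names $cells and $inRange are illustrative.

```powershell
# Trimmed stand-in for xl\worksheets\sheet10.xml.
[xml]$sheet = @'
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="6"><c r="K6" t="s"><v>272</v></c></row>
    <row r="7"><c r="A7" t="str"><v>UnwantedValue1</v></c><c r="K7" t="str"><v>db.url</v></c><c r="L7" t="str"><v>URLValue</v></c></row>
    <row r="8"><c r="K8" t="str"><v>db.User</v></c><c r="L8" t="str"><v>UserName</v></c></row>
  </sheetData>
</worksheet>
'@

$ns = @{ns = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}

# Select every cell; the range test happens in PowerShell, not in XPath.
$cells = Select-Xml -Xml $sheet -XPath '//ns:sheetData/ns:row/ns:c' -Namespace $ns |
    ForEach-Object { $_.Node }

$inRange = foreach ($cell in $cells) {
    # Split a reference such as "K7" into column letters and row number.
    if ($cell.r -match '^([A-Z]+)(\d+)$') {
        $col = $Matches[1]
        $row = [int]$Matches[2]
        if ($col -in 'K', 'L' -and $row -ge 7 -and $row -le 100) {
            [pscustomobject]@{ Ref = $cell.r; Value = $cell.v }
        }
    }
}

$inRange
```

Here the regex is applied to a plain attribute string after the XML has been parsed, which is a different matter from using regex to parse the XML itself.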

To get the r attribute values of the c elements, use:

$items = Select-Xml -Xml $sheet -XPath '//ns:c[@r]/@r' -Namespace $ns
$items | Foreach {$_.Node}

Output:

A6   
B6   
K6   
L6   
A7   
B7   
K7   
L7   
A8   
B8   
K8   
L8   
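
One more point about the sample: the cells in row 6 carry t="s", which means their v element holds an index into xl\sharedStrings.xml rather than the text itself, whereas the t="str" cells hold the text directly. The following is a minimal sketch of resolving both kinds, using inline stand-ins for the two files. The string contents are made up for illustration, and the sketch does not handle other cell types such as inline strings.

```powershell
$ns = @{ns = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}

# Inline stand-in for xl\sharedStrings.xml (contents are illustrative only).
[xml]$shared = @'
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
  <si><t>Parameter</t></si>
  <si><r><t>Val</t></r><r><t>ue</t></r></si>
</sst>
'@

# Inline stand-in for a sheet file.
[xml]$sheet = @'
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="6"><c r="K6" t="s"><v>0</v></c><c r="L6" t="s"><v>1</v></c></row>
    <row r="7"><c r="K7" t="str"><v>db.url</v></c><c r="L7" t="str"><v>URLValue</v></c></row>
  </sheetData>
</worksheet>
'@

# Build the lookup table: the position of each <si> is the index stored in <v>.
# A string item may be split into several rich-text runs, so join its <t> parts.
$strings = @(
    Select-Xml -Xml $shared -XPath '/ns:sst/ns:si' -Namespace $ns | ForEach-Object {
        $parts = Select-Xml -Xml $_.Node -XPath 'ns:t | ns:r/ns:t' -Namespace $ns |
            ForEach-Object { $_.Node.InnerText }
        -join $parts
    }
)

# Resolve each cell: look up shared strings, pass other values through unchanged.
$values = Select-Xml -Xml $sheet -XPath '//ns:sheetData/ns:row/ns:c' -Namespace $ns | ForEach-Object {
    $cell = $_.Node
    $text = if ($cell.t -eq 's') { $strings[[int]$cell.v] } else { $cell.v }
    [pscustomobject]@{ Ref = $cell.r; Value = $text }
}

$values
```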
