Is it possible to extract / read part of a file within a zip using PowerShell?

Question

I have a PowerShell 4.0 script that does various things to organise some large ZIP files across an internal network. This is all working fine, but I am looking to make a few improvements. One thing I want to do is extract some details from an XML file that sits inside each ZIP file.

I tested this on some small ZIP files by extracting just the XML, which worked fine. I target the specific file because the ZIP can contain thousands of files, some of them pretty large. When I expanded the testing, though, I realised this wasn't optimal, because the XML files I am reading can get pretty large themselves (one was ~5GB, and they could potentially be larger). Adding a file extraction step to the chain therefore creates an unacceptable delay in the process, and I need to find an alternative.

Ideally, I would be able to read the 3-5 values from the XML file inside the ZIP without extracting it. The values always appear relatively early in the file, so perhaps it's possible to extract just the first ~100KB of the file, treat that extract as a text file, and find the required values in it?

Is this possible, and is it more performant than extracting the entire file?

If I can't speed things up, I'll have to look at another way. I do have limited control over the file content, so I could potentially split those details out into a smaller separate file when the ZIP is created. That would be a last resort, though.

Recommended answer

You should be able to do this with the System.IO.Compression.ZipFile class. Opening an entry gives you a stream that decompresses on demand, so reading only the first 100KB does not unpack the rest of the file:

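# import the containing assembly
Add-Type -AssemblyName System.IO.Compression.FileSystem

# reset state so values left over from an earlier run are never reused
$zipFile = $null
$stream = $null
$readLength = 0

try{
  # open the zip file with ZipFile
  $zipFileItem = Get-Item .\Path\To\File.zip
  $zipFile = [System.IO.Compression.ZipFile]::OpenRead($zipFileItem.FullName)

  # find the desired file entry (first match, in case the name occurs in several folders)
  $compressedFileEntry = $zipFile.Entries |Where-Object Name -eq 'MyAwesomeButHugeFile.xml' |Select-Object -First 1

  if($compressedFileEntry){
    # read up to the first 100kb of the file stream
    # (New-Object instead of [byte[]]::new(), which needs PowerShell 5.0 or later)
    $buffer = New-Object byte[] (100KB)
    $stream = $compressedFileEntry.Open()

    # Read() may return fewer bytes than requested, so loop until the buffer is full or the entry ends
    do{
      $bytesRead = $stream.Read($buffer, $readLength, $buffer.Length - $readLength)
      $readLength += $bytesRead
    } while($bytesRead -gt 0 -and $readLength -lt $buffer.Length)
  }
}
finally{
  # clean up
  if($stream){ $stream.Dispose() }
  if($zipFile){ $zipFile.Dispose() }
}

if($readLength){
  # assumes the XML is UTF-8 encoded; a multi-byte character cut off at the 100kb boundary will not decode cleanly
  $xmlString = [System.Text.Encoding]::UTF8.GetString($buffer, 0, $readLength)
  # do what you must with `$xmlString` here :)
}
else{
  Write-Warning "Failed to extract partial xml string"
}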
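
As a possible variation on the approach above, the entry stream can be handed straight to System.Xml.XmlReader, which pulls data forward only as far as it needs and lets you stop once the wanted values have been seen, so no fixed buffer size has to be guessed. The following is only a sketch under assumptions: the function name Get-ZipXmlValue, the entry name data.xml, and the element names id and owner are hypothetical, and it assumes each wanted value is a simple text-only element. It builds a tiny throwaway ZIP in the temp folder purely so the example is self-contained.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Build a small throwaway zip so the sketch can run on its own (hypothetical sample data).
$zipPath = Join-Path ([System.IO.Path]::GetTempPath()) ("xmlreader-demo-{0}.zip" -f [guid]::NewGuid())
$newZip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create')
try {
  $writer = New-Object System.IO.StreamWriter($newZip.CreateEntry('data.xml').Open())
  $writer.Write('<root><id>42</id><owner>alice</owner><payload>lots of data</payload></root>')
  $writer.Dispose()
}
finally { $newZip.Dispose() }

# Hypothetical helper: returns a hashtable of the first occurrence of each wanted element.
function Get-ZipXmlValue {
  param([string]$ZipPath, [string]$EntryName, [string[]]$ElementName)

  $result = @{}
  $zip = $null; $stream = $null; $reader = $null
  try {
    $zip = [System.IO.Compression.ZipFile]::OpenRead($ZipPath)
    $entry = $zip.Entries | Where-Object { $_.FullName -eq $EntryName } | Select-Object -First 1
    if (-not $entry) { throw "Entry '$EntryName' not found in '$ZipPath'" }

    $stream = $entry.Open()
    $reader = [System.Xml.XmlReader]::Create($stream)

    # Walk forward node by node and stop as soon as every wanted value has been collected.
    while (-not $reader.EOF -and $result.Count -lt $ElementName.Count) {
      $isWanted = $reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and
                  $ElementName -contains $reader.Name -and
                  -not $result.ContainsKey($reader.Name)
      if ($isWanted) {
        $name = $reader.Name
        # Assumes a text-only element; this call also moves the reader past the end tag.
        $result[$name] = $reader.ReadElementContentAsString()
      }
      else {
        [void]$reader.Read()
      }
    }
  }
  finally {
    if ($reader) { $reader.Dispose() }
    if ($stream) { $stream.Dispose() }
    if ($zip)    { $zip.Dispose() }
  }
  $result
}

# Usage: read two values, then remove the throwaway zip.
$values = Get-ZipXmlValue -ZipPath $zipPath -EntryName 'data.xml' -ElementName 'id','owner'
Remove-Item $zipPath
```

Compared with decoding a fixed 100KB slice as text, this avoids truncated characters and half-parsed tags, at the cost of requiring the start of the document to be well-formed XML. If a wanted element is missing, the loop reads the entry to the end, so for multi-gigabyte files you may want to add your own cut-off.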
