VERY large XML files in PowerShell


Question

For very large text files we have the option of using StreamReader and StreamWriter, which allows for find/replace on a line-by-line basis. However, I have an XML file where I need to do find/replace with a little more control, for example on a value in a particular node that is a child of another node with a particular attribute and value. That is rather complex to parse line by line, and very easy to deal with when using an XML document. However, my file is pushing 500 MB and 12 million lines, and just loading it takes an excessively long time. Is there a .NET equivalent for XML? Or am I limited to native PowerShell here, with the associated performance hit?

Recommended answer

You might want to look at "What is the difference between SAX and DOM?" for information on alternative ways of parsing XML.

SAX might be a good method for you.

Neither PowerShell nor .NET ships a native SAX parser, but the XmlReader class, a forward-only streaming reader, may suit you.

From the looks of the examples in the MSDN docs, it doesn't seem to do anything too crazy or use features that are tedious or difficult in PowerShell.

Here's their C# example:

// Create a validating XmlReader object. The schema 
// provides the necessary type information.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add("urn:empl-hire", "hireDate.xsd");
using (XmlReader reader = XmlReader.Create("hireDate.xml", settings)) {

  // Move to the hire-date element.
  reader.MoveToContent();
  reader.ReadToDescendant("hire-date");

  // Return the hire-date as a DateTime object.
  DateTime hireDate = reader.ReadElementContentAsDateTime();
  Console.WriteLine("Six Month Review Date: {0}", hireDate.AddMonths(6));
}

Here's a PowerShell port that I didn't bother to test at all (sorry):

# Create a validating XmlReader object. The schema 
# provides the necessary type information.

$settings = New-Object System.Xml.XmlReaderSettings
$settings.ValidationType = [System.Xml.ValidationType]::Schema
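# Schemas.Add returns the XmlSchema; discard it so it doesn't land in the output.
$null = $settings.Schemas.Add("urn:empl-hire", "hireDate.xsd")
# see their page for example XML/XSD

$reader = $null
try {
    # Relative paths resolve against the process working directory, not $PWD.
    $reader = [System.Xml.XmlReader]::Create("hireDate.xml", $settings)

    # Move to the hire-date element (discard the return values).
    $null = $reader.MoveToContent()
    $null = $reader.ReadToDescendant("hire-date")

    # Return the hire-date as a DateTime object.
    $hireDate = $reader.ReadElementContentAsDateTime()
    "Six Month Review Date: {0}" -f $hireDate.AddMonths(6) | Write-Verbose -Verbose
} finally {
    # Create may have thrown before $reader was assigned.
    if ($null -ne $reader) { $reader.Dispose() }
}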
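
For the find/replace case in the question, one possible approach is to pair XmlReader with XmlWriter: stream the file through, and load only the matching parent elements into a small XmlDocument so they can be edited with the usual DOM calls. The following is a rough sketch, not a tested solution for your file. The function name Update-XmlValueStreaming and all of its parameters are made up for illustration, and it assumes the parent elements are direct children of the root element and that the child element is not in a namespace.

```powershell
# Hypothetical helper: stream-copy $InPath to $OutPath, rewriting the text of
# <$ChildName> inside every <$ParentName $AttrName="$AttrValue"> element.
# Assumption: those parent elements are direct children of the root element.
function Update-XmlValueStreaming {
    param(
        [string]$InPath, [string]$OutPath,
        [string]$ParentName, [string]$AttrName, [string]$AttrValue,
        [string]$ChildName, [string]$OldValue, [string]$NewValue
    )
    $element    = [System.Xml.XmlNodeType]::Element
    $endElement = [System.Xml.XmlNodeType]::EndElement

    # Scratch document that only ever holds one matching subtree at a time.
    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true

    $reader = $null
    $writer = $null
    try {
        # Pass full paths: relative ones resolve against the process directory.
        $reader = [System.Xml.XmlReader]::Create($InPath)
        $writer = [System.Xml.XmlWriter]::Create($OutPath)

        # Copy the root start tag and its attributes, then step to its first child.
        $null = $reader.MoveToContent()
        $writer.WriteStartElement($reader.Prefix, $reader.LocalName, $reader.NamespaceURI)
        $writer.WriteAttributes($reader, $false)
        $null = $reader.Read()

        while (-not $reader.EOF -and $reader.NodeType -ne $endElement) {
            # -ceq because XML names and values are case-sensitive.
            $isTarget = $reader.NodeType -eq $element -and
                        $reader.LocalName -ceq $ParentName -and
                        $reader.GetAttribute($AttrName) -ceq $AttrValue
            if ($isTarget) {
                # Materialize just this subtree; ReadNode advances the reader past it.
                $node = $doc.ReadNode($reader)
                foreach ($child in $node.SelectNodes($ChildName)) {
                    if ($child.InnerText -ceq $OldValue) { $child.InnerText = $NewValue }
                }
                $node.WriteTo($writer)
            } else {
                # Copy the node (and any subtree) unchanged; the reader ends up on the next sibling.
                $writer.WriteNode($reader, $false)
            }
        }
        $writer.WriteEndElement()
    } finally {
        if ($null -ne $writer) { $writer.Dispose() }
        if ($null -ne $reader) { $reader.Dispose() }
    }
}

# Example call (file names and element names are placeholders):
# Update-XmlValueStreaming -InPath 'C:\data\in.xml' -OutPath 'C:\data\out.xml' `
#     -ParentName 'item' -AttrName 'type' -AttrValue 'a' `
#     -ChildName 'value' -OldValue 'old' -NewValue 'new'
```

Memory use stays roughly proportional to the largest matching parent element rather than to the whole file, which is the point of streaming here. If the parent elements sit deeper in the tree, the loop would need to copy start and end tags node by node instead of relying on WriteNode for whole subtrees.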
