How can I filter out text twice in PowerShell?

Question

I have a PowerShell script that returns output that's close to what I want; however, there are a few lines and HTML-style tags I need to remove. I already have the following code to filter the text:

Get-Content "txtfile.txt" | Select-String -Pattern '<fields>' -Context 1

However, if I pipe that output into a second Select-String, I don't get any results back. I looked at regex examples online, but most of what I've seen relies on explicit loops to achieve the goal. I'm more used to the Linux shell, where you can pipe output through several grep calls to filter text. Is there a way to achieve the same thing, or something similar, with PowerShell? Here's the file I'm working with, as requested:

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<actionOverrides>
    <actionName>Accept</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>CancelEdit</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>Today</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>View</actionName>
    <type>Default</type>
</actionOverrides>
<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>
<enableFeeds>false</enableFeeds>
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>ActivityDateTime</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
<fields>
    <fullName>Description</fullName>
</fields>
</CustomObject>

So, I only want the text between the <fullName> tags, and I have the following so far:

Get-Content "txtfile.txt" | Select-String -Pattern '<fields>' -Context 1

This gives me everything between the <fields> tags; however, what I essentially need is the content of each <fullName> line without the XML tags.

Recommended answer

The simplest PSv3+ solution is to use PowerShell's built-in XML DOM support, which makes an XML document's nodes accessible as a hierarchy of objects with dot notation:

PS> ([xml] (Get-Content -Raw txtfile.txt)).CustomObject.fields.fullName
ActivityDate
ActivityDateTime
Guid
Description

Note how even though .fields is an array - representing all child <fields> elements of the top-level element <CustomObject> - .fullName was applied directly to it and returned the values of the <fullName> child elements across all array elements (the <fields> elements) as an array.

This ability to access a property on a collection and have it implicitly applied to the collection's elements, with the results getting collected in an array, is a generic PSv3+ feature called member enumeration.
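
To make member enumeration concrete, here is a minimal sketch that compares it with the explicit per-element form. The inline sample document and the variable names ($sample, $viaEnumeration, $viaLoop) are assumptions made for illustration; the sample is a cut-down stand-in for txtfile.txt.

```powershell
# Hypothetical inline sample standing in for txtfile.txt.
$sample = [xml] @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# Member enumeration: .fullName is applied to each <fields> element in turn.
$viaEnumeration = $sample.CustomObject.fields.fullName

# The explicit equivalent, spelled out with ForEach-Object.
$viaLoop = $sample.CustomObject.fields | ForEach-Object { $_.fullName }

$viaEnumeration   # ActivityDate, Guid
```

Both variables should hold the same two strings; the first form simply leaves the iteration to PowerShell.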

As an alternative, consider using the Select-Xml cmdlet (available in PSv2 too), which supports XPath queries that generally allow for more complex extraction logic (though not strictly needed here); Select-Xml is a high-level wrapper around the [xml] .NET type's .SelectNodes() method.
The following is the equivalent of the solution above:

$namespaces = @{ ns="http://soap.force.com/2006/04/metadata" }
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName'
(Select-Xml -LiteralPath txtfile.txt $xpathQuery -Namespace $namespaces).Node.InnerText
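
Because Select-Xml is described above as a wrapper around .SelectNodes(), the same query can also be sketched by calling that method directly. This is an illustrative sketch, not part of the original answer: the inline sample document and the names $doc, $nsManager, and $names are assumptions, and the namespace prefix ns is the same arbitrary choice as in the command above.

```powershell
# Hypothetical inline sample standing in for txtfile.txt.
$doc = [xml] @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# .SelectNodes() takes its prefix-to-URI mapping from an XmlNamespaceManager
# rather than from a hashtable.
$nsManager = New-Object System.Xml.XmlNamespaceManager $doc.NameTable
$nsManager.AddNamespace('ns', 'http://soap.force.com/2006/04/metadata')

# Same XPath query as above, evaluated directly against the document.
$nodes = $doc.SelectNodes('/ns:CustomObject/ns:fields/ns:fullName', $nsManager)
$names = $nodes | ForEach-Object { $_.InnerText }

$names   # ActivityDate, Guid
```

Select-Xml spares you the namespace-manager setup, which is one reason to prefer it for one-off queries.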

Note:

Unlike with dot notation, XML namespaces must be considered when using Select-Xml.

Given that <CustomObject> and all its descendants are in the default namespace (declared via the xmlns attribute), identified by the URI http://soap.force.com/2006/04/metadata, you must:

  • Define this namespace in a hashtable that you pass as the -Namespace argument.
    • Caveat: The default namespace, xmlns, is special in that it cannot be used as the key in the hashtable; instead, choose an arbitrary key name such as ns, and be sure to use that key name as the node-name prefix (see next point).
  • Prefix every node name in the XPath query with that key name followed by a colon, as in ns:CustomObject in the query above.
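
The effect of the namespace requirement can be seen in a small sketch that runs the query with and without the prefix. The inline XML passed via -Content is a hypothetical stand-in for txtfile.txt, and the variable names ($xmlText, $unprefixed, $prefixed) are assumptions for illustration.

```powershell
# Hypothetical inline sample standing in for txtfile.txt.
$xmlText = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# Arbitrary key name 'ns' mapped to the document's default-namespace URI.
$namespaces = @{ ns = 'http://soap.force.com/2006/04/metadata' }

# Unprefixed node names refer to "no namespace", so nothing matches.
$unprefixed = Select-Xml -Content $xmlText -XPath '/CustomObject/fields/fullName'

# Prefixed node names, resolved through -Namespace, match as intended.
$prefixed = Select-Xml -Content $xmlText -XPath '/ns:CustomObject/ns:fields/ns:fullName' -Namespace $namespaces

$null -eq $unprefixed       # True: no nodes matched
$prefixed.Node.InnerText    # ActivityDate, Guid
```

An unprefixed query that silently returns nothing is the usual symptom of a forgotten namespace mapping, so an empty result is worth checking against the document's xmlns declaration first.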
