Multiline Regex in PowerShell


Question

I have this PowerShell script whose main purpose is to search through HTML files within a folder, find specific HTML markup, and replace it with what I tell it to.

I have been able to get three of my four find-and-replace operations working perfectly. The one I am having trouble with involves a regular expression.

This is the markup that I am trying to make my regex find and replace:

<a href="programsactivities_skating.html"><br />
                                           </a>

Here is the regex I have so far, along with the function I am using it in:

automate -school "C:\Users\$env:username\Desktop\schools\$question" -query '(?mis)(?!exclude1|exclude2|exclude3)(<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>)' -replace ''

Here is the automate function:

function automate($school, $query, $replace) {
    $processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
    foreach ($file in  $processFiles) {
        $text = Get-Content $file
        $text = $text -replace $query, $replace
        $text | Out-File $file -Force -Encoding utf8
    }
}

I have been trying to figure out the solution to this for about two days now, and just can't seem to get it to work. I have determined that the problem is that I need to tell my regex to account for multiple lines, and that's what I'm having trouble with.

Any help anyone can provide is greatly appreciated.

Thanks in advance.

Recommended answer

Get-Content produces an array of strings, where each string contains a single line from your input file, so the -replace operator is applied to each line separately and cannot match text passages spanning more than one line. You need to merge the array into a single string if you want to be able to match across lines. Any of the following three methods does that:

$text = Get-Content $file | Out-String

[String]$text = Get-Content $file

$text = [IO.File]::ReadAllText($file)
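To illustrate why the join matters, here is a minimal sketch that runs the same pattern against an array of lines and against one joined string. The sample lines and the variable names are hypothetical, and the pattern is a simplified form of the one in the question (the exclude lookahead is left out for brevity).

```powershell
# Hypothetical sample: the markup from the question, one array element per line,
# which is the shape Get-Content returns.
$lines = @(
    '<a href="programsactivities_skating.html"><br />',
    '                                           </a>'
)

# Simplified version of the question's pattern (assumption: no exclusions needed here).
$query = '(?is)<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>'

# -replace on an array is applied to each element on its own,
# so the pattern never sees the opening and closing tag together.
$perLine = $lines -replace $query, ''

# Joined into a single string, the whole anchor is visible to the regex.
$joined = $lines -join "`r`n"
$single = $joined -replace $query, ''

"Per-line result still has $($perLine.Count) unchanged lines"
"Single-string result length: $($single.Length)"
```

In the per-line case both elements come back untouched, while the joined string has the empty anchor removed.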

Note that the first two methods (piping to Out-String and the [String] cast) do not preserve the line breaks from the input file. As Keith pointed out in the comments, the [String] cast discards all line breaks (the lines are joined with a space, the default $OFS separator), and Out-String puts <CR><LF> at the end of each line when joining the array. The latter may be an issue when dealing with Linux/Unix or Mac files.
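Putting this together, the function from the question could be adapted along the following lines. This is only a sketch: it assumes that reading with [IO.File]::ReadAllText and writing back with [IO.File]::WriteAllText as UTF-8 is acceptable for the files involved, it uses $file.FullName so the .NET calls receive an absolute path, and the temporary folder and sample file exist purely for demonstration.

```powershell
# Sketch of the question's function, reading each file as one string
# so that the regex can match across line breaks.
function automate($school, $query, $replace) {
    $processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
    foreach ($file in $processFiles) {
        # ReadAllText keeps the file's original line breaks intact.
        $text = [IO.File]::ReadAllText($file.FullName)
        $text = $text -replace $query, $replace
        # Assumption: UTF-8 output, as in the original Out-File call.
        [IO.File]::WriteAllText($file.FullName, $text, [Text.Encoding]::UTF8)
    }
}

# Demonstration on a throwaway folder (hypothetical sample content, LF line endings).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
$sample = Join-Path $dir 'sample.html'
$content = ('<p>keep</p>', '<a href="x.html"><br />', '   </a>', '<p>end</p>') -join "`n"
[IO.File]::WriteAllText($sample, $content)

automate -school $dir -query '(?is)<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>' -replace ''

$result = [IO.File]::ReadAllText($sample)
Remove-Item -Recurse -Force $dir
```

After the call, the empty anchor is gone and the remaining LF line breaks are left as they were. On PowerShell 3.0 and later, Get-Content -Raw should also return the file as a single string and could be used in place of ReadAllText.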
