RegEx to match string between two strings in PowerShell

Problem description

Here is my sample data:

Option failonnomatch on
Option batch on
Option confirm Off
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

get File*.txt \local\path\Client\File.txt
mv File*.txt /remote/archive/

close
exit

I would like to create a PowerShell script to extract pieces of information out of this text file.

List of items I need:

  • username
  • password
  • host
  • port
  • ssh key
  • file name
  • local path
  • remote path

I'm hoping that if I learn how to do a couple of these, the method will be applicable to all items. I attempted to extract the ssh key with the following powershell/regex:

$doc -match '(?<=hostkey=")(.*)(?=")' 

$doc being the sample data

but it appears to be returning the whole line. Any help would be greatly appreciated. Thank you.

Recommended answer

If -match is returning a whole line, the implication is that the LHS of your -match operation is an array, which in turn suggests that you used Get-Content without -Raw, which yields the input as an array of lines, in which case -match acts as a filter.

Instead, read your file as a single, multi-line string with Get-Content -Raw; with a scalar LHS, -match then returns a [bool], and the results of the matching operation are reported in automatic variable $Matches (a hashtable whose 0 entry contains the overall match, 1 what the 1st capture group matched, ...):

# Read file as a whole, into a single, multi-line string.
$doc = Get-Content -Raw file.txt 

if ($doc -match '(?<=hostkey=")(.*)(?=")') {
   # Output what the 1st capture group captured
   $Matches[1]
}

With your sample input, the above yields
ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
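
The filter-versus-Boolean behavior described above can be seen in isolation with a small sketch. The two-element array below is a made-up stand-in for what Get-Content without -Raw would return, not the actual file, and the variable names are only illustrative.

```powershell
# Hypothetical input: two lines, as Get-Content (without -Raw) would return them.
$lines = 'Option batch on', 'open sftp://u:p@h:22 hostkey="ssh-rsa 1024 00:00"'

# Array LHS: -match acts as a filter and returns the matching elements whole.
$filtered = $lines -match '(?<=hostkey=")(.*)(?=")'
$filtered    # the complete 'open ...' line, not just the key

# Scalar LHS: -match returns a [bool] and populates $Matches.
$text  = $lines -join "`n"
$found = $text -match '(?<=hostkey=")(.*)(?=")'
$found       # True
$Matches[1]  # ssh-rsa 1024 00:00
```

Joining the lines with -join is used here only to keep the sketch self-contained; with a real file, Get-Content -Raw gives the same kind of single string directly.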

You can then extend the approach to capture multiple tokens, in which case I suggest using named capture groups ((?<name>...)); the following example uses such named capture groups to extract several of the tokens of interest:

if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)'){
  # Output the named capture-group values.
  # Note that index notation (['username']) and property
  # notation (.username) can be used interchangeably.
  $Matches.username
  $Matches.password
  $Matches.host
}

With your sample input, the above yields:

username
password
host.name.net

You can extend the above to capture all tokens of interest.
Note that . by default doesn't match \n (newline) characters.
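
As a minimal sketch of what that means in practice: the inline option (?s) (Singleline) makes . match \n as well, and, because . does match \r, a greedy .+ on input with Windows-style CRLF line endings picks up a trailing carriage return. The strings below are made-up test inputs, and the [^\r\n]+ alternative is one possible workaround rather than part of the original approach.

```powershell
# . does not match \n unless the (?s) (Singleline) option is on.
$lf = "first`nsecond"
$dotStops   = $lf -match 'first.second'       # False
$dotCrosses = $lf -match '(?s)first.second'   # True

# With CRLF line endings, .+ stops before \n but still consumes the \r.
$crlf = "mv File*.txt /remote/archive/`r`nclose"
$null = $crlf -match 'mv File\*\.txt (?<remotepath>.+)'
$withCr = $Matches.remotepath                 # ends with a carriage return

# Excluding both \r and \n keeps the capture clean for either line-ending style.
$null = $crlf -match 'mv File\*\.txt (?<remotepath>[^\r\n]+)'
$clean = $Matches.remotepath                  # /remote/archive/
```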

Extracting that many tokens can result in a complex regex that is hard to read, in which case the x (IgnorePatternWhitespace) regex option can help (as an inline option, (?x) at the start of the regex):

if ($doc -match '(?x)
    (?<=sftp://)(?<username>[^:]+)
    :(?<password>[^@]+)
    @(?<host>[^:]+)
    :(?<port>\d+)
    \s+hostkey="(?<sshkey>.+?)"
    \n+get\ File\*\.txt\ (?<localpath>.+)
    \nmv\ File\*\.txt\ (?<remotepath>.+)
  '){
    # Output the named capture-group values.
    $Matches.GetEnumerator() | ? Key -ne 0
}

Note how the whitespace used for making the regex more readable (spreading it across multiple lines) is ignored while matching, whereas whitespace to be matched in the input must be escaped (e.g., to match a single space, use a backslash followed by a space, or [ ]; use \s to match any whitespace char).
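
The whitespace rules under (?x) can be checked with a short sketch; the input string and variable names are invented for illustration.

```powershell
$s = 'get File.txt'

# Unescaped spaces in the pattern are ignored, so this is effectively 'getFile'.
$unescaped = $s -match '(?x) get File'       # False

# Three ways to match the space that is actually in the input:
$backslash = $s -match '(?x) get \  File'    # True: backslash-escaped space
$bracket   = $s -match '(?x) get [ ] File'   # True: space inside a character class
$class     = $s -match '(?x) get \s File'    # True: any whitespace character
```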

With your sample input, the above yields the following:

Name                           Value
----                           -----
host                           host.name.net
localpath                      \local\path\Client\File.txt
port                           22
sshkey                         ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remotepath                     /remote/archive/
password                       password
username                       username

Note that the reason the capture groups are out of order is that $Matches is a hash table (of type [hashtable]), whose key enumeration order is an implementation artifact: no particular enumeration order is guaranteed.

However, random access to capture groups works just fine; e.g., $Matches.port will return 22.
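
If a predictable output order matters, one option is to copy the values into an ordered dictionary by looking each group up by name. This is only a sketch: the single-line input and the chosen order of names are assumptions for illustration, and only the connection-string part of the regex is used.

```powershell
# Hypothetical one-line input based on the sample data.
$line = 'open sftp://username:password@host.name.net:22'

if ($line -match 'sftp://(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+):(?<port>\d+)') {
    # [ordered] preserves insertion order, unlike a plain [hashtable].
    $ordered = [ordered]@{}
    foreach ($name in 'username', 'password', 'host', 'port') {
        $ordered[$name] = $Matches[$name]
    }
    $ordered   # entries are listed in the order they were added
}
```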
