Using Regex in PowerShell to grab email


Problem description

I have written a script to grab different fields in an HTML file and populate variables with the results. I'm having trouble with the regular expression for grabbing the email address. Here is some sample code:

$txt='<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'

$re='.*?'+'([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})'

if ($txt -match $re)
{
    $email1=$matches[1]
    write-host "$email1"
}

I get the following error:

Bad argument to operator '-match': parsing ".*?([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\
.)+[a-zA-Z]{2,7})([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})" - [x-y] range in reverse order..
At line:7 char:16
+ if ($txt -match <<<<  $re)
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : BadOperatorArgument

What am I missing here? Also, is there a better regex for email?

Thanks in advance.

Solution

Any regex that works in .NET or C# will also work in PowerShell, and you can find plenty of samples on Stack Overflow and elsewhere on the internet. For example, from "How to Find or Validate an Email Address" (section "The Official Standard: RFC 2822"):

$txt='<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'
# Single-quoted so PowerShell leaves "$" and "`" untouched; '' is a literal apostrophe.
$re='[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'
[regex]::Match($txt, $re, 'IgnoreCase')
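As for the error in the question itself, a likely cause is the doubled backslashes. PowerShell strings do not treat `\` as an escape character, so `\\w` reaches the regex engine as an escaped literal backslash followed by a plain `w`. Inside `[\\w-+]` that leaves `w-+`, which .NET reads as a range running from `w` down to `+`, hence "[x-y] range in reverse order". A minimal sketch of a corrected version of the original pattern, using single backslashes and moving the hyphen to the end of each character class (the leading `.*?` is dropped here because `-match` already searches anywhere in the string):

```powershell
$txt = '<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'

# Single backslashes: "\" is not an escape character in PowerShell strings.
# The hyphen sits last in each class so it is a literal, not a range operator.
$re = '([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})'

if ($txt -match $re) {
    # $matches[1] holds the first capture group, i.e. the whole address
    $email1 = $matches[1]
    Write-Host $email1
}
```

This keeps the structure of the question's pattern; it is still a simplified matcher, not a full RFC 2822 one.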

There is a second part to this answer, though. Regex is by nature not well suited to parsing XML/HTML. For more detail, see: Using regular expressions to parse HTML: why not?

For a more robust solution, I recommend the following steps:

  1. convert HTML → XHTML
  2. walk over XML tree
  3. work with individual nodes one by one, even using regex.
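The steps above can be sketched roughly as follows. This assumes step 1 has already been done by some external tool, so the input is well-formed XHTML (note the quoted `class` attribute, which the original sample lacks). The XPath query and the simplified email pattern are illustrative choices, not part of the original answer.

```powershell
# Assumption: the HTML was already converted to well-formed XHTML
# (attribute values quoted, all tags closed). The conversion is not shown.
$xhtml = '<p class="FillText"><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'

# Simplified email pattern for illustration; not a full RFC 2822 matcher.
$re = '[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,7}'

# Step 2: load the markup as XML and walk the matching nodes.
$doc = [xml]$xhtml

$emails = foreach ($node in $doc.SelectNodes('//p[@class="FillText"]')) {
    # Step 3: apply the regex to one node's text at a time.
    if ($node.InnerText -match $re) { $matches[0] }
}

$emails
```

Running the regex against each node's `InnerText` rather than the raw markup means tag names and attribute values can never be mistaken for part of an address.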
