Getting links from a webpage in PowerShell using a regular expression

Question

Below is my PowerShell code to fetch the links in a webpage. Intermittently, I get a "Cannot index into a null array" exception. Is there anything wrong with this code? Help required.

$Download = $wc.DownloadString($Link) 
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }

Recommended answer

You don't need to parse anything yourself (and, as was pointed out in the comments, a regex can't reliably parse HTML in the first place). Use Invoke-WebRequest to fetch the page; one of the properties of the object it returns is a collection of all the links on the page, already parsed out for you.

Example:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty links;

Or, if you need just the URLs, you can do it a bit more concisely:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
(Invoke-WebRequest -Uri $Link).links.href;
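
As for the intermittent error itself, a likely cause (an inference from how -match behaves, not something confirmed by the asker) is that $Matches is only populated when -match succeeds. The first fragment produced by -split is the text before the first anchor, which never starts with href=, so in a fresh session $Matches is still $null when it gets indexed. If $Matches already holds a value from earlier in the session, the error disappears and a stale value is emitted instead. A minimal sketch of a guarded version follows; the inline HTML and the example.com URLs are made up for illustration and stand in for the downloaded page.

```powershell
# Hypothetical inline HTML standing in for $wc.DownloadString($Link).
$Download = @'
<html><body>
<a href="https://example.com/one">One</a>
<a class="nav" href="/two">Two</a>
<a href='https://example.com/three'>Three</a>
</body></html>
'@

# Same split and pattern as in the question, but $Matches is read only
# when -match actually succeeded for the current fragment.
$List = $Download -split '<a\s+' | ForEach-Object {
    if ($_ -match '^href=[''"]([^''">\s]*)') {
        $Matches[1]
    }
}

# The anchor whose first attribute is class= is skipped, because the
# pattern only matches fragments that begin with href=.
$List
```

The skipped anchor in this sketch is one reason the Links collection from Invoke-WebRequest is the safer route: it does not depend on attribute order or quoting style.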
