Find a specific sentence in a web page using PowerShell

Question

I need to use PowerShell to resolve IP addresses via WHOIS. My company filters port 43 and WHOIS queries, so the workaround I have to use is to have PowerShell query a website such as https://who.is, read the HTTP response, and look for the organisation name matching the IP address.

So far I have managed to read the web page into PowerShell. The example here is a WHOIS lookup on a yahoo.com address: https://who.is/whois-ip/ip-address/206.190.36.45

Here is my snippet:

$url=Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45

Now if I do:

$url.gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlWebResponseObject                    Microsoft.PowerShell.Commands.WebResponseObject

I see this object has several properties:

Name              MemberType Definition
----              ---------- ----------
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}

But every time I try something like

$url.ToString() | select-string "OrgName"

PowerShell returns the whole HTML code, because it treats the text as a single string. I found a workaround: dump the output into a file and read the file back, so that every line is an element of an array. But I have hundreds of IPs to check, so creating a file every time is far from optimal.

I would like to know how to read the content of the web page https://who.is/whois-ip/ip-address/206.190.36.45 and get the line that says: OrgName: Yahoo! Broadcast Services, Inc.

Only that line.

Thanks very much for your help! :)

Answer

There are most likely better ways to parse this, but you were on the right track with your current logic.

$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$web.tostring() -split "[`r`n]" | select-string "OrgName"

Select-String was returning the entire text as the match because its input was one long string. Using -split, we can break the text into lines so that only the matching line comes back, which is the result you expected.

OrgName:        Yahoo! Broadcast Services, Inc.
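
To see why the split matters without hitting the network, here is a minimal sketch that uses a made-up, abbreviated string in place of the real page body. The `$html` value is an assumption for illustration only, not the actual response from who.is. It also splits on `'\r?\n'` rather than on a character class, which avoids the empty strings produced between each CR and LF.

```powershell
# Hypothetical sample standing in for the page text (not the real response).
$html = "<pre>NetRange:       206.190.32.0 - 206.190.63.255`r`nOrgName:        Yahoo! Broadcast Services, Inc.`r`nOrgId:          YAHO</pre>"

# One string in the pipeline: the matched "line" is the whole input.
$whole = $html | Select-String 'OrgName'
$whole.Line.Length -eq $html.Length   # True

# Split on CRLF or LF first, so each line is tested on its own.
$lines = $html -split '\r?\n'
$hit = $lines | Select-String 'OrgName'
$hit.Line   # OrgName:        Yahoo! Broadcast Services, Inc.
```

Select-String returns MatchInfo objects rather than plain strings, which is why the sketch reads the `Line` property to get the text back.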

Some string manipulation after that gives a cleaner answer. Again, there are many ways to approach this.

(($web.tostring() -split "[`r`n]" | select-string "OrgName" | Select -First 1) -split ":")[1].Trim()

I used Select -First 1 because Select-String can return more than one object. It ensures we are working with a single match when we manipulate the string. The string is then split on the colon, and the result is trimmed to remove the leftover spaces.
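
One caveat with splitting on every colon is that a value which itself contains a colon, such as a URL, gets cut short. The sketch below shows two hedged alternatives on made-up sample lines (the `$lines` array is an assumption that mirrors the layout of the output above): limiting the split to two parts, and using a regex capture.

```powershell
# Hypothetical sample lines in the same "Key: value" layout as the WHOIS text.
$lines = @(
    'OrgName:        Yahoo! Broadcast Services, Inc.'
    'Ref:            http://whois.arin.net/rest/org/YAHO'
)

# Splitting on every colon truncates a value that contains one.
($lines[1] -split ':')[1].Trim()      # http

# Limiting the split to two parts keeps the value intact.
($lines[1] -split ':', 2)[1].Trim()   # http://whois.arin.net/rest/org/YAHO

# A regex capture does the filtering and the extraction in one step.
$orgName = $lines |
    ForEach-Object { if ($_ -match '^OrgName:\s*(.+)$') { $Matches[1].Trim() } } |
    Select-Object -First 1
$orgName                              # Yahoo! Broadcast Services, Inc.
```

For the OrgName line itself the original one-liner is fine. The limited split only matters once you start pulling other fields out the same way.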

Since you are pulling HTML data, we could also walk through the parsed properties to get more specific results. The intention of this was to get 1RedOne's answer.

$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$data = $web.AllElements | Where{$_.TagName -eq "Pre"} | Select-Object -Expand InnerText
$whois = ($data -split "`r`n`r`n" | select -index 1) -replace ":\s","=" | ConvertFrom-StringData
$whois.OrgName

In this example, all of the data is stored in the text of the PRE tag. I split the data into its sections, which are separated by blank lines, by looking for consecutive newlines. The second section contains the organisation name. I store that section in a variable as a hashtable and read OrgName from it as a property: $whois.OrgName. Here is what $whois looks like:

Name                           Value                                                                                                                         
----                           -----                                                                                                                         
Updated                        2013-04-02                                                                                                                    
City                           Sunnyvale                                                                                                                     
Address                        701 First Ave                                                                                                                 
OrgName                        Yahoo! Broadcast Services, Inc.                                                                                               
StateProv                      CA                                                                                                                            
Country                        US                                                                                                                            
Ref                            http://whois.arin.net/rest/org/YAHO                                                                                           
PostalCode                     94089                                                                                                                         
RegDate                        1999-11-17                                                                                                                    
OrgId                          YAHO
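
As far as I know, the AllElements and ParsedHtml properties are only populated in Windows PowerShell, where Invoke-WebRequest relies on the Internet Explorer engine. In PowerShell 6 and later the response object does not expose them. A hedged alternative is to pull the PRE block out of the Content string with a regular expression. The sketch below runs on a made-up, abbreviated `$content` string (an assumption, not the real page), and it selects the section by looking for the OrgName key instead of relying on it being the second section.

```powershell
# Hypothetical, abbreviated page body standing in for $web.Content.
$content = @"
<html><body><pre>
NetRange:       206.190.32.0 - 206.190.63.255
NetName:        NETBLK1-YAHOOBS

OrgName:        Yahoo! Broadcast Services, Inc.
OrgId:          YAHO
City:           Sunnyvale
</pre></body></html>
"@

# Grab the text between <pre> and </pre>; (?s) lets . match newlines.
if ($content -notmatch '(?s)<pre[^>]*>(.*?)</pre>') { throw 'No <pre> block found.' }
$pre = $Matches[1].Trim()

# Sections are separated by blank lines; keep the one that has an OrgName line.
$section = $pre -split '(?:\r?\n){2,}' |
    Where-Object { $_ -match '(?m)^OrgName:' } |
    Select-Object -First 1

# Rewrite "Key:   value" as "Key=value" on each line, then build a hashtable.
$whois = ($section -replace '(?m)^([^:\r\n]+):[ \t]*', '$1=') | ConvertFrom-StringData
$whois.OrgName   # Yahoo! Broadcast Services, Inc.
```

Parsing HTML with a regular expression is fragile in general. It is tolerable here only because the target is a single preformatted block, and it would need revisiting if the page layout changes.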

You can also turn that hashtable into a custom object if you prefer working with those.

[pscustomobject]$whois

Updated    : 2017-01-28
City       : Sunnyvale
Address    : 701 First Ave
OrgName    : Yahoo! Broadcast Services, Inc.
StateProv  : CA
Country    : US
Ref        : https://whois.arin.net/rest/org/YAHO
PostalCode : 94089
RegDate    : 1999-11-17
OrgId      : YAHO
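
Since the question mentions hundreds of IPs, the same parsing can be wrapped in a function and run in a loop without writing any files. This is a minimal sketch under stated assumptions: the function names Get-OrgNameFromText and Get-WhoisOrgName are made up for illustration, and the URL pattern is the one from the question, which may not match what the site serves today.

```powershell
# Hypothetical helper: returns the first OrgName value found in a block of text.
function Get-OrgNameFromText {
    param([Parameter(Mandatory)][string]$Text)
    foreach ($line in $Text -split '\r?\n') {
        if ($line -match '^\s*OrgName:\s*(.+)$') { return $Matches[1].Trim() }
    }
}

# Hypothetical wrapper: one web request per IP, one object per result.
function Get-WhoisOrgName {
    param([Parameter(Mandatory)][string[]]$IPAddress)
    foreach ($ip in $IPAddress) {
        # URL pattern taken from the question; assumed to still be valid.
        $web = Invoke-WebRequest -Uri "https://who.is/whois-ip/ip-address/$ip" -UseBasicParsing
        [pscustomobject]@{
            IPAddress = $ip
            OrgName   = Get-OrgNameFromText -Text $web.Content
        }
    }
}

# Offline check of the parsing step with made-up text.
Get-OrgNameFromText -Text "NetName: X`r`nOrgName:        Yahoo! Broadcast Services, Inc."

# Usage (performs network requests):
# Get-WhoisOrgName -IPAddress '206.190.36.45' | Format-Table
```

Keeping the parsing separate from the download makes it possible to test the text handling without network access. For a large batch it would be polite to add a delay between requests.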
