Find a specific sentence in a web page using PowerShell
Question
I need to use PowerShell to resolve IP addresses via WHOIS. My company filters port 43 and WHOIS queries, so the workaround is to have PowerShell query a website such as https://who.is, read the HTTP response, and look for the organisation name matching the IP address.
So far I have managed to read the web page into PowerShell. The example here is a WHOIS lookup for a yahoo.com address: https://who.is/whois-ip/ip-address/206.190.36.45
Here is my snippet:
$url=Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
Now if I do this:
$url.gettype()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False HtmlWebResponseObject Microsoft.PowerShell.Commands.WebResponseObject
I see this object has several properties:
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
AllElements Property Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse Property System.Net.WebResponse BaseResponse {get;set;}
Content Property string Content {get;}
Forms Property Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers Property System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields Property Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml Property mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent Property string RawContent {get;}
RawContentLength Property long RawContentLength {get;}
RawContentStream Property System.IO.MemoryStream RawContentStream {get;}
Scripts Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode Property int StatusCode {get;}
StatusDescription Property string StatusDescription {get;}
But every time I try something like
$url.ToString() | select-string "OrgName"
PowerShell returns the whole HTML code because it treats the text as a single string. I found a workaround: dump the output into a file and read the file back (so every line becomes an element of an array). But I have hundreds of IPs to check, so creating a file every time is far from optimal.
I would like to know how to read the content of the web page https://who.is/whois-ip/ip-address/206.190.36.45 and get the line that says: OrgName: Yahoo! Broadcast Services, Inc.
Only that line.
Thanks very much for your help! :)
Recommended answer
There are most likely better ways to parse this, but you were on the right track with your current logic.
$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$web.tostring() -split "[`r`n]" | select-string "OrgName"
Select-String was returning the entire content as the match because, previously, it was one long string. Using -split we can break it into lines and get just the result you expected.
OrgName: Yahoo! Broadcast Services, Inc.
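To make the difference concrete without touching the network, here is a minimal sketch that runs the same comparison on a small in-memory string. The sample text only reuses values shown in this answer and is a stand-in, not real who.is output; the variable names are illustrative.

```powershell
# Stand-in for the page text (assumption: real who.is content is much longer).
$sample = "OrgName: Yahoo! Broadcast Services, Inc.`r`nOrgId: YAHO`r`nCity: Sunnyvale"

# A multi-line string piped as-is is ONE input object,
# so the "matching line" is the entire string.
$whole = $sample | Select-String 'OrgName'

# Splitting first hands Select-String one object per line.
$match = $sample -split '\r?\n' | Select-String 'OrgName'

$whole.Line -eq $sample   # the unsplit match is the whole text
$match.Line               # only the OrgName line
```

The sketch splits on '\r?\n' rather than "[`r`n]" so that a CRLF pair does not produce an extra empty element; both forms work with Select-String.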
Some string manipulation after that gives a cleaner answer. Again, there are many ways to approach this:
(($web.tostring() -split "[`r`n]" | select-string "OrgName" | Select -First 1) -split ":")[1].Trim()
I used Select -First 1 because Select-String can return more than one object. This ensures we are working with a single match when we manipulate the string. The string is then split on the colon and trimmed to remove the spaces left behind.
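One caveat with splitting on every colon: a value that itself contains a colon gets cut short. The sketch below shows this using the Ref line from the output in this answer, and a possible variation that limits the split to two parts. The variable names are illustrative only.

```powershell
# Sample line whose value contains colons (taken from the output shown in this answer).
$line = 'Ref: https://whois.arin.net/rest/org/YAHO'

# Splitting on every colon keeps only the piece between the first two colons.
$naive = ($line -split ':')[1].Trim()

# Limiting the split to 2 parts keeps everything after the first colon.
$limited = ($line -split ':', 2)[1].Trim()

$naive     # https
$limited   # https://whois.arin.net/rest/org/YAHO
```

For the OrgName line the two forms give the same result, so this only matters if you reuse the one-liner for other fields.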
Since you are pulling HTML data, we can also walk through the response object's properties to get more specific results. The idea here builds on 1RedOne's answer.
$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$data = $web.AllElements | Where{$_.TagName -eq "Pre"} | Select-Object -Expand InnerText
$whois = ($data -split "`r`n`r`n" | select -index 1) -replace ":\s","=" | ConvertFrom-StringData
$whois.OrgName
In this example, all of the data is stored in the text of the PRE tag. I split the data into its sections (sections are separated by blank lines, so I look for consecutive newlines). The second group of data contains the org name. Replacing each colon and the whitespace character after it with "=" turns every line into key=value form, which ConvertFrom-StringData converts into a hashtable. Store that in a variable and read OrgName as a property: $whois.OrgName. Here is what $whois looks like:
Name Value
---- -----
Updated 2013-04-02
City Sunnyvale
Address 701 First Ave
OrgName Yahoo! Broadcast Services, Inc.
StateProv CA
Country US
Ref http://whois.arin.net/rest/org/YAHO
PostalCode 94089
RegDate 1999-11-17
OrgId YAHO
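The conversion step can be tried in isolation. This is a minimal sketch on a three-line stand-in for one section of the PRE text; the sample reuses values from the table above and assumes a single space after each colon, which may not match the real page.

```powershell
# Stand-in for one blank-line-delimited section of the PRE text (assumed layout).
$section = "OrgName: Yahoo! Broadcast Services, Inc.`r`nOrgId: YAHO`r`nCity: Sunnyvale"

# "Key: value" becomes "Key=value", the format ConvertFrom-StringData expects.
$whois = $section -replace ':\s', '=' | ConvertFrom-StringData

$whois.OrgName   # Yahoo! Broadcast Services, Inc.
$whois.Count     # 3
```

Note that ConvertFrom-StringData raises an error when the same key appears twice, so a section with repeated field names would need the duplicates handled before conversion.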
You can also turn that hashtable into a custom object if you prefer working with those.
[pscustomobject]$whois
Updated : 2017-01-28
City : Sunnyvale
Address : 701 First Ave
OrgName : Yahoo! Broadcast Services, Inc.
StateProv : CA
Country : US
Ref : https://whois.arin.net/rest/org/YAHO
PostalCode : 94089
RegDate : 1999-11-17
OrgId : YAHO
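Since the question mentions hundreds of IPs, the pieces above can be wrapped in a small function and called in a loop, with no temporary files. This is only a sketch: Get-OrgNameFromText is a hypothetical helper name, the sample text is a stand-in, and the commented-out loop assumes the URL pattern from the question still works. It reads the Content property rather than AllElements, because AllElements and ParsedHtml are available in Windows PowerShell but not in PowerShell 6 and later.

```powershell
function Get-OrgNameFromText {
    # Hypothetical helper: returns the value of the first "OrgName:" line found in the text.
    param([Parameter(Mandatory)][string]$Text)

    $hit = $Text -split '\r?\n' |
        Select-String -Pattern 'OrgName:' |
        Select-Object -First 1

    # Split into at most 2 parts so colons inside the value are preserved.
    if ($hit) { ($hit.Line -split ':', 2)[1].Trim() }
}

# Offline check with stand-in text.
$sample = "OrgName: Yahoo! Broadcast Services, Inc.`r`nOrgId: YAHO"
$org = Get-OrgNameFromText -Text $sample
$org

# Assumed usage against the site from the question (requires network access):
# $ips = '206.190.36.45'
# $results = foreach ($ip in $ips) {
#     $page = Invoke-WebRequest "https://who.is/whois-ip/ip-address/$ip"
#     [pscustomobject]@{ IP = $ip; OrgName = Get-OrgNameFromText -Text $page.Content }
# }
```

Returning one custom object per IP keeps the results easy to filter or export, for example with Export-Csv.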