Parse an Atom RSS feed and remove HTML tags
Question
I am developing this code using PowerShell. I need to be able to strip the HTML tags out of the feed's description text.
Invoke-WebRequest -Uri 'https://psu.box.com/shared/static/jf36ohodxnw7oemghsau1t7qb0w4y708.rss' -OutFile C:\users\anr2809\Documents\alerts.txt
[xml]$Content = Get-Content C:\users\anr2809\Documents\alerts.txt -Raw
$Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'
If ($Content -match $Regex) {
    "Description is '$($Matches['Description'])'"
    # do something here with $Matches['Description']
}
Else {
    "No match."
}
$Feed = $Content.rss.channel
ForEach ($msg in $Feed.Item) {
    $ParseData = $msg.description
    ForEach ($Datum in $ParseData) {
        If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 } #EndIf
        If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 } #EndIf
    } #EndForEach
    [PSCustomObject]@{
        'LastUpdated' = [datetime]$msg.pubDate
        'Title'       = $msg.title
        'Category'    = $msg.category
        'Author'      = $msg.author
        'Link'        = $msg.link
        'UpVotes'     = $Upvote
        'DownVotes'   = $Downvote
        'Validations' = $Validation
        'WorkArounds' = $Workaround
        'Comments'    = $msg.description.InnerText
        'FeedbackID'  = $FeedBackID
    } #EndPSCustomObject
}
This is the result. I would like to remove the HTML tags from the Comments field.
LastUpdated : 3/30/2020 9:45:52 AM
Title : Enterprise Network Planned Outage
Category :
Author :
Link : link
UpVotes :
DownVotes :
Validations :
WorkArounds :
Comments :
<p><strong>People and Locations Impacted:</strong><br />All students, faculty, and staff at all State locations<br /><br />
FeedbackID :
Recommended answer
You should be able to use the following script. It makes use of the HTMLFile COM object.
Invoke-WebRequest -Uri 'https://*.rss' -OutFile C:\*.rss
[xml]$Content = Get-Content C:\*.rss -Raw
$Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'
If ($Content -match $Regex) {
    "Description is '$($Matches['Description'])'"
    # do something here with $Matches['Description']
}
Else {
    "No match."
}
$Feed = $Content.rss.channel
ForEach ($msg in $Feed.Item) {
    $ParseData = $msg.description
    ForEach ($Datum in $ParseData) {
        If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 } #EndIf
        If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 } #EndIf
    } #EndForEach
    # Load the description's HTML into an in-memory document so it can be queried
    $HTML = New-Object -ComObject "HTMLFile"
    $HTML.IHTMLDocument2_write($ParseData.InnerText)
    [PSCustomObject]@{
        'LastUpdated' = [datetime]$msg.pubDate
        'Title'       = $msg.title
        'Category'    = $msg.category
        'Author'      = $msg.author
        'Link'        = $msg.link
        'UpVotes'     = $Upvote
        'DownVotes'   = $Downvote
        'Validations' = $Validation
        'WorkArounds' = $Workaround
        # InnerText of each <p> element is the text with the markup removed
        'Comments'    = $HTML.all.tags("p") | ForEach-Object InnerText
        'FeedbackID'  = $FeedBackID
    } #EndPSCustomObject
}
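The HTMLFile COM object is only available on Windows. If it cannot be used, a rough alternative is to strip the tags with regular expressions. The following is only a sketch and not part of the original answer: the function name Remove-HtmlTag and the $sample variable are hypothetical, and the sample string is the description fragment shown in the question's output.

```powershell
# Hypothetical helper (not from the original answer): strips HTML tags with regexes.
function Remove-HtmlTag {
    param([string]$Html)

    # Turn <br> variants and closing </p> tags into newlines so lines stay separated.
    $text = $Html -replace '(?i)<br\s*/?>', "`n" -replace '(?i)</p\s*>', "`n"

    # Remove every remaining tag.
    $text = $text -replace '<[^>]+>', ''

    # Decode entities such as &amp; into their literal characters.
    $text = [System.Net.WebUtility]::HtmlDecode($text)

    # Trim leading and trailing whitespace, including stray newlines.
    $text.Trim()
}

# Description fragment taken from the question's output.
$sample = '<p><strong>People and Locations Impacted:</strong><br />All students, faculty, and staff at all State locations<br /><br />'
Remove-HtmlTag -Html $sample
```

In the script above, this could replace the COM calls by setting 'Comments' = Remove-HtmlTag -Html $msg.description.InnerText. A regex is not a real HTML parser, so unusual markup (for example a ">" inside an attribute value) may not be handled correctly.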