Parse ATOM RSS feed and remove HTML tags

Question

I am developing this code using PowerShell. I need to be able to strip the HTML tags out of the feed descriptions.

  Invoke-WebRequest -Uri 'https://psu.box.com/shared/static/jf36ohodxnw7oemghsau1t7qb0w4y708.rss' -OutFile C:\users\anr2809\Documents\alerts.txt
  [xml]$Content = Get-Content C:\users\anr2809\Documents\alerts.txt -Raw
  $Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'

  If ($Content -match $Regex) {
      "Description is '$($Matches['Description'])'"
      # do something here with $Matches['Description']
  }
  Else {
      "No match."
  }

  $Feed = $Content.rss.channel
  ForEach ($msg in $Feed.Item) {
      $ParseData = $msg.description
      ForEach ($Datum in $ParseData) {
          If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 }
          If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 }
      }
      [PSCustomObject]@{
          'LastUpdated' = [datetime]$msg.pubDate
          'Title'       = $msg.title
          'Category'    = $msg.category
          'Author'      = $msg.author
          'Link'        = $msg.link
          'UpVotes'     = $Upvote
          'DownVotes'   = $Downvote
          'Validations' = $Validation
          'WorkArounds' = $Workaround
          'Comments'    = $msg.description.InnerText
          'FeedbackID'  = $FeedBackID
      }
  }

This is the result. I would like to remove the HTML tags from the Comments field.

LastUpdated : 3/30/2020 9:45:52 AM
Title       : Enterprise Network Planned Outage
Category    : 
Author      : 
Link        : link
UpVotes     : 
DownVotes   : 
Validations : 
WorkArounds : 
Comments    : 
                    <p><strong>People and Locations Impacted:</strong><br />All    students, faculty, and staff at all State locations<br /><br />
FeedbackID  : 

Recommended answer

You should be able to use the following script. It uses the HTMLFile COM object to parse each description as HTML and then reads back only the text of its <p> elements.

  Invoke-WebRequest -Uri 'https://*.rss' -OutFile C:\*.rss
  [xml]$Content = Get-Content C:\*.rss -Raw
  $Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'

  If ($Content -match $Regex) {
      "Description is '$($Matches['Description'])'"
      # do something here with $Matches['Description']
  }
  Else {
      "No match."
  }

  $Feed = $Content.rss.channel
  ForEach ($msg in $Feed.Item) {
      $ParseData = $msg.description
      ForEach ($Datum in $ParseData) {
          If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 }
          If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 }
      }

      # Load the description's HTML into an in-memory document (Windows only).
      $HTML = New-Object -ComObject "HTMLFile"
      $HTML.IHTMLDocument2_write($ParseData.InnerText)

      [PSCustomObject]@{
          'LastUpdated' = [datetime]$msg.pubDate
          'Title'       = $msg.title
          'Category'    = $msg.category
          'Author'      = $msg.author
          'Link'        = $msg.link
          'UpVotes'     = $Upvote
          'DownVotes'   = $Downvote
          'Validations' = $Validation
          'WorkArounds' = $Workaround
          # Text of every <p> element, with the markup removed.
          'Comments'    = $HTML.all.tags("p") | ForEach-Object InnerText
          'FeedbackID'  = $FeedBackID
      }
  }
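
The HTMLFile COM object exists only on Windows. Where it is unavailable (for example, PowerShell on Linux or macOS), a rough alternative is to strip the tags with a regular expression and decode the entities afterwards. The sketch below is not the answer's method: `Remove-HtmlTag` is a hypothetical helper name, and regex stripping is an approximation that should cope with simple feed markup like the sample above but is not a real HTML parser.

```powershell
# Hypothetical helper (not part of the original answer).
# Assumption: the input is simple markup (<p>, <br />, <strong>, ...),
# with no <script>/<style> blocks or '>' characters inside attribute values.
function Remove-HtmlTag {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Html
    )

    # Turn <br> and closing </p> tags into line breaks so paragraphs stay separated.
    $text = $Html -replace '(?i)<br\s*/?>|</p\s*>', "`n"

    # Drop every remaining tag.
    $text = $text -replace '<[^>]+>', ''

    # Decode entities such as &amp; into their characters.
    $text = [System.Net.WebUtility]::HtmlDecode($text)

    # Collapse runs of spaces and tabs, then trim leading/trailing whitespace.
    ($text -replace '[ \t]+', ' ').Trim()
}

# Sample taken from the Comments field shown in the question.
$sample = '<p><strong>People and Locations Impacted:</strong><br />All    students, faculty, and staff at all State locations<br /><br />'
Remove-HtmlTag -Html $sample
```

In the script above, this could be used as `'Comments' = Remove-HtmlTag -Html $msg.description.InnerText`, assuming the description is delivered as an element with inner text, as in the question.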
