How do I remove HTML encoded characters from a string?


Question

I have a string which contains some HTML encoded characters and I want to remove them:

"&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

Solution

There are many ways to do this, and it helps to look at why you want to do it. Usually, when I want to remove encoded HTML, what I actually want is to recover the content of the HTML. Ruby has some modules that make that easy.

require 'cgi'
require 'nokogiri'

html = "&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

puts CGI.unescapeHTML(html)

which outputs:

<div>Hi All,</div><div class="paragraph_break">< /></div><div>Starting today we are initiating PoLS.</div><div class="paragraph_break"><br /></div><div>Please use the following communication protocols:<br /></div><div>1. Task Breakup and allocation - Gravity<br /></div><div>2. All mail communications - BC messages<br /></div><div>3. Reports on PoC / Spikes: Writeboard<br /></div><div>4. Non story related tasks: BC To-Do<br /></div><div>5. All UI and HTML will communicated to you through BC.<br /></div><div>6. For File sharing, we'll be using Dropbox.<br /></div><div>7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.</div><div class="paragraph_break"><br /></div><div>You'll have been given necessary accesses to all these portals. Please start using them judiciously.</div><div class="paragraph_break"><br /></div><div>All the best!</div><div class="paragraph_break"><br /></div><div>Thanks,<br /></div><div>Saurav<br /></div>

If I want to take it a step further and remove the tags, retrieving all the text:

puts Nokogiri::HTML(CGI.unescapeHTML(html)).content

which will output:

Hi All,Starting today we are initiating PoLS.Please use the following communication protocols:1. Task Breakup and allocation - Gravity2. All mail communications - BC messages3. Reports on PoC / Spikes: Writeboard4. Non story related tasks: BC To-Do5. All UI and HTML will communicated to you through BC.6. For File sharing, we'll be using Dropbox.7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.You'll have been given necessary accesses to all these portals. Please start using them judiciously.All the best!Thanks,Saurav

That is usually where I want to end up when I see that sort of string.

Ruby's CGI makes encoding and decoding HTML easy. The Nokogiri gem makes it easy to remove the tags.
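
As a small illustration of the encoding and decoding round trip, here is a sketch using only the standard library. The sample string and variable names are made up for this example:

```ruby
require 'cgi'

# Illustrative input containing characters that HTML encoding must escape.
raw = '<div class="note">Tom & Jerry</div>'

# escapeHTML turns &, <, > and quotes into entities.
escaped = CGI.escapeHTML(raw)
puts escaped

# unescapeHTML reverses that, giving back the original markup.
restored = CGI.unescapeHTML(escaped)
puts restored == raw
```

unescapeHTML decodes a single level only, so a string that was encoded twice needs to be decoded twice.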
