Strip style attributes with Nokogiri

Problem description

I'm scraping an HTML page with Nokogiri and I want to strip out all style attributes.
How can I achieve this? (I'm not using Rails, so I can't use its sanitize method, and I don't want to use the sanitize gem because I want to remove attributes by blacklist, not by whitelist.)

require 'nokogiri'
require 'open-uri'

html = URI.open(url) # url is the address of the page being scraped
doc = Nokogiri::HTML(html.read)
doc.css('.post').each do |post|
  puts post.to_s
end

=> <p><span style="font-size: x-large">bla bla <a href="http://torrentfreak.com/netflix-is-killing-bittorrent-in-the-us-110427/">statistica</a> blabla</span></p>

I want it to be

=> <p><span>bla bla <a href="http://torrentfreak.com/netflix-is-killing-bittorrent-in-the-us-110427/">statistica</a> blabla</span></p>

Solution

require 'nokogiri'

html = '<p class="post"><span style="font-size: x-large">bla bla</span></p>'
doc = Nokogiri::HTML(html)
doc.xpath('//@style').remove
puts doc.css('.post')
#=> <p class="post"><span>bla bla</span></p>

Edited to show that you can just call NodeSet#remove instead of having to use .each(&:remove).

Note that if you have a DocumentFragment instead of a Document, Nokogiri has a longstanding bug where searching from a fragment does not work as you would expect. The workaround is to use:

doc.xpath('@style|.//@style').remove
