Replace tags using Nokogiri - quicker way?

Question

I have the following HTML in a variable named html_data, where I wish to replace <img> tags with <a> tags so that the src attribute of each "img" tag becomes the href of the corresponding "a" tag.

Existing HTML:

<!DOCTYPE html>
<html>
   <head>
      <title>Learning Nokogiri</title>
   </head>
   <body marginwidth="6">
      <div valign="top">
         <div class="some_class">
            <div class="test">
               <img src="apple.png" alt="Apple" height="42" width="42">
               <div style="white-space: pre-wrap;"></div>
            </div>
         </div>
      </div>
   </body>
</html>

This is my solution A:

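require 'nokogiri'

nokogiri_html = Nokogiri::HTML(html_data)
nokogiri_html.css("img").each do |tag|
  # Build a new <a> carrying the image's src as its href
  a_tag = Nokogiri::XML::Node.new("a", nokogiri_html)
  a_tag["href"] = tag["src"]
  # Insert it right after the <img>, then drop the <img>
  tag.add_next_sibling(a_tag)
  tag.remove
end

puts 'nokogiri_html is', nokogiri_html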

This is my solution B:

nokogiri_html = Nokogiri::HTML(html_data)
nokogiri_html.css("img").each do |tag|
  # Rename the element in place and add an href copied from src
  tag.name = "a"
  tag.set_attribute("href", tag["src"])
end

puts 'nokogiri_html is', nokogiri_html

While solution A works fine, I am looking for a quicker, more direct way to replace the tags using Nokogiri. With solution B, my "img" tag does get replaced with an "a" tag, but the attributes of the "img" tag still remain on the "a" tag. Below is the result of solution B:

<!DOCTYPE html>
<html>
   <body>
      <p>["\n", "\n", "   </p>
      \n", "      
      <title>Learning Nokogiri</title>
      \n", "   \n", "   \n", "      
      <div valign='\"top\"'>
         \n", "         
         <div class='\"some_class\"'>
            \n", "            
            <div class='\"test\"'>
               \n", "               <a src="%5C%22apple.png%5C%22" alt='\"Apple\"' height='\"42\"' width='\"42\"' href="%5C%22apple.png%5C%22"></a>\n", "               
               <div style='\"white-space:' pre-wrap></div>
               \n", "            
            </div>
            \n", "         
         </div>
         \n", "      
      </div>
      \n", "   \n", ""]
   </body>
</html>

Is there a faster way to replace tags in HTML using Nokogiri? Also, how can I remove the "\n"s I am getting in the result?

Answer

First, please strip your sample data (HTML) to the barest amount necessary to demonstrate the problem.

Here are the basics of doing what you want:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<!DOCTYPE html>
<html>
   <body>
     <img src="apple.png" alt="Apple" height="42" width="42">
   </body>
</html>
EOT

doc.search('img').each do |img|
  src, alt = %w[src alt].map{ |p| img[p] }
  img.replace("<a href='#{ src }'>#{ alt }</a>")
end

doc.to_html
# => "<!DOCTYPE html>\n<html>\n   <body>\n     <a href=\"apple.png\">Apple</a>\n   </body>\n</html>\n"

puts doc.to_html
# >> <!DOCTYPE html>
# >> <html>
# >>    <body>
# >>      <a href="apple.png">Apple</a>
# >>    </body>
# >> </html>

Doing it this way allows Nokogiri to replace nodes cleanly.

It's not necessary to do all this rigamarole:

a_tag = Nokogiri::XML::Node.new("a", nokogiri_html)
a_tag["href"] = tag["src"]
tag.add_next_sibling(a_tag)
tag.remove()

Instead, create a string that is the tag you want to use and let Nokogiri convert the string to a node and replace the old node:

src, alt = %w[src alt].map{ |p| img[p] }
img.replace("<a href='#{ src }'>#{ alt }</a>")

It's not necessary to strip extraneous whitespace between nodes. It can affect the look of the HTML but browsers will gobble that extra whitespace and not display it.

Nokogiri can be told to not output the inter-node whitespace, resulting in a compressed/fugly output, but how to do that is a separate question.
