Remove whitespace from an HTML document using Ruby

Problem description

I have a string in Ruby that looks something like this:

str = "<html>\n<head>\n\n  <title>My Page</title>\n\n\n</head>\n\n<body>" +
      "  <h1>My Page</h1>\n\n<div id=\"pageContent\">\n  <p>Here is a para" +
      "graph. It can contain  spaces that should not be removed.\n\nBut\n" +
      "line breaks that should be removed.</p></body></html>"

How would I remove all whitespace (spaces, tabs, and line breaks) that sits between tags, rather than inside a tag that has content such as <p>, using only native Ruby?

(I'd like to avoid using XSLT or something for a task this simple.)

Solution

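# Use gsub rather than chained gsub!, which returns nil when nothing is replaced.
str = str.gsub(/[\n\t]/, " ").gsub(/>\s*</, "><")

The first gsub replaces every line break and tab with a space (the character class [\n\t] matches either one, whereas /\n\t/ would only match a newline immediately followed by a tab). The second gsub removes the whitespace between tags.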
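As a minimal sketch of those two steps, the helper below applies them to the string from the question. The method name `strip_inter_tag_whitespace` is hypothetical and not part of the original answer. The sketch also assumes that the intent is to match any newline or tab, so it uses the character class `[\n\t]` and the non-destructive `gsub`.

```ruby
# Hypothetical helper (name chosen for illustration only).
def strip_inter_tag_whitespace(html)
  # Step 1: turn every newline or tab into a single space.
  flattened = html.gsub(/[\n\t]/, " ")
  # Step 2: drop whitespace-only gaps between a closing ">" and the next "<".
  flattened.gsub(/>\s*</, "><")
end

str = "<html>\n<head>\n\n  <title>My Page</title>\n\n\n</head>\n\n<body>" +
      "  <h1>My Page</h1>\n\n<div id=\"pageContent\">\n  <p>Here is a para" +
      "graph. It can contain  spaces that should not be removed.\n\nBut\n" +
      "line breaks that should be removed.</p></body></html>"

compact = strip_inter_tag_whitespace(str)
puts compact
```

Because the helper returns a new string instead of mutating `str`, it never hits the `nil` that `gsub!` returns when a pattern finds nothing to replace.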

You will end up with runs of multiple spaces inside your tags, but if you simply removed every \n and \t instead, you would get something like "not be removed.Butline breaks", which is not very readable. Another regular expression, or String#squeeze(" "), could take care of the extra spaces.
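To illustrate that trade-off, the sketch below compares three variants on a shortened sample string (the sample and the variable names are illustrative, not from the original post). Note that `squeeze(" ")` also collapses the intentional double space inside the paragraph, so whether it is acceptable depends on the input.

```ruby
text = "<p>It can contain  spaces.\n\nBut\nline breaks.</p>\n\n<p>Next</p>"

# Removing newlines and tabs outright glues neighbouring words together.
glued = text.gsub(/[\n\t]/, "").gsub(/>\s*</, "><")

# Replacing them with spaces keeps words apart but leaves runs of spaces.
spaced = text.gsub(/[\n\t]/, " ").gsub(/>\s*</, "><")

# squeeze(" ") collapses each run of consecutive spaces into one space.
squeezed = spaced.squeeze(" ")

puts glued     # <p>It can contain  spaces.Butline breaks.</p><p>Next</p>
puts spaced    # <p>It can contain  spaces.  But line breaks.</p><p>Next</p>
puts squeezed  # <p>It can contain spaces. But line breaks.</p><p>Next</p>
```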
