How do I remove white space between HTML nodes?
Problem description
I'm trying to remove the whitespace between the <p> tags of an HTML fragment:
<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>
As you can see, there is always a blank space between a closing </p> tag and the next <p> tag.
The problem is that these blank spaces create <br> tags when the string is saved to my database.
Methods like strip or gsub only remove the whitespace inside the nodes, resulting in:
<p>FooBar</p> <p>barbarbar</p> <p>bla</p>
whereas I'd like to have:
<p>Foo Bar</p><p>bar bar bar</p><p>bla</p>
I'm using:
- Nokogiri 1.5.6
- Ruby 1.9.3
- Rails
UPDATE:
Occasionally, child nodes of the <p> tags cause the same problem: white space between them.
Sample code
Note: the code is normally on one line; I reformatted it because it would be unbearable otherwise...
<p>
  <p>
    <strong>Selling an Appartment</strong>
  </p>
  <ul>
    <li>
      <p>beautiful apartment!</p>
    </li>
    <li>
      <p>near the train station</p>
    </li>
    .
    .
    .
  </ul>
  <ul>
    <li>
      <p>10 minutes away from a shopping mall </p>
    </li>
    <li>
      <p>nice view</p>
    </li>
  </ul>
  .
  .
  .
</p>
How would I strip those white spaces as well?
SOLUTION
It turns out that I had misused the gsub method and hadn't investigated the possibility of using gsub with a regex.
The simple solution was adding
data = data.gsub(/>\s+</, "><")
It deleted whitespace between all different kinds of nodes... Regex ftw!
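For illustration, here is a minimal sketch of that gsub call applied to markup like the samples above. The helper name squish_between_tags and the shortened nested sample are made up for this example. The last case shows a caveat: the regex does not distinguish block tags from inline ones, so a space between two inline elements disappears as well.

```ruby
# Hypothetical helper wrapping the one-line regex from the post:
# drop any run of whitespace that sits between a ">" and the next "<".
def squish_between_tags(html)
  html.gsub(/>\s+</, "><")
end

flat   = "<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>"
nested = "<ul>\n  <li>\n    <p>nice view</p>\n  </li>\n</ul>"
inline = "<p>Hello <b>big</b> <i>world</i></p>"

puts squish_between_tags(flat)
# => <p>Foo Bar</p><p>bar bar bar</p><p>bla</p>
puts squish_between_tags(nested)
# => <ul><li><p>nice view</p></li></ul>

# Caveat: the space between </b> and <i> is removed too,
# so this would render as "Hello bigworld".
puts squish_between_tags(inline)
# => <p>Hello <b>big</b><i>world</i></p>
```

If the stored HTML can contain inline markup, a DOM-based approach is probably the safer choice.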
This is how I'd write the code:
require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>
EOT

# Remove the whitespace-only node that follows each <p>, <ul>, or <li>.
doc.search('p, ul, li').each do |node|
  next_node = node.next_sibling
  next_node.remove if next_node && next_node.text.strip == ''
end

puts doc.to_html
It results in:
<p>Foo Bar</p><p>bar bar bar</p><p>bla</p>
Breaking it down:
doc.search('p, ul, li')
looks for only the <p>, <ul>, and <li> nodes in the document. Nokogiri returns a NodeSet from search, or an empty NodeSet if nothing matched. The code loops over the NodeSet, looking at each node in turn.
next_node = node.next_sibling
gets the pointer to the node following the current node, or nil if there is none.
next_node.remove if next_node && next_node.text.strip == ''
removes next_node from the DOM if it isn't nil and its text is empty when stripped, in other words, if the node contains only whitespace.
There are other techniques to locate only the TextNodes if all of them should be stripped from the document. That's risky, because it can end up deleting all blanks between tags, causing run-on sentences and joined words, which probably isn't what you want.