How do I remove white space between HTML nodes?


Question

I'm trying to remove the whitespace between the <p> tags of an HTML fragment:

<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>

As you can see, there is always a blank space between a closing </p> tag and the next <p> tag.

The problem is that the blank spaces create <br> tags when saving the string into my database. Methods like strip or gsub only remove the whitespace in the nodes, resulting in:

<p>FooBar</p> <p>barbarbar</p> <p>bla</p>

whereas I'd like to have:

<p>Foo Bar</p><p>bar bar bar</p><p>bla</p>

I'm using:

  • Nokogiri 1.5.6
  • Ruby 1.9.3
  • Rails

UPDATE:

Occasionally the <p> tags have child nodes that cause the same problem: white space between nodes.

Sample Code

Note: the code is normally on one line; I reformatted it because it would be unreadable otherwise.

<p>
  <p>
    <strong>Selling an Appartment</strong>
  </p>
  <ul>
    <li>
      <p>beautiful apartment!</p>
    </li>
    <li>
      <p>near the train station</p>
    </li>
    .
    .
    .
  </ul>
  <ul>
    <li> 
      <p>10 minutes away from a shopping mall </p>
    </li>
    <li>
      <p>nice view</p>
    </li>
  </ul>
  .
  .
  .
</p>

How would I strip that white space as well?

SOLUTION

It turns out that I had been misusing the gsub method and hadn't investigated using gsub with a regex.

The simple solution was adding

data = data.gsub(/>\s+</, "><")

It deleted whitespace between all different kinds of nodes... Regex ftw!
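
For illustration, here is a minimal sketch of that substitution applied to the sample strings from the question. It assumes the HTML is held in a plain Ruby string; the variable names other than data are made up for this example.

```ruby
# Sample input from the question: <p> elements separated by single spaces.
data = "<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>"

# />\s+</ matches a ">" followed by one or more whitespace characters and a "<",
# so only whitespace *between* tags is removed. The space inside "Foo Bar"
# is not adjacent to tags on both sides and is left alone.
compact = data.gsub(/>\s+</, "><")
puts compact
# => <p>Foo Bar</p><p>bar bar bar</p><p>bla</p>

# \s also matches newlines, so indented multi-line markup collapses too.
nested = "<ul>\n  <li>\n    <p>nice view</p>\n  </li>\n</ul>"
compact_nested = nested.gsub(/>\s+</, "><")
puts compact_nested
# => <ul><li><p>nice view</p></li></ul>
```

Note that this treats the HTML as plain text, so it will also touch anything that merely looks like "> <", for example inside a <pre> block.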

Answer

This is how I'd write the code:

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<p>Foo Bar</p> <p>bar bar bar</p> <p>bla</p>
EOT

doc.search('p, ul, li').each do |node|
  next_node = node.next_sibling
  next_node.remove if next_node && next_node.text.strip == ''
end

puts doc.to_html

It results in:

<p>Foo Bar</p><p>bar bar bar</p><p>bla</p>

Breaking it down:

doc.search('p')

looks for only the <p> nodes in the document; the full example above passes 'p, ul, li' so that list nodes are covered as well. Nokogiri returns a NodeSet from search, which is empty if nothing matched. The code loops over the NodeSet, looking at each node in turn.

next_node = node.next_sibling

gets the node that immediately follows the current node, or nil if there is none.

next_node.remove if next_node && next_node.text.strip == ''

next_node.remove removes next_node from the DOM if it isn't nil and its text is empty when stripped; in other words, if the node contains only whitespace.

There are other techniques to locate only the TextNodes if all of them should be stripped from the document. That's risky, because it can end up deleting all blanks between tags, causing run-on sentences and joined words, which probably isn't what you want.
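
To make that risk concrete, here is a small sketch using the plain-string regex from the question rather than Nokogiri. The narrower pattern and the block_tags list are hypothetical, chosen only for illustration; they are not an exhaustive list of block-level elements, and regex-based HTML handling remains fragile in general.

```ruby
# Inline markup where the space between tags is meaningful.
html = "<p><b>Hello</b> <i>world</i></p> <p>bla</p>"

# Removing every blank between tags joins the two words when rendered.
joined = html.gsub(/>\s+</, "><")
puts joined
# => <p><b>Hello</b><i>world</i></p><p>bla</p>

# Hypothetical narrower variant: drop whitespace only when the tags on both
# sides are block-level. The tag list is an assumption, not a complete list.
block_tags = "p|ul|ol|li|div"
block_gap = %r{(</?(?:#{block_tags})\b[^>]*>)\s+(?=</?(?:#{block_tags})\b)}

# '\1' keeps the tag captured before the gap; the lookahead leaves the
# following tag in place so that consecutive gaps are all matched.
careful = html.gsub(block_gap, '\1')
puts careful
# => <p><b>Hello</b> <i>world</i></p><p>bla</p>
```

The Nokogiri loop in the answer gets a similar effect by only looking at the siblings of <p>, <ul>, and <li> nodes.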
