How can I remove non-breaking spaces from a JSoup 'Document'?


Question

How can I remove these:

<td>&nbsp;</td>

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters are not matched by ordinary JSoup expressions or selectors.

Recommended answer

The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character in its own text (and thus not every line, as you said in a comment), the following ought to work:

document.select(":containsOwn(\u00a0)").remove();
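One reason U+00A0 slips past ordinary checks in plain Java is that Character.isWhitespace and String.trim() do not treat it as whitespace. The following is a minimal sketch of an explicit check, using only the standard library. The class name NbspCheck and the method isNbspOnly are hypothetical names chosen for illustration, and how the text is obtained from a JSoup element is deliberately left out.

```java
// Hypothetical helper, not part of JSoup: checks whether a piece of text
// is "empty apart from non-breaking spaces".
final class NbspCheck {
    // NO-BREAK SPACE, the character that the entity &nbsp; decodes to.
    static final char NBSP = '\u00a0';

    /**
     * Returns true if the text contains at least one NBSP and every other
     * character is ordinary whitespace (as defined by Character.isWhitespace).
     */
    static boolean isNbspOnly(String text) {
        boolean sawNbsp = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == NBSP) {
                sawNbsp = true;
            } else if (!Character.isWhitespace(c)) {
                return false; // real content found
            }
        }
        return sawNbsp;
    }
}
```

Under these assumptions, a cell whose text is just "\u00a0" passes the check, while "a\u00a0" or a string of plain spaces does not, so a filter built on it would be stricter than a bare "contains U+00A0" match.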

If you really mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
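A line-by-line scan of the raw HTML could look roughly like the sketch below. It assumes Java 11 or later (for String.lines()) and that the HTML is already held in a String. The names NbspLineFilter and dropNbspLines are hypothetical, and only the &nbsp; entity and the literal U+00A0 character are detected; numeric forms such as &#160; are not handled.

```java
import java.util.stream.Collectors;

// Hypothetical helper for illustration: works on raw HTML text, not on a JSoup Document.
final class NbspLineFilter {
    /**
     * Returns the input with every line dropped that contains a non-breaking
     * space, written either as the entity "&nbsp;" or as the character U+00A0.
     * Remaining lines are joined with "\n".
     */
    static String dropNbspLines(String html) {
        return html.lines()
                .filter(line -> !line.contains("&nbsp;") && line.indexOf('\u00a0') < 0)
                .collect(Collectors.joining("\n"));
    }
}
```

The filtered string can then be parsed as usual. Because this works on text rather than on the parsed tree, it removes whole source lines regardless of which elements they belong to, which is only safe when each unwanted cell sits on its own line.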
