How can I remove non-breaking spaces from a JSoup 'Document'?
Question
How can I remove these:
<td>&nbsp;</td>
or
<td width="7%">&nbsp;</td>
from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters do not match anything with normal JSoup expressions or Selectors.
Recommended answer
The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character in its own text (and thus not every line, as you said in a comment), the following ought to work:
document.select(":containsOwn(\u00a0)").remove();
If you really mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
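
A minimal sketch of that line-by-line scan, using only the Java standard library. The class name NbspLineFilter and the helper removeNbspLines are hypothetical names chosen for illustration. The sketch assumes that a "line" is delimited by ordinary line breaks and that a line should be dropped whenever it contains either the raw character U+00A0 or the literal entity text &nbsp;.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class NbspLineFilter {

    // Hypothetical helper: returns the input without any line that contains
    // a no-break space, either as the raw character U+00A0 or as the
    // unparsed entity text "&nbsp;".
    static String removeNbspLines(String html) {
        return Arrays.stream(html.split("\\R"))   // \R matches any line break
                .filter(line -> !line.contains("\u00a0"))
                .filter(line -> !line.contains("&nbsp;"))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<tr>",
                "<td>&nbsp;</td>",
                "<td width=\"7%\">\u00a0</td>",
                "<td>kept</td>",
                "</tr>");
        // Only the lines without a no-break space are printed.
        System.out.println(removeNbspLines(html));
    }
}
```

With the sample input above, only the <tr>, <td>kept</td> and </tr> lines remain. Because this works on raw text rather than on the parsed tree, it also drops lines where the no-break space sits next to other content, so it is only appropriate when whole lines really are the unit to remove.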