How to remove hard spaces with Jsoup?
Question
I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove them with .trim() or .replace(" ", ""), etc.! I don't get it.
I even found a suggestion on Stack Overflow to try \\u00a0, but that didn't work either.
I tried this (since text() returns actual hard space characters, U+00A0):
System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works
But I can't figure out why I can't remove the white space from the result of .text().
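The .trim() result can be reproduced without Jsoup at all. The sketch below assumes the cell text is the string "94,00" followed by U+00A0, as a hypothetical stand-in for fields.get(6).text(); the names CELL_TEXT, viaTrim and viaStrip are illustrative only. It shows that String.trim() only strips characters up to U+0020, and that String.strip() (Java 11 or later) does not help either, because Character.isWhitespace deliberately excludes the non-breaking space even though Unicode classifies it as a space separator.

```java
public class NbspTrimDemo {
    // Assumption: this literal stands in for fields.get(6).text(); Jsoup has
    // already decoded the &nbsp; entity into the single character U+00A0.
    static final String CELL_TEXT = "94,00\u00a0";

    // String.trim() removes only leading/trailing chars with code point <= U+0020,
    // so U+00A0 (code point 160) is left alone.
    static String viaTrim(String s) {
        return s.trim();
    }

    // String.strip() (Java 11+) relies on Character.isWhitespace, which
    // explicitly excludes the non-breaking space, so it is kept here too.
    static String viaStrip(String s) {
        return s.strip();
    }

    public static void main(String[] args) {
        System.out.println(viaTrim(CELL_TEXT).length());      // 6
        System.out.println(viaStrip(CELL_TEXT).length());     // 6
        System.out.println(Character.isWhitespace('\u00a0')); // false
        System.out.println(Character.isSpaceChar('\u00a0'));  // true
    }
}
```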
Answer
Your first attempt was very nearly it: you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:
System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^
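To make the difference between the two literals concrete, here is a minimal sketch that compares them directly. It assumes the decoded cell text is "94,00" followed by U+00A0 (a stand-in for what text() returned); the class, constant and method names are hypothetical.

```java
public class NbspReplaceDemo {
    // Hypothetical stand-in for fields.get(6).text() (ends in U+00A0).
    static final String CELL_TEXT = "94,00\u00a0";

    // With the doubled backslash the literal is six ordinary characters:
    // a backslash followed by "u00a0".
    static final String SIX_CHARS = "\\u00a0";

    // With a single backslash the compiler turns the escape into one
    // character, U+00A0, before the program ever runs.
    static final String ONE_CHAR = "\u00a0";

    // Removes every U+00A0 from the input (literal, non-regex replacement).
    static String removeHardSpaces(String s) {
        return s.replace(ONE_CHAR, "");
    }

    public static void main(String[] args) {
        System.out.println(SIX_CHARS.length()); // 6
        System.out.println(ONE_CHAR.length());  // 1
        // No match: the cell text contains no backslash, so nothing changes.
        System.out.println("'" + CELL_TEXT.replace(SIX_CHARS, "") + "'");
        // Match: the hard space is removed.
        System.out.println("'" + removeHardSpaces(CELL_TEXT) + "'"); // '94,00'
    }
}
```

The char overload works the same way, e.g. text.replace('\u00a0', ' ').trim() if you would rather turn hard spaces into ordinary ones first and then trim.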
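replace doesn't use regular expressions, so there is no regex level you need to pass a literal backslash through to. With "\\u00a0" you are searching for the six characters \u00a0, which never occur in the text. You just want to specify the character U+00A0 itself in the string, and the single-backslash escape "\u00a0" does exactly that.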
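For contrast, a rough sketch of what happens when those same six characters do reach a regex engine: String.replaceAll compiles its first argument with java.util.regex.Pattern, which understands the hex escape on its own, so "\\u00a0" matches U+00A0 there. The helper names and the sample text are hypothetical, and the \h (horizontal whitespace) class assumes Java 8 or later.

```java
public class NbspRegexDemo {
    // Hypothetical stand-in for the decoded cell text (ends in U+00A0).
    static final String CELL_TEXT = "94,00\u00a0";

    // String.replace is literal: it looks for the six characters
    // backslash, 'u', '0', '0', 'a', '0', which never occur in the text.
    static String literalReplace(String s) {
        return s.replace("\\u00a0", "");
    }

    // String.replaceAll treats its first argument as a regex, and Pattern
    // itself decodes the hex escape, so the same six characters match U+00A0.
    static String regexReplace(String s) {
        return s.replaceAll("\\u00a0", "");
    }

    // \h matches horizontal whitespace (including U+00A0); the $ anchor
    // limits the removal to a trailing run.
    static String stripTrailingHorizontal(String s) {
        return s.replaceAll("\\h+$", "");
    }

    public static void main(String[] args) {
        System.out.println(literalReplace(CELL_TEXT).length());          // 6
        System.out.println(regexReplace(CELL_TEXT).length());            // 5
        System.out.println(stripTrailingHorizontal(CELL_TEXT).length()); // 5
    }
}
```

For a single known character the literal replace with the real U+00A0 is the simpler choice; replaceAll mainly pays off when you want a character class such as \h that also covers other horizontal spaces.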