How to remove hard spaces with Jsoup?

Question

I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove them with .trim(), .replace(" ", ""), etc. I don't get it.

I even found a suggestion on Stack Overflow to try \\u00a0, but that didn't work either.

I tried this (since text() returns actual hard space characters, U+00A0):

System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works

But I can't figure out why I can't remove the hard space from the result of .text().

Recommended answer

Your first attempt was very nearly it, and you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:

System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^

replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. With the double backslash, the search string is the six literal characters backslash, u, 0, 0, a, 0, which never occur in the text. You just want to specify the character U+00A0 in the string.
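
To make the difference concrete, here is a minimal sketch that needs no Jsoup at all. It assumes the string "94,00" followed by one U+00A0 as a stand-in for what fields.get(6).text() returns; the class name HardSpaceDemo and the helper removeHardSpaces are hypothetical names chosen for illustration, not part of Jsoup or of the original code.

```java
public class HardSpaceDemo {

    // Hypothetical helper: removes every U+00A0 (no-break space) character.
    // The single-backslash escape below is the one character U+00A0.
    static String removeHardSpaces(String text) {
        return text.replace("\u00a0", "");
    }

    public static void main(String[] args) {
        // Assumed stand-in for fields.get(6).text(): five visible characters plus one U+00A0.
        String text = "94,00\u00a0";

        // Double backslash: the search string is six literal characters
        // (backslash, u, 0, 0, a, 0), so nothing matches and the length stays 6.
        System.out.println(text.replace("\\u00a0", "").length());

        // trim() only strips leading/trailing characters with code point <= U+0020,
        // so the U+00A0 survives and the length stays 6.
        System.out.println(text.trim().length());

        // Single backslash: the hard space is removed and the length drops to 5.
        System.out.println(removeHardSpaces(text).length());
    }
}
```

Printing lengths rather than the strings themselves avoids relying on an invisible character showing up in the console. If a regex-based method such as replaceAll were used instead, both the single and the double backslash forms should match U+00A0, because the regex engine understands the \u escape on its own.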
