Why is non-breaking space not a whitespace character in Java?


Question

While searching for a proper way to trim non-breaking spaces from parsed HTML, I first stumbled on Java's spartan definition of String.trim(), which is at least properly documented. I wanted to avoid explicitly listing the characters eligible for trimming, so I assumed that the Unicode-backed methods on the Character class would do the job for me.

That's when I discovered that Character.isWhitespace(char) explicitly excludes non-breaking spaces:


"It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F')."

Why?

The implementation of the corresponding .NET equivalent is less discriminating.
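To make the discrepancy concrete, here is a minimal sketch that prints how Character.isWhitespace(char) and Character.isSpaceChar(char) classify a few characters. The class name WhitespaceCheck and the helper describe are made up for illustration; only the two Character methods come from the JDK.

```java
// Illustrative only: WhitespaceCheck and describe() are hypothetical names.
class WhitespaceCheck {
    // Formats how the two JDK predicates classify a single character.
    static String describe(char c) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                (int) c, Character.isWhitespace(c), Character.isSpaceChar(c));
    }

    public static void main(String[] args) {
        // Space, tab, and the three non-breaking spaces named in the Javadoc.
        char[] samples = {' ', '\t', '\u00A0', '\u2007', '\u202F'};
        for (char c : samples) {
            System.out.println(describe(c));
        }
    }
}
```

The three non-breaking spaces are reported as space characters by isSpaceChar but not as whitespace by isWhitespace, while the tab is the other way round, so neither method alone covers everything one would want to trim.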

Recommended answer

Character.isWhitespace(char) is old. Really old. Many things done in the early days of Java followed conventions and implementations from C.

Now, more than a decade later, these things seem erroneous. Consider it evidence of how far things have come, even between the first days of Java and the first days of .NET.

Java strives to be 100% backward compatible. So even if the Java team thought it would be good to fix their initial mistake and add non-breaking spaces to the set of characters for which Character.isWhitespace(char) returns true, they can't, because there almost certainly exists software that relies on the current implementation working exactly the way it does.
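Since the built-in behavior is unlikely to change, one way to get the trimming the question asks for is to combine both predicates yourself. The following is only a sketch under that assumption: NbspTrim, isTrimmable, and trim are hypothetical names, not JDK API, and the choice to treat every Unicode space character as trimmable is a design decision you may not want.

```java
// Illustrative only: NbspTrim and its methods are hypothetical helpers.
class NbspTrim {
    // Treats Java whitespace and any Unicode space character
    // (including the non-breaking spaces) as trimmable.
    static boolean isTrimmable(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }

    // Removes trimmable code points from both ends; interior ones are kept.
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!isTrimmable(cp)) break;
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!isTrimmable(cp)) break;
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }
}
```

For example, NbspTrim.trim("\u00A0 hello\u00A0world \u202F") keeps the non-breaking space between the two words and drops everything around them, whereas String.trim() only strips characters up to U+0020 and would leave the leading and trailing non-breaking spaces in place.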
