Remove "empty" character from String


Question



I'm using a framework which returns malformed Strings with "empty" characters from time to time.

"foobar" for example is represented by: [,f,o,o,b,a,r]

The first character is NOT a whitespace (' '), so System.out.println() prints "foobar" and not " foobar". Yet the length of the String is 7 instead of 6. Obviously this makes most String methods (equals, split, substring, ...) useless. Is there a way to remove empty characters from a String?
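A minimal sketch reproducing the symptom, assuming (as the answer's update later suggests) that the invisible character is U+FEFF. The class name `InvisibleCharDemo` is hypothetical, and whether the first line really looks like plain "foobar" depends on the console.

```java
public class InvisibleCharDemo {
    public static void main(String[] args) {
        // Assumed input: U+FEFF (ZERO WIDTH NO-BREAK SPACE, which is also what a
        // UTF-8 byte order mark decodes to) in front of "foobar".
        String malformed = "\uFEFFfoobar";

        // On most consoles this looks exactly like "foobar".
        System.out.println(malformed);

        // The hidden char still counts for length, equality and prefix checks.
        System.out.println(malformed.length());          // 7
        System.out.println(malformed.equals("foobar"));  // false
        System.out.println(malformed.startsWith("foo")); // false
    }
}
```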

I tried to build a new String like this:

StringBuilder sb = new StringBuilder();
for (final char character : malformedString.toCharArray()) {
  if (Character.isDefined(character)) {
    sb.append(character);
  }
}
sb.toString();

Unfortunately this doesn't work. Same with the following code:

StringBuilder sb = new StringBuilder();
for (final Character character : malformedString.toCharArray()) {
  if (character != null) {
    sb.append(character);
  }
}
sb.toString();

I also can't check for an empty character like this:

   if (character == ''){
     //
   }
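None of the three checks above can catch the character. Java has no empty char literal, so `''` does not compile. Both likely candidates, U+0000 and U+FEFF, are *defined* Unicode characters, so `Character.isDefined()` keeps them. A `char` autoboxed to `Character` is never `null`. A sketch that filters by Unicode general category instead, assuming the unwanted characters are control (Cc) or format (Cf) characters; `CharTypeFilter` and `removeControlAndFormat` are hypothetical names:

```java
public class CharTypeFilter {
    // Drops CONTROL chars (e.g. U+0000) and FORMAT chars (e.g. U+FEFF).
    // Caution: \t, \n and \r are CONTROL chars too, so they are dropped as well.
    static String removeControlAndFormat(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            int type = Character.getType(cp);
            if (type != Character.CONTROL && type != Character.FORMAT) {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both invisible candidates are defined, so an isDefined() filter keeps them.
        System.out.println(Character.isDefined('\0'));     // true
        System.out.println(Character.isDefined('\uFEFF')); // true

        System.out.println(removeControlAndFormat("\uFEFFfoo\0bar")); // foobar
    }
}
```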

Obviously there is something wrong with the String, but I can't change the framework I'm using or wait for them to fix it (if it is a bug within their framework). I need to handle this String and sanitize it.

Any ideas?

Solution

It's probably the NULL character, which is written as \0 in Java. You can get rid of it with String#trim(), which removes all leading and trailing characters whose value is U+0020 or lower.
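A small sketch of what `trim()` does and does not remove; the strings and the class name `TrimDemo` are made up for illustration. Note that `trim()` only touches the ends of the string, and a U+FEFF character is above U+0020, so `trim()` leaves it in place.

```java
public class TrimDemo {
    public static void main(String[] args) {
        // Leading U+0000 is <= U+0020, so trim() strips it.
        System.out.println("\0foobar".trim().length());      // 6

        // trim() only works on the ends: an embedded NULL survives.
        System.out.println("foo\0bar".trim().length());      // 7

        // U+FEFF is above U+0020, so trim() does not remove it.
        System.out.println("\uFEFFfoobar".trim().length());  // 7
    }
}
```

String#strip() (Java 11+) is not a drop-in alternative here: it relies on Character.isWhitespace, which is false for both U+0000 and U+FEFF.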

To nail down the exact code point, print every char as a hex value:

for (char c : string.toCharArray()) {
    System.out.printf("U+%04x ", (int) c);
}

Then you can find the exact character here.
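As an alternative to looking each value up by hand, `Character.getName(int)` (available since Java 7) returns the Unicode name of a code point. A sketch; the class and method names are hypothetical:

```java
public class CodePointNames {
    // One "U+XXXX NAME" line per code point.
    // Character.getName returns null for unassigned code points, printed as "null".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.append(String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints a line for U+FEFF followed by one line per letter of "foo".
        System.out.print(describe("\uFEFFfoo"));
    }
}
```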


Update: regarding the update to the question:

> Anyone know of a way to just include a range of valid characters instead of excluding 95% of the UTF8 range?

You can do that with the help of a regex. See @polygenelubricants's answer here and this answer.
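A minimal whitelist sketch, not necessarily the approach of the linked answers: keep only the Unicode general categories you consider valid and delete everything else. The chosen set (letters, marks, numbers, punctuation, symbols, space separators) is an assumption; adjust it to your data. `WhitelistSanitizer` is a hypothetical name.

```java
import java.util.regex.Pattern;

public class WhitelistSanitizer {
    // Matches any char NOT in an allowed category:
    // L = letters, M = combining marks, N = numbers, P = punctuation,
    // S = symbols, Zs = space separators.
    // Control (Cc), format (Cf), private-use and unassigned chars are removed,
    // which also means \t, \r and \n are removed.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{S}\\p{Zs}]");

    static String sanitize(String s) {
        return NOT_ALLOWED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("\uFEFFfoobar"));    // foobar
        System.out.println(sanitize("\0caf\u00e9 #1"));  // café #1
    }
}
```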

On the other hand, you can also fix the problem at its root instead of working around it. A UTF-8 byte order mark (BOM, the bytes EF BB BF) decodes to the invisible character U+FEFF, which would explain the "empty" first character. Either update the files to get rid of the BOM, a legacy way to distinguish UTF-8 files from others that is nowadays worthless, or use a Reader which recognizes and skips the BOM. Also see this question.
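A sketch of a reader that skips a leading BOM, assuming UTF-8 input. A plain InputStreamReader does not strip the BOM; it hands it through as U+FEFF. The helper `BomSkippingReader.open` is hypothetical; if Apache Commons IO is already on the classpath, its `BOMInputStream` is an existing alternative.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomSkippingReader {
    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the position before the first char
        if (reader.read() != '\uFEFF') {
            reader.reset();              // no BOM (or empty input): rewind
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM followed by "foobar".
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'f', 'o', 'o', 'b', 'a', 'r'};
        try (BufferedReader r = open(new ByteArrayInputStream(bytes))) {
            String line = r.readLine();
            System.out.println(line.length()); // 6
        }
    }
}
```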
