Using streams to manipulate a String

Question

Let's say that I want to remove all the non-letters from my String.

String s = "abc-de3-2fg";

I can use an IntStream in order to do that:

s.chars().filter(ch -> Character.isLetter(ch)).  // But then what?

What can I do in order to convert this stream back to a String instance?

On a different note, why can't I treat a String as a stream of objects of type Character?

String s = "abc-de3-2fg";

// Yields a Stream of char[], therefore doesn't compile
Stream<Character> stream = Stream.of(s.toCharArray());

// Yields a stream with one member - s, which is a String object. Doesn't compile
Stream<Character> stream = Stream.of(s);

According to the javadoc, the signature of the Stream factory method is as follows:

Stream.of(T... values)

The only (lousy) way that I could think of is:

String s = "abc-de3-2fg";
Stream<Character> stream = Stream.of(s.charAt(0), s.charAt(1), s.charAt(2), ...)

And of course, this isn't good enough... What am I missing?

Recommended answer

Here's an answer to the second part of the question. If you have an IntStream resulting from calling string.chars(), you can get a Stream<Character> by calling mapToObj and casting each value to char, which boxes the result as a Character. For example, here's how to turn a String into a Set<Character>:

Set<Character> set = string.chars()
    .mapToObj(ch -> (char)ch)
    .collect(Collectors.toSet());

Note that casting to char is essential for the boxed result to be Character instead of Integer.
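
To make the effect of the cast concrete, here is a minimal sketch that runs the same pipeline with and without it. The class and method names (CharSetExample, toCharSet, toIntSet) are hypothetical and used only for illustration.

```java
import java.util.Set;
import java.util.stream.Collectors;

final class CharSetExample {
    // With the (char) cast, the lambda's result is boxed as Character.
    static Set<Character> toCharSet(String string) {
        return string.chars()
                .mapToObj(ch -> (char) ch)
                .collect(Collectors.toSet());
    }

    // Without the cast, each int is boxed as Integer (the UTF-16 code unit value).
    static Set<Integer> toIntSet(String string) {
        return string.chars()
                .mapToObj(ch -> ch)
                .collect(Collectors.toSet());
    }
}
```

For the input "ab", toCharSet gives a set containing 'a' and 'b', while toIntSet gives a set containing 97 and 98.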

Now the big problem with dealing with char or Character data is that supplementary characters are represented as surrogate pairs of char values, so any algorithm that deals with individual char values will probably fail when presented with supplementary characters.

(It may seem like supplementary characters are an obscure Unicode feature that we don't need to worry about, but many emoji are supplementary characters.)

Consider this example:

string.chars()
      .filter(Character::isAlphabetic)
      ...

This will fail if presented with a string that contains the code point U+1D400 (Mathematical Bold Capital A). That code point is represented as a surrogate pair in the string, and neither value of a surrogate pair is an alphabetic character. To get the correct result, you'd need to do this instead:

string.codePoints()
      .filter(Character::isAlphabetic)
      ...

I'd recommend always using codePoints().
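
The difference between the two pipelines can be checked with a small sketch that counts alphabetic elements both ways. The class and method names below are hypothetical; only chars(), codePoints(), and Character.isAlphabetic come from the discussion above.

```java
final class SupplementaryExample {
    // Counts alphabetic UTF-16 code units; surrogate halves are not alphabetic.
    static long countAlphabeticByChar(String s) {
        return s.chars().filter(Character::isAlphabetic).count();
    }

    // Counts alphabetic code points; a surrogate pair is seen as one code point.
    static long countAlphabeticByCodePoint(String s) {
        return s.codePoints().filter(Character::isAlphabetic).count();
    }

    public static void main(String[] args) {
        // "a" followed by U+1D400 (Mathematical Bold Capital A)
        String s = "a" + new String(Character.toChars(0x1D400));
        System.out.println(countAlphabeticByChar(s));      // counts only 'a'
        System.out.println(countAlphabeticByCodePoint(s)); // counts 'a' and U+1D400
    }
}
```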

Now, given an IntStream of code points, how can one reassemble it into a String? Sleiman Jneidi's answer is a reasonable one (+1), using the three-arg collect() method of IntStream.
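
For reference, a three-arg collect() on an IntStream of code points might look roughly like the sketch below. This is an illustration of the technique, not necessarily the exact code from the answer mentioned above, and the class and method names are hypothetical.

```java
final class CollectToString {
    // Keeps only letters and reassembles the code points into a String.
    static String lettersOnly(String s) {
        return s.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,              // supplier: new result container
                         StringBuilder::appendCodePoint,  // accumulator: add one code point
                         StringBuilder::append)           // combiner: merge partial results
                .toString();
    }
}
```

With the string from the question, lettersOnly("abc-de3-2fg") returns "abcdefg".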

Here's an alternative:

StringBuilder sb = ... ;
string.codePoints()
      .filter(...)
      .forEachOrdered(sb::appendCodePoint);
return sb.toString();

This might be a bit more flexible in cases where you already have a StringBuilder that you're using to accumulate string data. You don't have to create a new StringBuilder each time, nor do you have to convert it to a String afterwards.
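
As a self-contained sketch of that pattern, the hypothetical helper below appends the letters of several inputs to one shared StringBuilder. The filter used here (Character::isLetter) is an assumption chosen to match the question; the original snippet leaves the filter unspecified.

```java
final class AppendLetters {
    // Hypothetical helper: appends only the letters of `input` to an existing builder.
    static void appendLetters(StringBuilder sb, String input) {
        input.codePoints()
             .filter(Character::isLetter)
             .forEachOrdered(sb::appendCodePoint); // preserves encounter order
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("x:");
        appendLetters(sb, "abc-de3-2fg");
        appendLetters(sb, "1h2i");
        System.out.println(sb); // prints x:abcdefghi
    }
}
```

Using forEachOrdered rather than forEach keeps the characters in their original order even if the stream were made parallel.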
