Get the String length in characters in Rust
Question
Based on the Rust book, the String::len method returns the number of bytes composing the string, which may not correspond to the length in characters.
For example, for the following Japanese string, len() returns 30, which is the number of bytes, not the number of characters (10):
// The final character is the full-width "！" (U+FF01, 3 bytes in UTF-8).
let s = String::from("ラウトは難しいです！");
s.len(); // returns 30
The only way I have found to get the number of characters is the following:
s.chars().count()
which returns 10, the correct number of characters.
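To make the difference concrete, here is a minimal sketch comparing the two counts. The helper name byte_and_char_len is hypothetical and introduced only for illustration. It assumes the string ends with the full-width exclamation mark, so every character takes 3 bytes in UTF-8.

```rust
/// Hypothetical helper (not part of std): returns (byte length, char count).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // len() is the number of UTF-8 bytes; chars() yields Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // Ends with the full-width "！" (U+FF01), so all 10 chars are 3 bytes each.
    let s = String::from("ラウトは難しいです！");
    let (bytes, chars) = byte_and_char_len(&s);
    println!("bytes: {bytes}, chars: {chars}");

    // ASCII-only text has the same count either way.
    let (bytes, chars) = byte_and_char_len("hello");
    println!("bytes: {bytes}, chars: {chars}");
}
```

For ASCII-only input the two numbers agree, which is why the distinction is easy to miss until non-ASCII text shows up.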
Is there any method on String that returns the character count, aside from the one I am using above?
Recommended answer
Is there any method on String that returns the characters count, aside from the one I am using above?
No. Using s.chars().count() is correct. Note that this is an O(N) operation (UTF-8 is a variable-width encoding, so the whole string has to be scanned to find the character boundaries), while getting the number of bytes is an O(1) operation.
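As a rough illustration of why the count needs a full pass, the sketch below counts characters by walking every byte and skipping UTF-8 continuation bytes (those of the form 0b10xxxxxx). The function name count_chars_by_bytes is hypothetical, and this is not claimed to be how the standard library implements chars().count(). It only shows that each byte has to be inspected.

```rust
/// Hypothetical helper: counts Unicode scalar values in a &str by counting
/// every byte that is NOT a UTF-8 continuation byte (0b10xxxxxx).
fn count_chars_by_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    let s = "ラウトは難しいです！";
    // Both approaches visit the whole string, so both are O(N).
    assert_eq!(count_chars_by_bytes(s), s.chars().count());
    println!("{}", count_chars_by_bytes(s));
}
```

Because a &str is guaranteed to be valid UTF-8, every character contributes exactly one non-continuation byte, so the two counts agree.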
You can see all the methods on str for yourself.
As pointed out in the comments, a char is a specific concept:
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
One such example is the difference between decomposed and precomposed characters:
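fn main() {
    // Decomposed form: 'e' followed by U+0301 (combining acute accent).
    println!("{}", "e\u{301}".chars().count()); // 2
    // Precomposed form: the single scalar value U+00E9.
    println!("{}", "\u{e9}".chars().count()); // 1
}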
You may prefer to use graphemes from the unicode-segmentation crate instead:
use unicode_segmentation::UnicodeSegmentation; // 1.6.0

fn main() {
    // Decomposed and precomposed forms are each one extended grapheme cluster.
    println!("{}", "e\u{301}".graphemes(true).count()); // 1
    println!("{}", "\u{e9}".graphemes(true).count()); // 1
}