Get the String length in characters in Rust

Question

According to the Rust book, the String::len method returns the number of bytes composing the string, which may not correspond to its length in characters.

For example, for the following Japanese string, len() returns 30, which is the number of bytes, not the number of characters (10):

// The final '！' is the full-width U+FF01 (3 bytes); an ASCII '!' would give 28.
let s = String::from("ラウトは難しいです！");
s.len() // returns 30.

The only way I have found to get the number of characters is the following expression:

s.chars().count()

which returns 10, the correct number of characters.
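To make the two numbers concrete, here is a minimal sketch comparing them for the string above. The helper name byte_and_char_len is made up for illustration, and the final exclamation mark is assumed to be the full-width U+FF01, written as an escape so it cannot be mistaken for ASCII '!'.

```rust
/// Returns (length in bytes, number of Unicode scalar values) for `s`.
/// Hypothetical helper, used only for this illustration.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Assumption: the trailing exclamation mark is full-width (U+FF01).
    let s = "ラウトは難しいです\u{FF01}";
    let (bytes, chars) = byte_and_char_len(s);
    println!("bytes = {}, chars = {}", bytes, chars);

    // Each char reports how many bytes it occupies in UTF-8.
    for c in s.chars() {
        println!("{}: {} byte(s)", c, c.len_utf8());
    }
}
```

Every character in this particular string takes 3 bytes in UTF-8, which is where the factor of three between the two counts comes from.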

Is there any method on String that returns the characters count, aside from the one I am using above?

Recommended answer

Is there any method on String that returns the characters count, aside from the one I am using above?

No. Using s.chars().count() is correct. Note that this is an O(N) operation (UTF-8 is a variable-width encoding, so the whole string has to be scanned to find the character boundaries), while getting the number of bytes is an O(1) operation.
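As a rough illustration of why the count is linear, the sketch below counts scalar values by visiting every byte and skipping UTF-8 continuation bytes (those of the form 0b10xxxxxx). The function name count_scalar_values is hypothetical, and this is not a claim about how the standard library implements chars().count(); it only shows that some pass over the bytes is unavoidable.

```rust
/// Counts Unicode scalar values by scanning every byte of `s` and
/// skipping UTF-8 continuation bytes (bit pattern 0b10xxxxxx).
/// Illustrative only; prefer `s.chars().count()` in real code.
fn count_scalar_values(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    // Assumption: the trailing exclamation mark is full-width (U+FF01).
    let s = "ラウトは難しいです\u{FF01}";
    println!("bytes: {}", s.len()); // stored with the string, no scan needed
    println!("chars: {}", count_scalar_values(s)); // visits every byte
    assert_eq!(count_scalar_values(s), s.chars().count());
}
```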

You can see all the methods on str for yourself.

As pointed out in the comments, a char is a specific concept:

It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.

One such example is a precomposed character compared with its decomposed form:

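fn main() {
    // Decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT
    println!("{}", "e\u{301}".chars().count()); // 2
    // Precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
    println!("{}", "\u{e9}".chars().count()); // 1
}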
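Because the two forms usually render identically, it can help to print the scalar values themselves. The following sketch uses a hypothetical helper, scalar_values, that lists each char as a U+XXXX code point; the strings are written with escapes so the difference survives copy and paste.

```rust
/// Formats each Unicode scalar value of `s` as "U+XXXX".
/// Hypothetical helper, used only for this illustration.
fn scalar_values(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    let decomposed = "e\u{301}"; // 'e' + combining acute accent
    let precomposed = "\u{e9}"; // single precomposed code point

    println!("{:?}", scalar_values(decomposed));
    println!("{:?}", scalar_values(precomposed));
}
```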

You may prefer to use graphemes from the unicode-segmentation crate instead:

use unicode_segmentation::UnicodeSegmentation; // 1.6.0

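fn main() {
    // Decomposed form: two scalar values, one grapheme cluster
    println!("{}", "e\u{301}".graphemes(true).count()); // 1
    // Precomposed form: one scalar value, one grapheme cluster
    println!("{}", "\u{e9}".graphemes(true).count()); // 1
}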
