How to index a String in Rust

Question

I am trying to index a string in Rust, but the compiler reports an error. My code (Project Euler problem 4, playground):

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let num_length = num_string.len();

    for i in 0 .. num_length / 2 {
        if num_string[i] != num_string[(num_length - 1) - i] {
            return false;
        }
    }

    true
}

The error:

error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
 --> <anon>:7:12
  |
7 |         if num_string[i] != num_string[(num_length - 1) - i] {
  |            ^^^^^^^^^^^^^
  |
  = note: the type `std::string::String` cannot be indexed by `usize`

Is there a reason why String cannot be indexed? How can I access the data then?

Recommended answer

Yes, Rust does not support indexing into a string. Rust strings are encoded in UTF-8 internally, so the very concept of indexing would be ambiguous, and people would misuse it. Byte indexing is fast but almost always incorrect for text processing: when the text contains non-ASCII symbols, a byte index may land in the middle of a character. Char indexing is not free, because UTF-8 is a variable-length encoding: you have to walk the string from the start to find the required code point.
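To make the difference between byte positions and code point positions concrete, here is a minimal sketch using only the standard library. The helper names `byte_and_char_len` and `byte_offset_of_char` are hypothetical and chosen for illustration.

```rust
/// Returns (length in bytes, number of code points) for `s`.
/// Hypothetical helper, not part of the standard library.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len()` counts UTF-8 bytes; `chars().count()` walks the whole string.
    (s.len(), s.chars().count())
}

/// Returns the byte offset at which the `i`-th code point starts,
/// or `None` if the string has fewer than `i + 1` code points.
/// The lookup is a linear scan from the start of the string.
fn byte_offset_of_char(s: &str, i: usize) -> Option<usize> {
    s.char_indices().nth(i).map(|(offset, _)| offset)
}
```

For a string such as "h\u{e9}llo", where the second character takes two bytes in UTF-8, the two lengths differ, and byte offset 2 falls inside that character, which `str::is_char_boundary` can detect.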

If you are certain that your strings contain only ASCII characters, you can use the as_bytes() method on &str, which returns a byte slice, and then index into that slice:

let num_string = num.to_string();

// ...

let b: u8 = num_string.as_bytes()[i];
let c: char = b as char;  // if you need to get the character as a unicode code point
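Applied to the function from the question, the byte-slice approach might look like the sketch below. It relies on the assumption that the decimal representation of a u64 consists only of ASCII digits, so each byte is exactly one character.

```rust
/// Returns true if the decimal digits of `num` read the same in both directions.
fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    // Decimal digits are ASCII, so one byte corresponds to one character.
    let bytes = num_string.as_bytes();
    let len = bytes.len();

    // Compare the i-th byte from the front with the i-th byte from the back.
    for i in 0..len / 2 {
        if bytes[i] != bytes[len - 1 - i] {
            return false;
        }
    }

    true
}
```

Binding the byte slice once, instead of calling as_bytes() on every comparison, keeps the loop body close to the original code.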

If you do need to index code points, you have to use the chars() iterator:

num_string.chars().nth(i).unwrap()

As I said above, this requires advancing the iterator from the start of the string up to the i-th code point.
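Because every nth(i) call restarts from the beginning of the string, calling it inside a loop costs quadratic time. The sketch below shows a non-panicking lookup and, as one possible alternative for the palindrome case, a single pass that compares the forward and reversed iterators. The names `char_at` and `is_char_palindrome` are hypothetical.

```rust
/// Returns the code point at position `i`, or `None` if the string is too short.
/// Each call scans from the start of the string.
fn char_at(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

/// Checks whether `s` reads the same forwards and backwards, code point by
/// code point, without any positional lookups.
fn is_char_palindrome(s: &str) -> bool {
    // `chars()` is double-ended, so it can be reversed and compared directly.
    s.chars().eq(s.chars().rev())
}
```

Returning an Option instead of calling unwrap() leaves the decision about out-of-range positions to the caller.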

Finally, many text-processing tasks actually need to work with grapheme clusters rather than with code points or bytes. With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:

use unicode_segmentation::UnicodeSegmentation;

let string: String = ...;
UnicodeSegmentation::graphemes(&string, true).nth(i).unwrap()

Naturally, grapheme cluster indexing has the same cost as code point indexing: the string must be traversed from the start up to the requested cluster.
