How to index a String in Rust
Question
I am attempting to index a string in Rust, but the compiler throws an error. My code (Project Euler problem 4, playground):
    fn is_palindrome(num: u64) -> bool {
        let num_string = num.to_string();
        let num_length = num_string.len();
        for i in 0 .. num_length / 2 {
            if num_string[i] != num_string[(num_length - 1) - i] {
                return false;
            }
        }
        true
    }
Error:
    error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
     --> <anon>:7:12
      |
    7 |         if num_string[i] != num_string[(num_length - 1) - i] {
      |            ^^^^^^^^^^^^^
      |
      = note: the type `std::string::String` cannot be indexed by `usize`
Is there a reason why String cannot be indexed? How can I access the data then?
Recommended answer
Yes, indexing a string with an integer is not available in Rust. Rust strings are encoded in UTF-8 internally, so the concept of indexing would be ambiguous, and people would misuse it. Byte indexing is fast, but almost always incorrect: when your text contains non-ASCII symbols, a byte index may land inside a character, which is really bad if you need text processing. Char indexing is not free, because UTF-8 is a variable-length encoding, so you have to traverse the string from the start to find the required code point.
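To make the ambiguity concrete, here is a minimal sketch using only standard library methods on str. The helper names byte_and_char_len and byte_offset_of_char are hypothetical, introduced for illustration only. It shows that the byte length and the number of code points differ once the text contains a non-ASCII character, and that finding the byte offset of the n-th code point means walking the string from the start.

```rust
/// Returns (length in bytes, number of code points) for a string slice.
/// Hypothetical helper, for illustration only.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // len() is O(1) and counts bytes; chars().count() walks the whole string.
    (s.len(), s.chars().count())
}

/// Returns the byte offset at which the n-th code point starts, if it exists.
/// Hypothetical helper, for illustration only.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    // char_indices() yields (byte offset, char) pairs; nth(n) has to step
    // through the first n code points to get there.
    s.char_indices().nth(n).map(|(offset, _)| offset)
}
```

For the string "h\u{e9}llo" (an "e" with an acute accent, encoded as two bytes), the byte length is 6 while the code point count is 5, and byte offset 2 is not a character boundary.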
If you are certain that your strings contain ASCII characters only, you can use the as_bytes() method on &str, which returns a byte slice, and then index into this slice:
    let num_string = num.to_string();
    // ...
    let b: u8 = num_string.as_bytes()[i];
    let c: char = b as char;  // if you need to get the character as a unicode code point
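Applied to the function from the question, the byte-slice approach might look like the sketch below. It relies on the fact that the decimal representation of a u64 consists of ASCII digits only, so comparing bytes is the same as comparing characters. This is one possible fix, not necessarily the only or the intended one.

```rust
/// Checks whether the decimal representation of `num` reads the same
/// forwards and backwards.
fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    // Safe to compare bytes here: to_string() on a u64 yields ASCII digits only.
    let bytes = num_string.as_bytes();
    let len = bytes.len();
    for i in 0..len / 2 {
        if bytes[i] != bytes[len - 1 - i] {
            return false;
        }
    }
    true
}
```

For example, is_palindrome(9009) returns true and is_palindrome(9010) returns false.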
If you do need to index code points, you have to use the chars() iterator:
    num_string.chars().nth(i).unwrap()
As I said above, this requires traversing the iterator up to the i-th code point.
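Because every nth(i) call walks the string again from the start, repeated lookups by position can become quadratic. Two common ways around this are sketched below, with hypothetical helper names: collect the code points into a Vec<char> once and index that, or avoid indexing entirely by comparing the forward and reversed chars() iterators. Note that nth returns an Option, so an out-of-range index can be handled without unwrap().

```rust
/// Returns the i-th code point, or None if the string is too short.
/// Each call walks the string from the start.
fn nth_char(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

/// Collects the code points once, so later lookups are O(1) slice indexing.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Code-point palindrome check that needs no indexing at all:
/// chars() can be iterated from both ends.
fn is_palindrome_str(s: &str) -> bool {
    s.chars().eq(s.chars().rev())
}
```

The palindrome check here works on code points, not grapheme clusters, so text with combining marks may still need different handling.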
Finally, in many cases of text processing, it is actually necessary to work with grapheme clusters rather than with code points or bytes. With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:
    use unicode_segmentation::UnicodeSegmentation;

    let string: String = ...;
    // graphemes() is defined on str, so pass a &str explicitly
    UnicodeSegmentation::graphemes(string.as_str(), true).nth(i).unwrap()
Naturally, grapheme cluster indexing has the same cost as indexing into code points: the string has to be traversed from the start up to the requested position.