Modifying chars in a String by index

Question

I wrote a function to titlecase (first letter capitalized, all others lowercase) a borrowed String, but it ended up being more of a hassle than it feels like it should be.

fn titlecase_word(word: &mut String) {

    unsafe {
        let buffer = word.as_mut_vec().as_mut_slice();
        buffer[0] = std::char::to_uppercase(buffer[0] as char) as u8;

        for i in range(1, buffer.len()) {
            buffer[i] = std::char::to_lowercase(buffer[i] as char) as u8;
        }
    }
}

The unsafe block is particularly undesirable. Is there a nicer way to modify String contents by index?

Recommended answer

Update: updated for the latest Rust. As of Rust 1.0.0-alpha, to_lowercase()/to_uppercase() are methods of the CharExt trait, and there is no longer a separate Ascii type: ASCII operations are now gathered in two traits, AsciiExt and OwnedAsciiExt. They are marked as unstable, so they may change during the Rust beta period.

Your code is incorrect because it accesses individual bytes to perform char-based operations, but in UTF-8 a character is not necessarily a single byte. It won't work correctly for anything that is not ASCII.
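
To make the byte/char mismatch concrete, here is a minimal sketch written against current stable Rust (not the 1.0.0-alpha API discussed in this answer). The helper names byte_and_char_counts and bytewise_lowercase are made up for illustration; the second one only imitates what the question's loop does to each byte.

```rust
/// Returns (length in bytes, number of chars) for a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Imitates the byte-wise approach from the question: each byte is cast to a
/// char, lowercased, and truncated back to a byte.
fn bytewise_lowercase(s: &str) -> Vec<u8> {
    s.bytes()
        .map(|b| {
            let as_char = b as char;
            // to_lowercase() returns an iterator; keep only its first char.
            let lowered = as_char.to_lowercase().next().unwrap_or(as_char);
            lowered as u8 // truncating cast, as in the original code
        })
        .collect()
}
```

For pure ASCII input the byte-wise version happens to work. For "\u{e9}" (the letter é, two bytes in UTF-8) the first byte 0xC3 is treated as the character Ã and lowercased, which leaves a byte sequence that String::from_utf8 rejects.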

In fact, there is no way to do this correctly in place, because a case conversion may change the number of bytes a character occupies, which would require reallocating the whole string. You should iterate over the characters and collect them into a new string:
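
As an illustration of why the byte length can change, the following sketch (current stable Rust; the helper names are hypothetical) measures a character before and after conversion. In today's standard library, char::to_lowercase and char::to_uppercase return iterators precisely because one character may map to several.

```rust
/// Returns the UTF-8 length of `c` and of its full lowercase mapping.
fn lowercase_byte_lengths(c: char) -> (usize, usize) {
    let before = c.len_utf8();
    let after: usize = c.to_lowercase().map(char::len_utf8).sum();
    (before, after)
}

/// Collects the full uppercase mapping of `c` into a String.
fn uppercase_string(c: char) -> String {
    c.to_uppercase().collect()
}
```

For example, U+0130 (İ) occupies two bytes but lowercases to "i" followed by a combining dot, three bytes in total, and U+00DF (ß) uppercases to the two characters "SS".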

fn titlecase_word(word: &mut String) {
    if word.is_empty() { return; }

    let mut result = String::with_capacity(word.len());

    {
        let mut chars = word.chars();
        result.push(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.push(c.to_lowercase());
        }
    }

    *word = result;
}

Because you need to generate a new string anyway, it is better to just return it instead of replacing the old one. In that case it is also better to pass a slice to the function:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        result.push(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.push(c.to_lowercase());
        }
    }

    result
}

String also has an extend() method, from the Extend trait, which is more idiomatic than a for loop:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        result.push(chars.next().unwrap().to_uppercase());
        result.extend(chars.map(|c| c.to_lowercase()));
    }

    result
}
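
The snippets in this answer target Rust 1.0.0-alpha, where to_uppercase() returned a single char. In current stable Rust it returns an iterator of chars, so push() no longer accepts its result. A rough equivalent of the extend() version for a recent compiler might look like this (a sketch, not the original author's code):

```rust
fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());
    let mut chars = word.chars();

    if let Some(first) = chars.next() {
        // The first char may expand to several chars when uppercased.
        result.extend(first.to_uppercase());
        // Lowercase the rest; flat_map flattens each per-char iterator.
        result.extend(chars.flat_map(char::to_lowercase));
    }

    result
}
```

Using if let on chars.next() also removes the need for a separate is_empty() check and the unwrap().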

In fact, with iterators it is possible to shorten it even further:

fn titlecase_word(word: &str) -> String {
    word.chars().enumerate()
        .map(|(i, c)| if i == 0 { c.to_uppercase() } else { c.to_lowercase() })
        .collect()
}
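
On current stable Rust the enumerate() form above does not compile as written, because the two if branches return different iterator types (ToUppercase and ToLowercase). One possible iterator-only formulation, offered as a sketch, chains the uppercased first char with the lowercased remainder:

```rust
fn titlecase_word(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        // Empty input: nothing to convert.
        None => String::new(),
        // Uppercase the first char, lowercase the rest, collect into a String.
        Some(first) => first
            .to_uppercase()
            .chain(chars.flat_map(char::to_lowercase))
            .collect(),
    }
}
```

Note that collect() cannot reserve the exact capacity up front here, so the with_capacity() variants may allocate slightly less often on long inputs.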

If you know in advance that you're working with ASCII, however, you can use the traits provided by the std::ascii module:

fn titlecase_word(word: String) -> String {
    use std::ascii::{AsciiExt, OwnedAsciiExt};
    assert!(word.is_ascii());

    let mut result = word.into_bytes().into_ascii_lowercase();
    result[0] = result[0].to_ascii_uppercase();

    String::from_utf8(result).unwrap()
}

This function will panic if the input string contains any non-ASCII character, because the assert! fails. It also panics on an empty string, since result[0] is then out of bounds.
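
In current stable Rust the ASCII helpers are inherent methods on str and String, so no trait import is needed. The following is a sketch of the same idea for a recent compiler; the name titlecase_ascii is made up here, and unlike the version above it returns an empty string unchanged instead of panicking.

```rust
fn titlecase_ascii(mut word: String) -> String {
    // Same precondition as the original: ASCII input only.
    assert!(word.is_ascii());

    // Lowercase every ASCII letter in place, reusing the existing buffer.
    word.make_ascii_lowercase();

    // get_mut(0..1) is None for an empty string, so there is no index panic.
    if let Some(first) = word.get_mut(0..1) {
        first.make_ascii_uppercase();
    }

    word
}
```

Called as titlecase_ascii("hELLO wORLD".to_string()), this returns "Hello world": only the first letter of the whole string is capitalized.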

This function does not allocate anything and modifies the string contents in place. However, you can't write it with a single &mut String argument without unsafe and without extra allocations, because into_bytes() would have to move the String out from behind the &mut reference, which is disallowed.

You could use std::mem::swap() and a temporary variable holding an empty string, though: that won't require unsafe, but it may require allocating the empty string. I don't remember whether it actually allocates; if not, you can write such a function, though the code will be somewhat cumbersome. Anyway, &mut arguments are not really idiomatic in Rust.
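
The swap idea can be sketched as follows for current stable Rust, where String::new() is documented not to allocate and std::mem::take() swaps in such an empty value. The function name titlecase_ascii_in_place is hypothetical, and this version silently leaves non-ASCII characters untouched rather than asserting.

```rust
use std::mem;

fn titlecase_ascii_in_place(word: &mut String) {
    // Take ownership of the old contents, leaving an empty String behind.
    let mut bytes = mem::take(word).into_bytes();

    // ASCII case changes only touch bytes below 0x80.
    bytes.make_ascii_lowercase();
    if let Some(first) = bytes.first_mut() {
        first.make_ascii_uppercase();
    }

    // Multi-byte sequences were not modified, so the buffer is still UTF-8.
    *word = String::from_utf8(bytes)
        .expect("ASCII case changes preserve UTF-8 validity");
}
```

The original buffer is reused throughout, so no new heap allocation is made for the string data.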
