How to convert Rust strings to UTF-16?


Problem description

Editor's note: This code example is from a version of Rust prior to 1.0 and is not valid Rust 1.0 code, but the answers still contain valuable information.

I want to pass a string literal to a Windows API. Many Windows functions use UTF-16 as the string encoding while Rust's native strings are UTF-8.

I know Rust has utf16_units() to produce an iterator over UTF-16 code units, but I don't know how to use that function to produce a UTF-16 string terminated by a zero.

I'm producing the UTF-16 string like this, but I am sure there is a better method to produce it:

extern "system" {
    pub fn MessageBoxW(hWnd: int, lpText: *const u16, lpCaption: *const u16, uType: uint) -> int;
}

pub fn main() {
    let s1 = [
        'H' as u16, 'e' as u16, 'l' as u16, 'l' as u16, 'o' as u16, 0 as u16,
    ];
    unsafe {
        MessageBoxW(0, s1.as_ptr(), 0 as *const u16, 0);
    }
}

Solution

Rust 1.8+

str::encode_utf16 is the stable method that returns an iterator over the UTF-16 code units of a string.

You just need to use collect() on that iterator to construct a Vec<u16> and then push(0) on that vector to add the terminator:

pub fn main() {
    let s = "Hello";

    let mut v: Vec<u16> = s.encode_utf16().collect();
    v.push(0);
}
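
The same steps can be wrapped in a small reusable function. The following is a minimal sketch, not part of the original answer; the name to_wide_null is hypothetical and chosen only for illustration. It chains the terminator onto the iterator instead of calling push(0) afterwards, which is equivalent.

```rust
/// Hypothetical helper (the name is illustrative, not from the original answer):
/// encodes `s` as UTF-16 code units and appends a terminating 0.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

fn main() {
    let wide = to_wide_null("Hello");
    // Five code units for "Hello" plus the terminator.
    assert_eq!(wide.len(), 6);
    assert_eq!(wide.last(), Some(&0));

    // Characters outside the Basic Multilingual Plane become surrogate pairs.
    let clef = to_wide_null("\u{1D11E}");
    assert_eq!(clef, [0xD834, 0xDD1E, 0]);

    // This pointer is what a *const u16 parameter would receive;
    // it is only valid while `wide` is alive.
    let _ptr: *const u16 = wide.as_ptr();
}
```

Keep the vector alive for as long as the pointer is in use. Note also that a string containing an embedded NUL character will appear truncated to an API that expects a zero-terminated string.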

Rust 1.0+

Before Rust 1.8, str::utf16_units() / str::encode_utf16 are unstable. The alternatives are either to switch to nightly (a viable option if you're writing a program, not a library) or to use an external crate like encoding:

extern crate encoding;

use std::slice;

use encoding::all::UTF_16LE;
use encoding::{Encoding, EncoderTrap};

fn main() {
    let s = "Hello";

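    // UTF-16LE bytes for `s`; the two zero bytes form the 16-bit terminator.
    let mut v: Vec<u8> = UTF_16LE.encode(s, EncoderTrap::Strict).unwrap();
    v.push(0);
    v.push(0);
    // View the byte buffer as u16 code units in the platform's byte order.
    let s: &[u16] = unsafe { slice::from_raw_parts(v.as_ptr() as *const u16, v.len() / 2) };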
    println!("{:?}", s);
}

(or you can use from_raw_parts_mut if you want a &mut [u16]).

However, be careful with endianness in this particular example. The UTF_16LE encoding gives you a vector of bytes that represent u16 values in little-endian byte order, whereas the from_raw_parts trick lets you "view" those bytes as a slice of u16 values in your platform's byte order, which may be big-endian. The cast also assumes that the byte buffer is suitably aligned for u16, which a Vec<u8> does not guarantee. Using a crate like byteorder may be helpful here if you want complete portability.
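
As an illustration of a portable conversion, here is a minimal sketch that uses only the standard library rather than the byteorder crate mentioned above. The function name utf16le_bytes_to_units is hypothetical, and the input bytes are written out by hand to stand in for an encoder's output. Each pair of bytes is decoded explicitly as little-endian, so the result does not depend on the platform's byte order or on the alignment of the byte buffer.

```rust
/// Hypothetical helper: decodes little-endian UTF-16 bytes into native `u16`
/// code units and appends a terminating 0. Returns `None` for odd lengths.
fn utf16le_bytes_to_units(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let mut units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    units.push(0);
    Some(units)
}

fn main() {
    // "Hi" as UTF-16LE bytes, written by hand for this example.
    let bytes = [0x48, 0x00, 0x69, 0x00];
    let units = utf16le_bytes_to_units(&bytes).unwrap();
    assert_eq!(units, [0x0048, 0x0069, 0]);

    // An odd number of bytes cannot be valid UTF-16.
    assert!(utf16le_bytes_to_units(&[0x48]).is_none());
}
```

This copies the data into a new Vec<u16> instead of reinterpreting the existing buffer, which costs one allocation but avoids unsafe code.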

This discussion on Reddit may also be helpful.
