How to convert Rust strings to UTF-16?
Editor's note: This code example is from a version of Rust prior to 1.0 and is not valid Rust 1.0 code, but the answers still contain valuable information.
I want to pass a string literal to a Windows API. Many Windows functions use UTF-16 as the string encoding while Rust's native strings are UTF-8.
I know Rust has utf16_units() to produce an iterator over UTF-16 code units, but I don't know how to use that function to produce a UTF-16 string terminated by a zero.
I'm producing the UTF-16 string like this, but I am sure there is a better method to produce it:
extern "system" {
    pub fn MessageBoxW(hWnd: int, lpText: *const u16, lpCaption: *const u16, uType: uint) -> int;
}
pub fn main() {
    let s1 = [
        'H' as u16, 'e' as u16, 'l' as u16, 'l' as u16, 'o' as u16, 0 as u16,
    ];
    unsafe {
        MessageBoxW(0, s1.as_ptr(), 0 as *const u16, 0);
    }
}
Rust 1.8+
str::encode_utf16 is the stable method that returns an iterator of UTF-16 code units (u16 values).
You just need to call collect() on that iterator to construct a Vec<u16>, and then push(0) on that vector to add the terminator:
pub fn main() {
    let s = "Hello";
    let mut v: Vec<u16> = s.encode_utf16().collect();
    v.push(0);
}
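The collect-then-push step can also be wrapped in a small helper. The following is only a sketch: the name to_wide_null is hypothetical (it is not part of the standard library), and it assumes the input contains no interior NUL characters, which would cut the string short on the Windows side.

```rust
use std::iter::once;

/// Hypothetical helper (not a std API): encodes `s` as UTF-16 and
/// appends a single terminating 0 code unit.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(once(0)).collect()
}

fn main() {
    let wide = to_wide_null("Hello");
    // Five code units for "Hello" plus the terminator.
    assert_eq!(wide.len(), 6);
    assert_eq!(wide.last(), Some(&0));

    // A character outside the Basic Multilingual Plane is encoded
    // as a surrogate pair, i.e. two u16 code units.
    let emoji = to_wide_null("\u{1F600}");
    assert_eq!(emoji, vec![0xD83D, 0xDE00, 0]);

    // The pointer stays valid only while `wide` is alive.
    let _ptr: *const u16 = wide.as_ptr();
}
```

A pointer obtained this way is what a parameter such as lpText in the question's MessageBoxW declaration expects. The vector must outlive the call.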
Rust 1.0+
In versions before 1.8, str::utf16_units() / str::encode_utf16 is unstable. The alternatives are to switch to nightly (a viable option if you're writing a program, not a library) or to use an external crate like encoding:
extern crate encoding;
use std::slice;
use encoding::all::UTF_16LE;
use encoding::{Encoding, EncoderTrap};
fn main() {
    let s = "Hello";
    // Encode to little-endian UTF-16 bytes.
    let mut v: Vec<u8> = UTF_16LE.encode(s, EncoderTrap::Strict).unwrap();
    // Two zero bytes form one UTF-16 NUL terminator.
    v.push(0); v.push(0);
    // Reinterpret the byte buffer as u16 values in the platform's byte order.
    let s: &[u16] = unsafe { slice::from_raw_parts(v.as_ptr() as *const _, v.len()/2) };
    println!("{:?}", s);
}
(or you can use from_raw_parts_mut if you want a &mut [u16]).
However, in this particular example you have to be careful with endianness: the UTF_16LE encoding gives you a vector of bytes representing u16 values in little-endian byte order, while the from_raw_parts trick lets you "view" that vector of bytes as a slice of u16 values in your platform's byte order, which may be big-endian. Using a crate like byteorder may be helpful here if you want complete portability.
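As an illustration of the portability point, the bytes can be decoded explicitly instead of being reinterpreted in place. This is a sketch under assumptions: the function name utf16le_bytes_to_units is made up for the example, and it relies on chunks_exact and u16::from_le_bytes from the standard library, which are only available on toolchains much newer than Rust 1.0 (on older ones the byteorder crate mentioned above would fill that role).

```rust
/// Hypothetical helper: decodes little-endian UTF-16 bytes into u16
/// code units, independent of the host's byte order.
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
fn utf16le_bytes_to_units(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

fn main() {
    // "Hi" followed by a NUL terminator, as little-endian UTF-16 bytes.
    let bytes = [0x48, 0x00, 0x69, 0x00, 0x00, 0x00];
    let units = utf16le_bytes_to_units(&bytes);
    assert_eq!(units, vec![0x0048, 0x0069, 0x0000]);
    println!("{:?}", units);
}
```

Copying into a fresh Vec<u16> costs an allocation, but it also sidesteps the requirement that a pointer passed to from_raw_parts be properly aligned for u16, which a Vec<u8> buffer does not guarantee.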
This discussion on Reddit may also be helpful.