Convert a Vec<u16> or Vec<WCHAR> to a &str


Question

I'm getting into Rust programming by writing a small program, and I'm a little lost with string conversions.

In my program, I have a vector as follows:

let mut name: Vec<winnt::WCHAR> = Vec::new(); 

WCHAR is the same as a u16 on my Windows machine.

I hand the Vec<u16> over to a C function (as a pointer), which fills it with data. I then need to convert the string contained in the vector into a &str. However, no matter what I try, I cannot get this conversion to work.

The only thing I managed to get working is converting it to a WideCString:

let widestr = unsafe { WideCString::from_ptr_str(name.as_ptr()) };

But this seems to be a step in the wrong direction.

What is the best way to convert the Vec<u16> to a &str, assuming that the vector holds a valid, null-terminated string?

Recommended answer

"I then need to convert the string contained in the vector into a &str. However, no matter what I try, I cannot get this conversion to work."

There's no way of making this a "free" conversion.

A &str is a Unicode string encoded as UTF-8, which is a byte-oriented encoding. If you have UTF-16 (or the different but common UCS-2 encoding), there's no way to read one as the other. That's equivalent to trying to read a JPEG image as a PDF. Both chunks of data might be a string, but the encoding is important.
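To make the encoding difference concrete, here is a minimal sketch that prints the same two-character text in both encodings. The helper name `encodings` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: returns the UTF-8 bytes and the UTF-16 code units
/// of the same text so the two encodings can be compared side by side.
fn encodings(text: &str) -> (Vec<u8>, Vec<u16>) {
    (text.as_bytes().to_vec(), text.encode_utf16().collect())
}

fn main() {
    // "h" followed by U+00E9 (e with acute accent).
    let (utf8, utf16) = encodings("h\u{e9}");

    // UTF-8 needs three bytes here; UTF-16 needs two 16-bit code units.
    println!("UTF-8 bytes:       {:x?}", utf8);
    println!("UTF-16 code units: {:x?}", utf16);
}
```

The two sequences differ in both element size and content, which is why the buffer cannot simply be reinterpreted as a &str.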

The first question is "do you really need to do that?". Many times, you can take data from one function and shovel it back into another function without ever looking at it. If you can get away with that, it might be the best answer.

If you do need to transform it, then you have to deal with the errors that can occur. An arbitrary array of 16-bit integers may not be valid UTF-16 or UCS-2. These encodings have edge cases that can easily produce invalid strings. Null-termination is another aspect: Unicode actually allows embedded NUL characters, so a null-terminated string can't hold all possible Unicode characters!
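Both edge cases can be reproduced with a few code units. The sketch below uses an unpaired surrogate as one example of invalid UTF-16, and an embedded NUL to show what a null-terminated reader would lose. `len_until_nul` is a hypothetical helper written for this example.

```rust
/// Hypothetical helper: number of code units a NUL-terminated reader would see.
fn len_until_nul(units: &[u16]) -> usize {
    units.iter().position(|&u| u == 0).unwrap_or(units.len())
}

fn main() {
    // 0xD800 is a high surrogate that is not followed by a low surrogate,
    // so this sequence is not valid UTF-16.
    let lone_surrogate: [u16; 3] = [0x68, 0xD800, 0x69];
    println!("strict: {:?}", String::from_utf16(&lone_surrogate));
    println!("lossy:  {:?}", String::from_utf16_lossy(&lone_surrogate));

    // An embedded NUL is valid Unicode and decodes without error ...
    let embedded_nul: [u16; 3] = [0x61, 0x0000, 0x62];
    println!("full:   {:?}", String::from_utf16(&embedded_nul));

    // ... but a NUL-terminated reader stops at the first NUL.
    let visible = &embedded_nul[..len_until_nul(&embedded_nul)];
    println!("as C string: {:?}", String::from_utf16(visible));
}
```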

Once you've ensured that the encoding is valid (see note 1) and figured out how many entries in the input vector make up the string, you have to decode the input format and re-encode it to the output format. This is likely to require a new allocation, so you will most likely end up with a String, which can then be used almost anywhere a &str can be used.
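As a small illustration of the last point, a decoded String can be borrowed as a &str wherever one is expected. The functions `decode` and `shout` below are hypothetical stand-ins for your own code.

```rust
/// Hypothetical consumer that only accepts a &str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Hypothetical wrapper: decodes UTF-16 into a newly allocated String,
/// returning None if the input is not valid UTF-16.
fn decode(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    let units: Vec<u16> = "hello".encode_utf16().collect();
    let owned: String = decode(&units).expect("input was not valid UTF-16");

    // Borrow explicitly ...
    let borrowed: &str = owned.as_str();
    println!("{}", shout(borrowed));

    // ... or let deref coercion turn &String into &str.
    println!("{}", shout(&owned));
}
```

The &str borrows from the String, so the String has to stay alive for as long as the &str is in use.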

There is a built-in method to convert UTF-16 data to a String: String::from_utf16. Note that it returns a Result to allow for these error cases. There's also String::from_utf16_lossy, which replaces invalidly encoded parts with the Unicode replacement character.

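fn main() {
    // "hello" as UTF-16 code units
    let name: [u16; 5] = [0x68, 0x65, 0x6c, 0x6c, 0x6f];

    // Strict: returns a Result and fails on invalid UTF-16
    let a = String::from_utf16(&name);
    // Lossy: always returns a String, substituting U+FFFD for invalid data
    let b = String::from_utf16_lossy(&name);

    println!("{:?}", a);
    println!("{:?}", b);
}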

If you are starting from a pointer to a u16 or WCHAR, you will need to convert it to a slice first by using slice::from_raw_parts. If you have a null-terminated string, you need to find the NUL yourself and slice the input appropriately.
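Putting these steps together, one possible sketch looks like the following. The function names `wide_to_string` and `wide_ptr_to_string` are hypothetical, and the buffer contents stand in for whatever the C function would write. The pointer variant assumes the caller guarantees a valid, NUL-terminated buffer.

```rust
use std::slice;
use std::string::FromUtf16Error;

/// Hypothetical helper: decodes the code units before the first NUL
/// (or the whole slice if it contains no NUL).
fn wide_to_string(buf: &[u16]) -> Result<String, FromUtf16Error> {
    let len = buf.iter().position(|&u| u == 0).unwrap_or(buf.len());
    String::from_utf16(&buf[..len])
}

/// Hypothetical helper for the raw-pointer case.
///
/// # Safety
/// `ptr` must be non-null, properly aligned, and point to a NUL-terminated
/// sequence of u16 values that stays valid for the duration of the call.
unsafe fn wide_ptr_to_string(ptr: *const u16) -> Result<String, FromUtf16Error> {
    // Find the terminating NUL by scanning forward from the pointer.
    let mut len = 0;
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    let units = unsafe { slice::from_raw_parts(ptr, len) };
    String::from_utf16(units)
}

fn main() {
    // A buffer as a C function might leave it: text, NUL, unused capacity.
    let name: Vec<u16> = vec![0x68, 0x65, 0x6c, 0x6c, 0x6f, 0, 0, 0];

    let from_slice = wide_to_string(&name).expect("invalid UTF-16");
    let from_ptr = unsafe { wide_ptr_to_string(name.as_ptr()) }.expect("invalid UTF-16");

    // The String can now be borrowed as the &str the question asks for.
    let s: &str = &from_slice;
    println!("{} / {}", s, from_ptr);
}
```

When the Vec itself is still available, the slice-based version is preferable because it needs no unsafe code and cannot read past the end of the buffer.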

1: This is actually a great way of using types: a &str is guaranteed to be UTF-8 encoded, so no further checks need to be made. Similarly, WideCString is likely to perform its check once at construction and can then skip it on later uses.
