Is there a Rust library with a UTF-16 string type? (intended for writing a JavaScript interpreter)

Views: 56 Published: 2021/7/13 21:30:07 Tags: string rust utf-16

Problem description

For most programs, it's better to use UTF-8 internally and convert to other encodings when necessary. But in my case, I want to write a JavaScript interpreter, and it's much simpler to store only UTF-16 strings (or arrays of u16), because:

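I need to work with individual 16-bit code units (this is generally a bad idea, but JavaScript requires it). This means I need the string type to implement Index<usize>.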
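To make the indexing requirement concrete, here is a minimal sketch of a wrapper that implements Index<usize> over code units. The type name `CodeUnits` and its methods are hypothetical, introduced only for illustration; they do not come from any existing crate.

```rust
use std::ops::Index;

/// Hypothetical owned string type: a plain vector of 16-bit code units.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CodeUnits(Vec<u16>);

impl CodeUnits {
    /// Encodes a Rust string slice as UTF-16 code units.
    pub fn encode(s: &str) -> Self {
        CodeUnits(s.encode_utf16().collect())
    }

    /// Number of code units, which is what JavaScript's `length` counts.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}

impl Index<usize> for CodeUnits {
    type Output = u16;

    /// Returns the code unit at position `i`.
    /// Panics when `i` is out of bounds, like slice indexing.
    fn index(&self, i: usize) -> &u16 {
        &self.0[i]
    }
}
```

With this sketch, a string holding "a" followed by one emoji outside the Basic Multilingual Plane has length 3, and indexes 1 and 2 return the two halves of the surrogate pair rather than a whole character.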

I need to store unpaired surrogates, that is, malformed UTF-16 strings (because of this, ECMAScript strings are technically defined as arrays of u16 that usually, but not always, represent UTF-16 strings). There is an encoding aptly named WTF-8 that stores unpaired surrogates in UTF-8, but I don't want to use something like that.
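The unpaired-surrogate problem can be shown with the standard library alone. The sketch below uses `std::char::decode_utf16`, `String::from_utf16` and `String::from_utf16_lossy`, which exist in current Rust; the two helper function names are hypothetical.

```rust
use std::char::decode_utf16;

/// Returns true if `units` contains a surrogate that is not part of a valid pair.
pub fn has_unpaired_surrogate(units: &[u16]) -> bool {
    decode_utf16(units.iter().copied()).any(|r| r.is_err())
}

/// Converts code units to a Rust `String` for display only.
/// Each unpaired surrogate is replaced by U+FFFD, so the conversion loses information.
pub fn to_display_string(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

A slice of u16 stores a lone surrogate without complaint, whereas `String::from_utf16` returns an error for the same data. This is why a plain String cannot serve as the interpreter's string representation.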

I want to have the usual owned / borrowed pair of types (like String / str and CString / CStr) with all or most of the usual methods. I don't want to roll my own string type if I can avoid it.
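For reference, a hand-rolled owned / borrowed pair might look roughly like the sketch below. It follows the pattern of an unsized newtype over a slice; the names `Utf16String`, `from_units`, `as_units` and the small method set are assumptions for illustration, and a real library would offer far more methods.

```rust
use std::ops::Deref;

/// Borrowed, unsized view over code units (plays the role of `str`).
#[repr(transparent)]
pub struct Utf16Str([u16]);

/// Owned buffer of code units (plays the role of `String`).
pub struct Utf16String(Vec<u16>);

impl Utf16Str {
    /// Reinterprets a slice of code units as a `Utf16Str` without copying.
    pub fn from_units(units: &[u16]) -> &Utf16Str {
        // SAFETY: `#[repr(transparent)]` gives `Utf16Str` the same layout as `[u16]`,
        // so the pointer cast preserves both the address and the slice length.
        unsafe { &*(units as *const [u16] as *const Utf16Str) }
    }

    /// Exposes the raw code units.
    pub fn as_units(&self) -> &[u16] {
        &self.0
    }

    /// Length in code units.
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Code-unit-wise prefix test.
    pub fn starts_with(&self, prefix: &Utf16Str) -> bool {
        self.0.starts_with(&prefix.0)
    }
}

impl Utf16String {
    /// Takes ownership of an existing vector of code units.
    pub fn from_units(units: Vec<u16>) -> Self {
        Utf16String(units)
    }
}

impl Deref for Utf16String {
    type Target = Utf16Str;

    /// Lets every `Utf16Str` method be called on a `Utf16String`.
    fn deref(&self) -> &Utf16Str {
        Utf16Str::from_units(&self.0)
    }
}
```

The Deref impl is what makes the pair feel like String / str: methods are written once on the borrowed type and become available on the owned one.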

Also, my strings will always be immutable, behind an Rc, and referenced from a data structure containing weak pointers to all strings (implementing string interning). This might be relevant: perhaps it would be better to have Rc<Utf16Str> as the string type, where Utf16Str is the unsized string type (which can be defined as just struct Utf16Str([u16])). That would avoid following two pointers when accessing the string, but I don't know how to instantiate an Rc with an unsized type.
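In current Rust, an Rc over an unsized slice can be built with the standard `From<&[T]>` and `From<Vec<T>>` impls for `Rc<[T]>`, so the single-allocation layout is reachable without a custom type. The sketch below assumes that approach; `JsStr` and `Interner` are hypothetical names. It keys the table on a copy of the code units for simplicity and never purges dead entries, which a real interner would need to address. Wrapping the slice in a Utf16Str newtype would additionally need an unsafe pointer cast and is not shown.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical string handle: one allocation holds the reference counts and the code units.
pub type JsStr = Rc<[u16]>;

/// Hypothetical interner that keeps only weak pointers to the strings it has handed out.
#[derive(Default)]
pub struct Interner {
    table: HashMap<Vec<u16>, Weak<[u16]>>,
}

impl Interner {
    /// Returns the shared string for `units`, allocating it only if no live copy exists.
    pub fn intern(&mut self, units: &[u16]) -> JsStr {
        if let Some(existing) = self.table.get(units).and_then(|w| w.upgrade()) {
            return existing;
        }
        // `From<&[T]> for Rc<[T]>` copies the units into a new reference-counted slice.
        let rc: JsStr = Rc::from(units);
        self.table.insert(units.to_vec(), Rc::downgrade(&rc));
        rc
    }
}
```

Interning the same code units twice yields two handles to the same allocation, which `Rc::ptr_eq` can confirm, and indexing a handle reaches the code units through a single pointer.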

Given the above requirements, merely using rust-encoding is very inconvenient, because it treats all non-UTF-8 encodings as vectors of u8.

Also, I'm not sure whether the std library can help me here at all. I looked into Utf16Units, and it's just an iterator, not a proper string type. (I also know OsString doesn't help: I'm not on Windows, and it doesn't even implement Index<usize>.)
